Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Michal Kubecek @ 2019-09-10  8:00 UTC (permalink / raw)
  To: netdev
  Cc: Alexandru Ardelean, devicetree, linux-kernel, davem, robh+dt,
	mark.rutland, f.fainelli, hkallweit1, andrew
In-Reply-To: <20190909131251.3634-2-alexandru.ardelean@analog.com>

On Mon, Sep 09, 2019 at 04:12:50PM +0300, Alexandru Ardelean wrote:
> The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
> this feature is common across other PHYs (like EEE), and defining
> `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
> 
> The way EDPD works, is that the RX block is put to a lower power mode,
> except for link-pulse detection circuits. The TX block is also put to low
> power mode, but the PHY wakes-up periodically to send link pulses, to avoid
> lock-ups in case the other side is also in EDPD mode.
> 
> Currently, there are 2 PHY drivers that look like they could use this new
> PHY tunable feature: the `adin` && `micrel` PHYs.
> 
> The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
> default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
> datasheet does not mention whether they can be disabled, but mentions that
> they can modified.
> 
> The way this change is structured, is similar to the PHY tunable downshift
> control:
> * a `ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL` value is exposed to cover a default
>   TX interval; some PHYs could specify a certain value that makes sense
> * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
> * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
> 
> This should allow PHYs to:
> * enable EDPD and not enable TX pulses (interval would be 0)
> * enable EDPD and configure TX pulse interval; note that TX interval units
>   would be PHY specific; we could consider `seconds` as units, but it could
>   happen that some PHYs would be prefer milliseconds as a unit;
>   a maximum of 65533 units should be sufficient

Sorry for missing the discussion on previous version but I don't really
like the idea of leaving the choice of units to PHY. Both for manual
setting and system configuration, it would be IMHO much more convenient
to have the interpretation universal for all NICs.

Seconds as units seem too coarse and maximum of ~18 hours way too big.
Milliseconds would be more practical from granularity point of view,
would maximum of ~65 seconds be sufficient?

Michal Kubecek

> * disable EDPD
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

^ permalink raw reply

* Re: [PATCH] tcp: fix tcp_disconnect() not clear tp->fastopen_rsk sometimes
From: David Miller @ 2019-09-10  8:09 UTC (permalink / raw)
  To: chunguo.feng
  Cc: edumazet, kuznet, yoshfuji, ast, daniel, netdev, kafai,
	songliubraving, yhs, linux-kernel, bpf
In-Reply-To: <20190906093429.930-1-chunguo.feng@amlogic.com>

From: chunguo feng <chunguo.feng@amlogic.com>
Date: Fri, 6 Sep 2019 17:34:29 +0800

> From: fengchunguo <chunguo.feng@amlogic.com>
> 
> This patch avoids fastopen_rsk not be cleared every times, then occur 
> the below BUG_ON:
> tcp_v4_destroy_sock
> 	->BUG_ON(tp->fastopen_rsk);
> 
> When playback some videos from netwrok,used tcp_disconnect continually.
 ...
> Signed-off-by: fengchunguo <chunguo.feng@amlogic.com>

This still needs review.

^ permalink raw reply

* Re: [PATCH net v2] bridge/mdb: remove wrong use of NLM_F_MULTI
From: David Miller @ 2019-09-10  8:13 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: roopa, netdev, bridge, nikolay
In-Reply-To: <20190906094703.21300-1-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri,  6 Sep 2019 11:47:02 +0200

> NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end.
> In fact, NLMSG_DONE is sent only at the end of a dump.
> 
> Libraries like libnl will wait forever for NLMSG_DONE.
> 
> Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del")
> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] net/mlx5: reduce stack usage in FW tracer
From: Arnd Bergmann @ 2019-09-10  8:14 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: cai@lca.pw, linux-rdma@vger.kernel.org, davem@davemloft.net,
	Moshe Shemesh, Feras Daoud, linux-kernel@vger.kernel.org,
	Eran Ben Elisha, netdev@vger.kernel.org, leon@kernel.org,
	Erez Shitrit
In-Reply-To: <5abccf6452a9d4efa2a1593c0af6d41703d4f16f.camel@mellanox.com>

On Mon, Sep 9, 2019 at 11:53 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
> On Mon, 2019-09-09 at 22:18 +0200, Arnd Bergmann wrote:

> > To do this right, a better approach may be to just rely on ftrace,
> > storing
> > the (pointer to the) format string and the arguments in the buffer
> > without
> > creating a string. Would that be an option here?
>
> I am not sure how this would work, since the format parameters can
> changes depending on the FW string and the specific traces.

Ah, so the format string comes from the firmware? I didn't look
at the code in enough detail to understand why it's done like this,
only enough to notice that it's rather unusual.

Possibly trace_mlx5_fw might still get away with copying the format
string and the arguments, leaving the snprintf() to the time we read
the buffer, but I don't know enough about ftrace to be sure that
would actually work, and you'd need to duplicate it in
mlx5_devlink_fmsg_fill_trace().

> > A more minimal approach might be to move what is now the on-stack
> > buffer into the mlx5_fw_tracer function. I see that you already store
> > a copy of the string in there from mlx5_fw_tracer_save_trace(),
> > which conveniently also holds a mutex already that protects
> > it from concurrent access.
> >
>
> This sounds plausible.
>
> So for now let's do this or the noinline approach, Please let me know
> which one do you prefer, if it is the mutex protected buffer, i can do
> it myself.
>
> I will open an internal task and discussion then address your valuable
> points in a future submission, since we already in rc8 I don't want to
> take the risk now.

Yes, that sounds like a good plan. If you can't avoid the snprintf
entirely, then the mutex protected buffer should be helpful, and
also avoid a strncpy() along with the stack buffer.

      Arnd

^ permalink raw reply

* [RFC PATCH 0/4] mdev based hardware virtio offloading support
From: Jason Wang @ 2019-09-10  8:19 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev
  Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
	xiao.w.wang, haotian.wang

Hi all:

There are hardware that can do virtio datapath offloading while having
its own control path. This path tries to implement a mdev based
unified API to support using kernel virtio driver to drive those
devices. This is done by introducing a new mdev transport for virtio
(virtio_mdev) and register itself as a new kind of mdev driver. Then
it provides a unified way for kernel virtio driver to talk with mdev
device implementation.

Though the series only contain kernel driver support, the goal is to
make the transport generic enough to support userspace drivers. This
means vhost-mdev[1] could be built on top as well by resuing the
transport.

A sample driver is also implemented which simulate a virito-net
loopback ethernet device on top of vringh + workqueue. This could be
used as a reference implementation for real hardware driver.

Notes:

- Some of the key transport command for vhost-mdev(userspace driver)
  is not introduced. This includes:
  1) set/get virtqueue state (idx etc), this could be simply done by
     introducing new transport command
  2) dirty pages tracking, could be simply done by introducing new
     transport command
  3) set/get device internal state, this requires more thought, of
     course we can introduce device specific transport command, but it
     would be better to have a unified API
- Current mdev_parent_ops assumes all pointers are userspace pointer,
  this block the kernel driver, this series just abuse those as kernel
  pointer and this could be addressed by inventing new parent_ops.
- For quick POC, mdev transport was just derived from virtio-MMIO,
  I'm pretty sure it has lots of space to be optimized, please share
  your thought.

Please review.

[1] https://lkml.org/lkml/2019/8/28/35

Jason Wang (4):
  vringh: fix copy direction of vringh_iov_push_kern()
  mdev: introduce helper to set per device dma ops
  virtio: introudce a mdev based transport
  docs: Sample driver to demonstrate how to implement virtio-mdev
    framework

 drivers/vfio/mdev/Kconfig        |   7 +
 drivers/vfio/mdev/Makefile       |   1 +
 drivers/vfio/mdev/mdev_core.c    |   7 +
 drivers/vfio/mdev/virtio_mdev.c  | 500 ++++++++++++++++++++
 drivers/vhost/vringh.c           |   8 +-
 include/linux/mdev.h             |   2 +
 include/uapi/linux/virtio_mdev.h | 131 ++++++
 samples/Kconfig                  |   7 +
 samples/vfio-mdev/Makefile       |   1 +
 samples/vfio-mdev/mvnet.c        | 766 +++++++++++++++++++++++++++++++
 10 files changed, 1429 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/mdev/virtio_mdev.c
 create mode 100644 include/uapi/linux/virtio_mdev.h
 create mode 100644 samples/vfio-mdev/mvnet.c

-- 
2.19.1


^ permalink raw reply

* [RFC PATCH 1/4] vringh: fix copy direction of vringh_iov_push_kern()
From: Jason Wang @ 2019-09-10  8:19 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev
  Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
	xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>

We want to copy from iov to buf, so the direction was wrong.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vringh.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 08ad0d1f0476..a0a2d74967ef 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -852,6 +852,12 @@ static inline int xfer_kern(void *src, void *dst, size_t len)
 	return 0;
 }
 
+static inline int kern_xfer(void *dst, void *src, size_t len)
+{
+	memcpy(dst, src, len);
+	return 0;
+}
+
 /**
  * vringh_init_kern - initialize a vringh for a kernelspace vring.
  * @vrh: the vringh to initialize.
@@ -958,7 +964,7 @@ EXPORT_SYMBOL(vringh_iov_pull_kern);
 ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
 			     const void *src, size_t len)
 {
-	return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
+	return vringh_iov_xfer(wiov, (void *)src, len, kern_xfer);
 }
 EXPORT_SYMBOL(vringh_iov_push_kern);
 
-- 
2.19.1


^ permalink raw reply related

* Re: ❌ FAIL: Stable queue: queue-5.2
From: Hangbin Liu @ 2019-09-10  8:19 UTC (permalink / raw)
  To: CKI Project
  Cc: Linux Stable maillist, netdev, Jan Stancek, Xiumei Mu,
	David Howells, linux-afs
In-Reply-To: <cki.77A5953448.UY7ROQ6BKT@redhat.com>

On Wed, Aug 28, 2019 at 08:36:14AM -0400, CKI Project wrote:
> 
> Hello,
> 
> We ran automated tests on a patchset that was proposed for merging into this
> kernel tree. The patches were applied to:
> 
>        Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>             Commit: f7d5b3dc4792 - Linux 5.2.10
> 
> The results of these automated tests are provided below.
> 
>     Overall result: FAILED (see details below)
>              Merge: OK
>            Compile: OK
>              Tests: FAILED
> 
> All kernel binaries, config files, and logs are available for download here:
> 
>   https://artifacts.cki-project.org/pipelines/128519
> 
> 
> 
> One or more kernel tests failed:
> 
>   x86_64:
>     ❌ Networking socket: fuzz

Sorry, maybe the info is a little late, I just found the call traces for this
failure.


[ 9492.446228] BUG: kernel NULL pointer dereference, address: 0000000000000010 
[ 9492.447493] #PF: supervisor write access in kernel mode 
[ 9492.448489] #PF: error_code(0x0002) - not-present page 
[ 9492.449410] PGD 800000010902c067 P4D 800000010902c067 PUD 104202067 PMD 0  
[ 9492.450663] Oops: 0002 [#1] SMP PTI 
[ 9492.451348] CPU: 0 PID: 19353 Comm: socket Tainted: G        W         5.2.10-f7d5b3d.cki #1 
[ 9492.453040] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[ 9492.454153] RIP: 0010:rxrpc_unuse_local+0xa/0x20 [rxrpc] 
[ 9492.455110] Code: ce e9 c4 fe ff ff 0f 0b e9 34 dd 00 00 e9 95 dd 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 b8 ff ff ff ff <3e> 0f c1 47 10 83 f8 01 74 05 e9 a7 f5 ff ff e9 e2 f7 ff ff 66 90 
[ 9492.458362] RSP: 0018:ffffa756008bbeb0 EFLAGS: 00010246 
[ 9492.459329] RAX: 00000000ffffffff RBX: ffff95fed42c0000 RCX: ffffc755ffc63b37 
[ 9492.460690] RDX: 0000000000000001 RSI: 0000000000000046 RDI: 0000000000000000 
[ 9492.461940] RBP: ffff95ff04fed000 R08: 0000000000000001 R09: ffffc755ffc63b60 
[ 9492.463220] R10: 0000000000000060 R11: 0000000000000000 R12: ffff95ff04fed0e4 
[ 9492.464508] R13: ffff95feaa84c780 R14: 0000000000000000 R15: 0000000000000000 
[ 9492.465781] FS:  00007f86bd101740(0000) GS:ffff95ffbba00000(0000) knlGS:0000000000000000 
[ 9492.467156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 9492.468185] CR2: 0000000000000010 CR3: 000000002e34a004 CR4: 00000000007606f0 
[ 9492.469435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 9492.470754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[ 9492.472050] PKRU: 55555554 
[ 9492.472562] Call Trace: 
[ 9492.473025]  rxrpc_release+0x138/0x1e0 [rxrpc] 
[ 9492.473885]  __sock_release+0x89/0xa0 
[ 9492.474564]  __sys_socket+0xd4/0xf0 
[ 9492.475200]  __x64_sys_socket+0x16/0x20 
[ 9492.475903]  do_syscall_64+0x5f/0x1a0 
[ 9492.476551]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 
[ 9492.477446] RIP: 0033:0x7f86bd20069b 
[ 9492.478094] Code: 73 01 c3 48 8b 0d ed 37 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 29 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd 37 0c 00 f7 d8 64 89 01 48 
[ 9492.481381] RSP: 002b:00007ffcbb797dc8 EFLAGS: 00000217 ORIG_RAX: 0000000000000029 
[ 9492.482744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f86bd20069b 
[ 9492.483945] RDX: 000000000000000a RSI: 0000000000000002 RDI: 0000000000000021 
[ 9492.485220] RBP: 00007ffcbb797e10 R08: 00007f86bd2c41f4 R09: 00007f86bd2c4260 
[ 9492.486505] R10: 00000000ffffffff R11: 0000000000000217 R12: 00000000004012b0 
[ 9492.487769] R13: 00007ffcbb797ef0 R14: 0000000000000000 R15: 0000000000000000 
[ 9492.489048] Modules linked in: nfnetlink cmtp kernelcapi l2tp_ip6 l2tp_ip rfcomm pptp gre l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel bnep can_bcm hidp can_raw kcm pppoe pppox ppp_generic slhc vmw_vsock_vmci_transport vsock vmw_vmci psnap ieee802154_socket ieee802154 rose bluetooth ecdh_generic ecc mpls_router ip_tunnel netrom ax25 smc ib_core af_key fcrypt pcbc rxrpc nfc rfkill atm can mlx4_en mlx4_core nls_utf8 isofs dummy minix binfmt_misc nfsv3 nfs_acl nfs lockd grace fscache sctp rds brd vfat fat btrfs xor zstd_compress raid6_pq zstd_decompress loop tun ip6table_nat ip6_tables xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay fuse nfit libnvdimm sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net pcspkr net_failover joydev failover virtio_balloon i2c_piix4 ip_tables xfs libcrc32c qxl drm_kms_helper ttm drm crc32c_intel virtio_blk serio_raw ata_generic pat
 a_acpi 
[ 9492.489083]  floppy qemu_fw_cfg [last unloaded: can] 
[ 9492.505349] CR2: 0000000000000010 
[ 9492.505948] ---[ end trace afa9902ac3c49830 ]--- 

Thanks
Hangbin
> 
> We hope that these logs can help you find the problem quickly. For the full
> detail on our testing procedures, please scroll to the bottom of this message.
> 
> Please reply to this email if you have any questions about the tests that we
> ran or if you have any suggestions on how to make future tests more effective.
> 
>         ,-.   ,-.
>        ( C ) ( K )  Continuous
>         `-',-.`-'   Kernel
>           ( I )     Integration
>            `-'
> ______________________________________________________________________________
> 
> Merge testing
> -------------
> 
> We cloned this repository and checked out the following commit:
> 
>   Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>   Commit: f7d5b3dc4792 - Linux 5.2.10
> 
> 
> We grabbed the 54831dad38d2 commit of the stable queue repository.
> 
> We then merged the patchset with `git am`:
> 
>   asoc-simple_card_utils.h-care-null-dai-at-asoc_simpl.patch
>   asoc-simple-card-fix-an-use-after-free-in-simple_dai.patch
>   asoc-simple-card-fix-an-use-after-free-in-simple_for.patch
>   asoc-audio-graph-card-fix-use-after-free-in-graph_da.patch
>   asoc-audio-graph-card-fix-an-use-after-free-in-graph.patch
>   asoc-audio-graph-card-add-missing-const-at-graph_get.patch
>   regulator-axp20x-fix-dcdca-and-dcdcd-for-axp806.patch
>   regulator-axp20x-fix-dcdc5-and-dcdc6-for-axp803.patch
>   asoc-samsung-odroid-fix-an-use-after-free-issue-for-.patch
>   asoc-samsung-odroid-fix-a-double-free-issue-for-cpu_.patch
>   asoc-intel-bytcht_es8316-add-quirk-for-irbis-nb41-ne.patch
>   hid-logitech-hidpp-add-usb-pid-for-a-few-more-suppor.patch
>   hid-add-044f-b320-thrustmaster-inc.-2-in-1-dt.patch
>   mips-kernel-only-use-i8253-clocksource-with-periodic.patch
>   mips-fix-cacheinfo.patch
>   libbpf-sanitize-var-to-conservative-1-byte-int.patch
>   netfilter-ebtables-fix-a-memory-leak-bug-in-compat.patch
>   asoc-dapm-fix-handling-of-custom_stop_condition-on-d.patch
>   asoc-sof-use-__u32-instead-of-uint32_t-in-uapi-heade.patch
>   spi-pxa2xx-balance-runtime-pm-enable-disable-on-erro.patch
>   bpf-sockmap-sock_map_delete-needs-to-use-xchg.patch
>   bpf-sockmap-synchronize_rcu-before-free-ing-map.patch
>   bpf-sockmap-only-create-entry-if-ulp-is-not-already-.patch
>   selftests-bpf-fix-sendmsg6_prog-on-s390.patch
>   asoc-dapm-fix-a-memory-leak-bug.patch
>   bonding-force-slave-speed-check-after-link-state-rec.patch
>   net-mvpp2-don-t-check-for-3-consecutive-idle-frames-.patch
>   selftests-forwarding-gre_multipath-enable-ipv4-forwa.patch
>   selftests-forwarding-gre_multipath-fix-flower-filter.patch
>   selftests-bpf-add-another-gso_segs-access.patch
>   libbpf-fix-using-uninitialized-ioctl-results.patch
>   can-dev-call-netif_carrier_off-in-register_candev.patch
>   can-mcp251x-add-error-check-when-wq-alloc-failed.patch
>   can-gw-fix-error-path-of-cgw_module_init.patch
>   asoc-fail-card-instantiation-if-dai-format-setup-fai.patch
>   staging-fbtft-fix-gpio-handling.patch
>   libbpf-silence-gcc8-warning-about-string-truncation.patch
>   st21nfca_connectivity_event_received-null-check-the-.patch
>   st_nci_hci_connectivity_event_received-null-check-th.patch
>   nl-mac-80211-fix-interface-combinations-on-crypto-co.patch
>   asoc-ti-davinci-mcasp-fix-clk-pdir-handling-for-i2s-.patch
>   asoc-rockchip-fix-mono-capture.patch
>   asoc-ti-davinci-mcasp-correct-slot_width-posed-const.patch
>   net-usb-qmi_wwan-add-the-broadmobi-bm818-card.patch
>   qed-rdma-fix-the-hw_ver-returned-in-device-attribute.patch
>   isdn-misdn-hfcsusb-fix-possible-null-pointer-derefer.patch
>   habanalabs-fix-f-w-download-in-be-architecture.patch
>   mac80211_hwsim-fix-possible-null-pointer-dereference.patch
>   net-stmmac-manage-errors-returned-by-of_get_mac_addr.patch
>   netfilter-ipset-actually-allow-destination-mac-addre.patch
>   netfilter-ipset-copy-the-right-mac-address-in-bitmap.patch
>   netfilter-ipset-fix-rename-concurrency-with-listing.patch
>   rxrpc-fix-potential-deadlock.patch
>   rxrpc-fix-the-lack-of-notification-when-sendmsg-fail.patch
>   nvmem-use-the-same-permissions-for-eeprom-as-for-nvm.patch
>   iwlwifi-mvm-avoid-races-in-rate-init-and-rate-perfor.patch
>   iwlwifi-dbg_ini-move-iwl_dbg_tlv_load_bin-out-of-deb.patch
>   iwlwifi-dbg_ini-move-iwl_dbg_tlv_free-outside-of-deb.patch
>   iwlwifi-fix-locking-in-delayed-gtk-setting.patch
>   iwlwifi-mvm-send-lq-command-always-async.patch
>   enetc-fix-build-error-without-phylib.patch
>   isdn-hfcsusb-fix-misdn-driver-crash-caused-by-transf.patch
>   net-phy-phy_led_triggers-fix-a-possible-null-pointer.patch
>   perf-bench-numa-fix-cpu0-binding.patch
>   spi-pxa2xx-add-support-for-intel-tiger-lake.patch
>   can-sja1000-force-the-string-buffer-null-terminated.patch
>   can-peak_usb-force-the-string-buffer-null-terminated.patch
>   asoc-amd-acp3x-use-dma_ops-of-parent-device-for-acp3.patch
>   net-ethernet-qlogic-qed-force-the-string-buffer-null.patch
>   enetc-select-phylib-while-config_fsl_enetc_vf-is-set.patch
>   nfsv4-fix-a-credential-refcount-leak-in-nfs41_check_.patch
>   nfsv4-when-recovering-state-fails-with-eagain-retry-.patch
>   nfsv4.1-fix-open-stateid-recovery.patch
>   nfsv4.1-only-reap-expired-delegations.patch
>   nfsv4-fix-a-potential-sleep-while-atomic-in-nfs4_do_.patch
>   nfs-fix-regression-whereby-fscache-errors-are-appear.patch
>   hid-quirks-set-the-increment_usage_on_duplicate-quir.patch
>   hid-input-fix-a4tech-horizontal-wheel-custom-usage.patch
>   drm-rockchip-suspend-dp-late.patch
>   smb3-fix-potential-memory-leak-when-processing-compo.patch
>   smb3-kernel-oops-mounting-a-encryptdata-share-with-c.patch
>   sched-deadline-fix-double-accounting-of-rq-running-b.patch
>   sched-psi-reduce-psimon-fifo-priority.patch
>   sched-psi-do-not-require-setsched-permission-from-th.patch
>   s390-protvirt-avoid-memory-sharing-for-diag-308-set-.patch
>   s390-mm-fix-dump_pagetables-top-level-page-table-wal.patch
>   s390-put-_stext-and-_etext-into-.text-section.patch
>   ata-rb532_cf-fix-unused-variable-warning-in-rb532_pa.patch
>   net-cxgb3_main-fix-a-resource-leak-in-a-error-path-i.patch
>   net-stmmac-fix-issues-when-number-of-queues-4.patch
>   net-stmmac-tc-do-not-return-a-fragment-entry.patch
>   drm-amdgpu-pin-the-csb-buffer-on-hw-init-for-gfx-v8.patch
>   net-hisilicon-make-hip04_tx_reclaim-non-reentrant.patch
>   net-hisilicon-fix-hip04-xmit-never-return-tx_busy.patch
>   net-hisilicon-fix-dma_map_single-failed-on-arm64.patch
>   nfsv4-ensure-state-recovery-handles-etimedout-correc.patch
>   libata-have-ata_scsi_rw_xlat-fail-invalid-passthroug.patch
>   libata-add-sg-safety-checks-in-sff-pio-transfers.patch
>   x86-lib-cpu-address-missing-prototypes-warning.patch
>   drm-vmwgfx-fix-memory-leak-when-too-many-retries-hav.patch
>   block-aoe-fix-kernel-crash-due-to-atomic-sleep-when-.patch
>   block-bfq-handle-null-return-value-by-bfq_init_rq.patch
>   perf-ftrace-fix-failure-to-set-cpumask-when-only-one.patch
>   perf-cpumap-fix-writing-to-illegal-memory-in-handlin.patch
>   perf-pmu-events-fix-missing-cpu_clk_unhalted.core-ev.patch
>   dt-bindings-riscv-fix-the-schema-compatible-string-f.patch
>   kvm-arm64-don-t-write-junk-to-sysregs-on-reset.patch
>   kvm-arm-don-t-write-junk-to-cp15-registers-on-reset.patch
>   selftests-kvm-adding-config-fragments.patch
>   iwlwifi-mvm-disable-tx-amsdu-on-older-nics.patch
>   hid-wacom-correct-misreported-ekr-ring-values.patch
>   hid-wacom-correct-distance-scale-for-2nd-gen-intuos-devices.patch
>   revert-kvm-x86-mmu-zap-only-the-relevant-pages-when-removing-a-memslot.patch
>   revert-dm-bufio-fix-deadlock-with-loop-device.patch
>   clk-socfpga-stratix10-fix-rate-caclulationg-for-cnt_clks.patch
>   ceph-clear-page-dirty-before-invalidate-page.patch
>   ceph-don-t-try-fill-file_lock-on-unsuccessful-getfilelock-reply.patch
>   libceph-fix-pg-split-vs-osd-re-connect-race.patch
>   drm-amdgpu-gfx9-update-pg_flags-after-determining-if-gfx-off-is-possible.patch
>   drm-nouveau-don-t-retry-infinitely-when-receiving-no-data-on-i2c-over-aux.patch
>   scsi-ufs-fix-null-pointer-dereference-in-ufshcd_config_vreg_hpm.patch
>   gpiolib-never-report-open-drain-source-lines-as-input-to-user-space.patch
>   drivers-hv-vmbus-fix-virt_to_hvpfn-for-x86_pae.patch
>   userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
>   x86-retpoline-don-t-clobber-rflags-during-call_nospec-on-i386.patch
>   x86-apic-handle-missing-global-clockevent-gracefully.patch
>   x86-cpu-amd-clear-rdrand-cpuid-bit-on-amd-family-15h-16h.patch
>   x86-boot-save-fields-explicitly-zero-out-everything-else.patch
>   x86-boot-fix-boot-regression-caused-by-bootparam-sanitizing.patch
>   ib-hfi1-unsafe-psn-checking-for-tid-rdma-read-resp-packet.patch
>   ib-hfi1-add-additional-checks-when-handling-tid-rdma-read-resp-packet.patch
>   ib-hfi1-add-additional-checks-when-handling-tid-rdma-write-data-packet.patch
>   ib-hfi1-drop-stale-tid-rdma-packets-that-cause-tiderr.patch
>   psi-get-poll_work-to-run-when-calling-poll-syscall-next-time.patch
>   dm-kcopyd-always-complete-failed-jobs.patch
>   dm-dust-use-dust-block-size-for-badblocklist-index.patch
>   dm-btree-fix-order-of-block-initialization-in-btree_split_beneath.patch
>   dm-integrity-fix-a-crash-due-to-bug_on-in-__journal_read_write.patch
>   dm-raid-add-missing-cleanup-in-raid_ctr.patch
>   dm-space-map-metadata-fix-missing-store-of-apply_bops-return-value.patch
>   dm-table-fix-invalid-memory-accesses-with-too-high-sector-number.patch
>   dm-zoned-improve-error-handling-in-reclaim.patch
>   dm-zoned-improve-error-handling-in-i-o-map-code.patch
>   dm-zoned-properly-handle-backing-device-failure.patch
>   genirq-properly-pair-kobject_del-with-kobject_add.patch
>   mm-z3fold.c-fix-race-between-migration-and-destruction.patch
>   mm-page_alloc-move_freepages-should-not-examine-struct-page-of-reserved-memory.patch
>   mm-memcontrol-flush-percpu-vmstats-before-releasing-memcg.patch
>   mm-memcontrol-flush-percpu-vmevents-before-releasing-memcg.patch
>   mm-page_owner-handle-thp-splits-correctly.patch
>   mm-zsmalloc.c-migration-can-leave-pages-in-zs_empty-indefinitely.patch
>   mm-zsmalloc.c-fix-race-condition-in-zs_destroy_pool.patch
>   mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags-y.patch
>   xfs-fix-missing-ilock-unlock-when-xfs_setattr_nonsize-fails-due-to-edquot.patch
>   ib-hfi1-drop-stale-tid-rdma-packets.patch
>   dm-zoned-fix-potential-null-dereference-in-dmz_do_re.patch
>   io_uring-fix-potential-hang-with-polled-io.patch
>   io_uring-don-t-enter-poll-loop-if-we-have-cqes-pendi.patch
>   io_uring-add-need_resched-check-in-inner-poll-loop.patch
>   powerpc-allow-flush_-inval_-dcache_range-to-work-across-ranges-4gb.patch
>   rxrpc-fix-local-endpoint-refcounting.patch
>   rxrpc-fix-read-after-free-in-rxrpc_queue_local.patch
>   rxrpc-fix-local-endpoint-replacement.patch
> 
> Compile testing
> ---------------
> 
> We compiled the kernel for 3 architectures:
> 
>     aarch64:
>       make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
> 
>     ppc64le:
>       make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
> 
>     x86_64:
>       make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
> 
> 
> Hardware testing
> ----------------
> We booted each kernel and ran the following tests:
> 
>   aarch64:
>       Host 1:
>          ✅ Boot test [0]
>          ✅ xfstests: xfs [1]
>          ✅ selinux-policy: serge-testsuite [2]
>          ✅ lvm thinp sanity [3]
>          ✅ storage: software RAID testing [4]
>          🚧 ✅ Storage blktests [5]
> 
>       Host 2:
> 
>          ⚡ Internal infrastructure issues prevented one or more tests (marked
>          with ⚡⚡⚡) from running on this architecture.
>          This is not the fault of the kernel that was tested.
> 
>          ⚡⚡⚡ Boot test [0]
>          ⚡⚡⚡ Podman system integration test (as root) [6]
>          ⚡⚡⚡ Podman system integration test (as user) [6]
>          ⚡⚡⚡ Loopdev Sanity [7]
>          ⚡⚡⚡ jvm test suite [8]
>          ⚡⚡⚡ AMTU (Abstract Machine Test Utility) [9]
>          ⚡⚡⚡ LTP: openposix test suite [10]
>          ⚡⚡⚡ Ethernet drivers sanity [11]
>          ⚡⚡⚡ Networking socket: fuzz [12]
>          ⚡⚡⚡ audit: audit testsuite test [13]
>          ⚡⚡⚡ httpd: mod_ssl smoke sanity [14]
>          ⚡⚡⚡ iotop: sanity [15]
>          ⚡⚡⚡ tuned: tune-processes-through-perf [16]
>          ⚡⚡⚡ Usex - version 1.9-29 [17]
>          ⚡⚡⚡ storage: SCSI VPD [18]
>          ⚡⚡⚡ stress: stress-ng [19]
>          🚧 ⚡⚡⚡ LTP lite [20]
> 
> 
>   ppc64le:
>       Host 1:
>          ✅ Boot test [0]
>          ✅ xfstests: xfs [1]
>          ✅ selinux-policy: serge-testsuite [2]
>          ✅ lvm thinp sanity [3]
>          ✅ storage: software RAID testing [4]
>          🚧 ✅ Storage blktests [5]
> 
>       Host 2:
>          ✅ Boot test [0]
>          ✅ Podman system integration test (as root) [6]
>          ✅ Podman system integration test (as user) [6]
>          ✅ Loopdev Sanity [7]
>          ✅ jvm test suite [8]
>          ✅ AMTU (Abstract Machine Test Utility) [9]
>          ✅ LTP: openposix test suite [10]
>          ✅ Ethernet drivers sanity [11]
>          ✅ Networking socket: fuzz [12]
>          ✅ audit: audit testsuite test [13]
>          ✅ httpd: mod_ssl smoke sanity [14]
>          ✅ iotop: sanity [15]
>          ✅ tuned: tune-processes-through-perf [16]
>          ✅ Usex - version 1.9-29 [17]
>          🚧 ✅ LTP lite [20]
> 
> 
>   x86_64:
>       Host 1:
>          ✅ Boot test [0]
>          ✅ Podman system integration test (as root) [6]
>          ✅ Podman system integration test (as user) [6]
>          ✅ Loopdev Sanity [7]
>          ✅ jvm test suite [8]
>          ✅ AMTU (Abstract Machine Test Utility) [9]
>          ✅ LTP: openposix test suite [10]
>          ✅ Ethernet drivers sanity [11]
>          ❌ Networking socket: fuzz [12]
>          ⚡⚡⚡ audit: audit testsuite test [13]
>          ⚡⚡⚡ httpd: mod_ssl smoke sanity [14]
>          ⚡⚡⚡ iotop: sanity [15]
>          ⚡⚡⚡ tuned: tune-processes-through-perf [16]
>          ⚡⚡⚡ pciutils: sanity smoke test [21]
>          ⚡⚡⚡ Usex - version 1.9-29 [17]
>          ⚡⚡⚡ storage: SCSI VPD [18]
>          ⚡⚡⚡ stress: stress-ng [19]
>          🚧 ❌ LTP lite [20]
> 
>       Host 2:
>          ✅ Boot test [0]
>          ✅ xfstests: xfs [1]
>          ✅ selinux-policy: serge-testsuite [2]
>          ✅ lvm thinp sanity [3]
>          ✅ storage: software RAID testing [4]
>          🚧 ✅ Storage blktests [5]
> 
> 
>   Test source:
>     💚 Pull requests are welcome for new tests or improvements to existing tests!
>     [0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/kpkginstall
>     [1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/filesystems/xfs/xfstests
>     [2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/selinux-policy/serge-testsuite
>     [3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/lvm/thinp/sanity
>     [4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/swraid/trim
>     [5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/blk
>     [6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/podman
>     [7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/loopdev/sanity
>     [8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
>     [9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
>     [10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/ltp/openposix_testsuite
>     [11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/driver/sanity
>     [12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/socket/fuzz
>     [13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/audit/audit-testsuite
>     [14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/httpd/mod_ssl-smoke
>     [15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iotop/sanity
>     [16]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tuned/tune-processes-through-perf
>     [17]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/usex/1.9-29
>     [18]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/scsi/vpd
>     [19]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stress-ng
>     [20]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/ltp-upstream/lite
>     [21]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/sanity-smoke
> 
> Waived tests
> ------------
> If the test run included waived tests, they are marked with 🚧. Such tests are
> executed but their results are not taken into account. Tests are waived when
> their results are not reliable enough, e.g. when they're just introduced or are
> being fixed.

^ permalink raw reply

* [RFC PATCH 2/4] mdev: introduce helper to set per device dma ops
From: Jason Wang @ 2019-09-10  8:19 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev
  Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
	xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>

This patch introduces mdev_set_dma_ops() which allows parent to set
per device DMA ops. This help for the kernel driver to setup a correct
DMA mappings.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vfio/mdev/mdev_core.c | 7 +++++++
 include/linux/mdev.h          | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..eb28552082d7 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -13,6 +13,7 @@
 #include <linux/uuid.h>
 #include <linux/sysfs.h>
 #include <linux/mdev.h>
+#include <linux/dma-mapping.h>
 
 #include "mdev_private.h"
 
@@ -27,6 +28,12 @@ static struct class_compat *mdev_bus_compat_class;
 static LIST_HEAD(mdev_list);
 static DEFINE_MUTEX(mdev_list_lock);
 
+void mdev_set_dma_ops(struct mdev_device *mdev, struct dma_map_ops *ops)
+{
+	set_dma_ops(&mdev->dev, ops);
+}
+EXPORT_SYMBOL(mdev_set_dma_ops);
+
 struct device *mdev_parent_dev(struct mdev_device *mdev)
 {
 	return mdev->parent->dev;
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..7195f40bf8bf 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -145,4 +145,6 @@ struct device *mdev_parent_dev(struct mdev_device *mdev);
 struct device *mdev_dev(struct mdev_device *mdev);
 struct mdev_device *mdev_from_dev(struct device *dev);
 
+void mdev_set_dma_ops(struct mdev_device *mdev, struct dma_map_ops *ops);
+
 #endif /* MDEV_H */
-- 
2.19.1


^ permalink raw reply related

* [RFC PATCH 3/4] virtio: introudce a mdev based transport
From: Jason Wang @ 2019-09-10  8:19 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev
  Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
	xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>

This path introduces a new mdev transport for virtio. This is used to
use kernel virtio driver to drive the mediated device that is capable
of populating virtqueue directly.

A new virtio-mdev driver will be registered to the mdev bus, when a
new virtio-mdev device is probed, it will register the device with
mdev based config ops. This means, unlike the exist hardware
transport, this is a software transport between mdev driver and mdev
device. The transport was implemented through:

- configuration access was implemented through parent_ops->read()/write()
- vq/config callback was implemented through parent_ops->ioctl()

This transport is derived from virtio MMIO protocol and was wrote for
kernel driver. But for the transport itself, but the design goal is to
be generic enough to support userspace driver (this part will be added
in the future).

Note:
- current mdev assume all the parameter of parent_ops was from
  userspace. This prevents us from implementing the kernel mdev
  driver. For a quick POC, this patch just abuse those parameter and
  assume the mdev device implementation will treat them as kernel
  pointer. This should be addressed in the formal series by extending
  mdev_parent_ops.
- for a quick POC, I just drive the transport from MMIO, I'm pretty
  there's lot of optimization space for this.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vfio/mdev/Kconfig        |   7 +
 drivers/vfio/mdev/Makefile       |   1 +
 drivers/vfio/mdev/virtio_mdev.c  | 500 +++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_mdev.h | 131 ++++++++
 4 files changed, 639 insertions(+)
 create mode 100644 drivers/vfio/mdev/virtio_mdev.c
 create mode 100644 include/uapi/linux/virtio_mdev.h

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 5da27f2100f9..c488c31fc137 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -16,3 +16,10 @@ config VFIO_MDEV_DEVICE
 	default n
 	help
 	  VFIO based driver for Mediated devices.
+
+config VIRTIO_MDEV_DEVICE
+	tristate "VIRTIO driver for Mediated devices"
+	depends on VFIO_MDEV && VIRTIO
+	default n
+	help
+	  VIRTIO based driver for Mediated devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 101516fdf375..99d31e29c23e 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -4,3 +4,4 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_VFIO_MDEV) += mdev.o
 obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
+obj-$(CONFIG_VIRTIO_MDEV_DEVICE) += virtio_mdev.o
diff --git a/drivers/vfio/mdev/virtio_mdev.c b/drivers/vfio/mdev/virtio_mdev.c
new file mode 100644
index 000000000000..5ff09089297e
--- /dev/null
+++ b/drivers/vfio/mdev/virtio_mdev.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for Mediated device
+ *
+ * Copyright (c) 2019, Red Hat. All rights reserved.
+ *     Author: Jason Wang <jasowang@redhat.com>
+ *
+ * Based on Virtio MMIO driver.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <uapi/linux/virtio_mdev.h>
+#include "mdev_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Red Hat Corporation"
+#define DRIVER_DESC     "VIRTIO based driver for Mediated device"
+
+#define to_virtio_mdev_device(dev) \
+	container_of(dev, struct virtio_mdev_device, vdev)
+
+struct virtio_mdev_device {
+	struct virtio_device vdev;
+	struct mdev_device *mdev;
+	unsigned long version;
+
+	struct virtqueue **vqs;
+	spinlock_t lock;
+};
+
+struct virtio_mdev_vq_info {
+	/* the actual virtqueue */
+	struct virtqueue *vq;
+
+	/* the list node for the virtqueues list */
+	struct list_head node;
+};
+
+static u32 virtio_mdev_readl(struct mdev_device *mdev,
+			     loff_t off)
+{
+	struct mdev_parent *parent = mdev->parent;
+	ssize_t len;
+	u32 val;
+
+	if (unlikely(!parent->ops->read))
+		return 0xFFFFFFFF;
+
+	len = parent->ops->read(mdev, (char *)&val, 4, &off);
+	if (len != 4)
+		return 0xFFFFFFFF;
+
+	return val;
+}
+
+static void virtio_mdev_writel(struct mdev_device *mdev,
+			       loff_t off, u32 val)
+{
+	struct mdev_parent *parent = mdev->parent;
+
+	if (unlikely(!parent->ops->write))
+		return;
+
+	parent->ops->write(mdev, (char *)&val, 4, &off);
+
+	return;
+}
+
+static void virtio_mdev_get(struct virtio_device *vdev, unsigned offset,
+			    void *buf, unsigned len)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	struct mdev_device *mdev = vm_dev->mdev;
+	struct mdev_parent *parent = mdev->parent;
+
+	loff_t off = offset + VIRTIO_MDEV_CONFIG;
+
+	switch (len) {
+	case 1:
+		*(u8 *)buf = parent->ops->read(mdev, buf, 1, &off);
+		break;
+	case 2:
+		*(u16 *)buf = parent->ops->read(mdev, buf, 2, &off);
+		break;
+	case 4:
+		*(u32 *)buf = parent->ops->read(mdev, buf, 4, &off);
+		break;
+	case 8:
+		*(u32 *)buf = parent->ops->read(mdev, buf, 4, &off);
+		*((u32 *)buf + 1) = parent->ops->read(mdev, buf, 4, &off);
+		break;
+	default:
+		BUG();
+	}
+
+	return;
+}
+
+static void virtio_mdev_set(struct virtio_device *vdev, unsigned offset,
+			    const void *buf, unsigned len)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	struct mdev_device *mdev = vm_dev->mdev;
+	struct mdev_parent *parent = mdev->parent;
+	loff_t off = offset + VIRTIO_MDEV_CONFIG;
+
+	switch (len) {
+	case 1:
+	case 2:
+	case 4:
+	case 8:
+		break;
+	default:
+		BUG();
+	}
+
+	parent->ops->write(mdev, buf, len, &off);
+
+	return;
+}
+
+static u32 virtio_mdev_generation(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+	if (vm_dev->version == 1)
+		return 0;
+	else
+		return virtio_mdev_readl(vm_dev->mdev,
+					 VIRTIO_MDEV_CONFIG_GENERATION);
+}
+
+static u8 virtio_mdev_get_status(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+	return virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_STATUS) & 0xff;
+}
+
+static void virtio_mdev_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_STATUS, status);
+}
+
+static void virtio_mdev_reset(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_STATUS, 0);
+}
+
+static bool virtio_mdev_notify(struct virtqueue *vq)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vq->vdev);
+
+	/* We write the queue's selector into the notification register to
+	 * signal the other end */
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NOTIFY,
+			   vq->index);
+	return true;
+}
+
+static irqreturn_t virtio_mdev_config_cb(void *private)
+{
+	struct virtio_mdev_device *vm_dev = private;
+
+	virtio_config_changed(&vm_dev->vdev);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t virtio_mdev_virtqueue_cb(void *private)
+{
+	struct virtio_mdev_vq_info *info = private;
+
+	return vring_interrupt(0, info->vq);
+}
+
+static struct virtqueue *
+virtio_mdev_setup_vq(struct virtio_device *vdev, unsigned index,
+		     void (*callback)(struct virtqueue *vq),
+		     const char *name, bool ctx)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	struct mdev_device *mdev= vm_dev->mdev;
+	struct mdev_parent *parent = mdev->parent;
+	struct virtio_mdev_vq_info *info;
+	struct virtio_mdev_callback cb;
+	struct virtqueue *vq;
+	unsigned long flags;
+	u32 align, num;
+	u64 addr;
+	int err;
+
+	if (!name)
+		return NULL;
+
+	/* Select the queue we're interested in */
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_SEL, index);
+
+	/* Queue shouldn't already be set up. */
+	if (virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY)) {
+		err = -ENOENT;
+		goto error_available;
+	}
+
+	/* Allocate and fill out our active queue description */
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		err = -ENOMEM;
+		goto error_kmalloc;
+	}
+
+	num = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NUM_MAX);
+	if (num == 0) {
+		err = -ENOENT;
+		goto error_new_virtqueue;
+	}
+
+	/* Create the vring */
+	align = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_ALIGN);
+	vq = vring_create_virtqueue(index, num, align, vdev,
+				    true, true, ctx,
+				    virtio_mdev_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto error_new_virtqueue;
+	}
+
+	/* Setup virtqueue callback */
+	cb.callback = virtio_mdev_virtqueue_cb;
+	cb.private = info;
+	err = parent->ops->ioctl(mdev, VIRTIO_MDEV_SET_VQ_CALLBACK,
+				 (unsigned long)&cb);
+	if (err) {
+		err = -EINVAL;
+		goto error_callback;
+	}
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NUM,
+			   virtqueue_get_vring_size(vq));
+	addr = virtqueue_get_desc_addr(vq);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_DESC_LOW, (u32)addr);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_DESC_HIGH,
+			   (u32)(addr >> 32));
+
+	addr = virtqueue_get_avail_addr(vq);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_AVAIL_LOW, (u32)addr);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_AVAIL_HIGH,
+			   (u32)(addr >> 32));
+
+	addr = virtqueue_get_used_addr(vq);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_USED_LOW, (u32)addr);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_USED_HIGH, (u32)(addr >> 32));
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY, 1);
+
+	vq->priv = info;
+	info->vq = vq;
+
+	return vq;
+
+error_callback:
+	vring_del_virtqueue(vq);
+error_new_virtqueue:
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY, 0);
+	WARN_ON(virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY));
+	kfree(info);
+error_kmalloc:
+error_available:
+	return ERR_PTR(err);
+
+}
+
+static void virtio_mdev_del_vq(struct virtqueue *vq)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vq->vdev);
+	struct virtio_mdev_vq_info *info = vq->priv;
+	unsigned long flags;
+	unsigned int index = vq->index;
+
+	/* Select and deactivate the queue */
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_SEL, index);
+	virtio_mdev_writel(vm_dev->mdev,VIRTIO_MDEV_QUEUE_READY, 0);
+	WARN_ON(virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY));
+
+	vring_del_virtqueue(vq);
+
+	kfree(info);
+}
+
+static void virtio_mdev_del_vqs(struct virtio_device *vdev)
+{
+	struct virtqueue *vq, *n;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+		virtio_mdev_del_vq(vq);
+
+	return;
+}
+
+static int virtio_mdev_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+				struct virtqueue *vqs[],
+				vq_callback_t *callbacks[],
+				const char * const names[],
+				const bool *ctx,
+				struct irq_affinity *desc)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	struct mdev_device *mdev = vm_dev->mdev;
+	struct mdev_parent *parent = mdev->parent;
+	struct virtio_mdev_callback cb;
+	int i, err, queue_idx = 0;
+	vm_dev->vqs = kmalloc_array(queue_idx, sizeof(*vm_dev->vqs),
+				    GFP_KERNEL);
+	if (!vm_dev->vqs)
+		return -ENOMEM;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		}
+
+		vqs[i] = virtio_mdev_setup_vq(vdev, queue_idx++,
+					      callbacks[i], names[i], ctx ?
+					      ctx[i] : false);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto err_setup_vq;
+		}
+	}
+
+	cb.callback = virtio_mdev_config_cb;
+	cb.private = vm_dev;
+	err = parent->ops->ioctl(mdev, VIRTIO_MDEV_SET_CONFIG_CALLBACK,
+				 (unsigned long)&cb);
+	if (err)
+		goto err_setup_vq;
+
+	return 0;
+
+err_setup_vq:
+	kfree(vm_dev->vqs);
+	virtio_mdev_del_vqs(vdev);
+	return err;
+}
+
+static u64 virtio_mdev_get_features(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	u64 features;
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES_SEL, 1);
+	features = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES);
+	features <<= 32;
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES_SEL, 0);
+	features |= virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES);
+
+	return features;
+}
+
+static int virtio_mdev_finalize_features(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 1);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+			   (u32)(vdev->features >> 32));
+
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 0);
+	virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+			   (u32)vdev->features);
+
+	return 0;
+}
+
+static const char *virtio_mdev_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+	struct mdev_device *mdev = vm_dev->mdev;
+
+	return dev_name(&mdev->dev);
+}
+
+static const struct virtio_config_ops virtio_mdev_config_ops = {
+	.get		= virtio_mdev_get,
+	.set		= virtio_mdev_set,
+	.generation	= virtio_mdev_generation,
+	.get_status	= virtio_mdev_get_status,
+	.set_status	= virtio_mdev_set_status,
+	.reset		= virtio_mdev_reset,
+	.find_vqs	= virtio_mdev_find_vqs,
+	.del_vqs	= virtio_mdev_del_vqs,
+	.get_features	= virtio_mdev_get_features,
+	.finalize_features = virtio_mdev_finalize_features,
+	.bus_name	= virtio_mdev_bus_name,
+};
+
+static void virtio_mdev_release_dev(struct device *_d)
+{
+	struct virtio_device *vdev =
+	       container_of(_d, struct virtio_device, dev);
+	struct virtio_mdev_device *vm_dev =
+	       container_of(vdev, struct virtio_mdev_device, vdev);
+
+	devm_kfree(_d, vm_dev);
+}
+
+static int virtio_mdev_probe(struct device *dev)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+	struct virtio_mdev_device *vm_dev;
+	unsigned long magic;
+	int rc;
+
+	magic = virtio_mdev_readl(mdev, VIRTIO_MDEV_MAGIC_VALUE);
+	if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) {
+		dev_warn(dev, "Wrong magic value 0x%08lx!\n", magic);
+		return -ENODEV;
+	}
+
+	vm_dev = devm_kzalloc(dev, sizeof(*vm_dev), GFP_KERNEL);
+	if (!vm_dev)
+		return -ENOMEM;
+
+	vm_dev->vdev.dev.parent = dev;
+	vm_dev->vdev.dev.release = virtio_mdev_release_dev;
+	vm_dev->vdev.config = &virtio_mdev_config_ops;
+	vm_dev->mdev = mdev;
+	vm_dev->vqs = NULL;
+	spin_lock_init(&vm_dev->lock);
+
+	vm_dev->version = virtio_mdev_readl(mdev, VIRTIO_MDEV_VERSION);
+	if (vm_dev->version != 1) {
+		dev_err(dev, "Version %ld not supported!\n",
+			vm_dev->version);
+		return -ENXIO;
+	}
+
+	vm_dev->vdev.id.device = virtio_mdev_readl(mdev, VIRTIO_MDEV_DEVICE_ID);
+	if (vm_dev->vdev.id.device == 0)
+		return -ENODEV;
+
+	vm_dev->vdev.id.vendor = virtio_mdev_readl(mdev, VIRTIO_MDEV_VENDOR_ID);
+	rc = register_virtio_device(&vm_dev->vdev);
+	if (rc)
+		put_device(dev);
+
+	dev_set_drvdata(dev, vm_dev);
+
+	return rc;
+
+}
+
+static void virtio_mdev_remove(struct device *dev)
+{
+	struct virtio_mdev_device *vm_dev = dev_get_drvdata(dev);
+
+	unregister_virtio_device(&vm_dev->vdev);
+}
+
+static struct mdev_driver virtio_mdev_driver = {
+	.name	= "virtio_mdev",
+	.probe	= virtio_mdev_probe,
+	.remove	= virtio_mdev_remove,
+};
+
+static int __init virtio_mdev_init(void)
+{
+	return mdev_register_driver(&virtio_mdev_driver, THIS_MODULE);
+}
+
+static void __exit virtio_mdev_exit(void)
+{
+	mdev_unregister_driver(&virtio_mdev_driver);
+}
+
+module_init(virtio_mdev_init)
+module_exit(virtio_mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/virtio_mdev.h b/include/uapi/linux/virtio_mdev.h
new file mode 100644
index 000000000000..8040de6b960a
--- /dev/null
+++ b/include/uapi/linux/virtio_mdev.h
@@ -0,0 +1,131 @@
+/*
+ * Virtio mediated device driver
+ *
+ * Copyright 2019, Red Hat Corp.
+ *
+ * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
+ *
+ * This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_MDEV_H
+#define _LINUX_VIRTIO_MDEV_H
+
+#include <linux/interrupt.h>
+#include <linux/vringh.h>
+#include <uapi/linux/virtio_net.h>
+
+/*
+ * Ioctls
+ */
+
+struct virtio_mdev_callback {
+	irqreturn_t (*callback)(void *);
+	void *private;
+};
+
+#define VIRTIO_MDEV 0xAF
+#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
+					 struct virtio_mdev_callback)
+#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
+					struct virtio_mdev_callback)
+
+#define VIRTIO_MDEV_DEVICE_API_STRING		"virtio-mdev"
+
+/*
+ * Control registers
+ */
+
+/* Magic value ("virt" string) - Read Only */
+#define VIRTIO_MDEV_MAGIC_VALUE		0x000
+
+/* Virtio device version - Read Only */
+#define VIRTIO_MDEV_VERSION		0x004
+
+/* Virtio device ID - Read Only */
+#define VIRTIO_MDEV_DEVICE_ID		0x008
+
+/* Virtio vendor ID - Read Only */
+#define VIRTIO_MDEV_VENDOR_ID		0x00c
+
+/* Bitmask of the features supported by the device (host)
+ * (32 bits per set) - Read Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES	0x010
+
+/* Device (host) features set selector - Write Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES_SEL	0x014
+
+/* Bitmask of features activated by the driver (guest)
+ * (32 bits per set) - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES	0x020
+
+/* Activated features set selector - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES_SEL	0x024
+
+/* Queue selector - Write Only */
+#define VIRTIO_MDEV_QUEUE_SEL		0x030
+
+/* Maximum size of the currently selected queue - Read Only */
+#define VIRTIO_MDEV_QUEUE_NUM_MAX	0x034
+
+/* Queue size for the currently selected queue - Write Only */
+#define VIRTIO_MDEV_QUEUE_NUM		0x038
+
+/* Ready bit for the currently selected queue - Read Write */
+#define VIRTIO_MDEV_QUEUE_READY		0x044
+
+/* Alignment of virtqueue - Read Only */
+#define VIRTIO_MDEV_QUEUE_ALIGN		0x048
+
+/* Queue notifier - Write Only */
+#define VIRTIO_MDEV_QUEUE_NOTIFY	0x050
+
+/* Device status register - Read Write */
+#define VIRTIO_MDEV_STATUS		0x060
+
+/* Selected queue's Descriptor Table address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_DESC_LOW	0x080
+#define VIRTIO_MDEV_QUEUE_DESC_HIGH	0x084
+
+/* Selected queue's Available Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_AVAIL_LOW	0x090
+#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH	0x094
+
+/* Selected queue's Used Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_USED_LOW	0x0a0
+#define VIRTIO_MDEV_QUEUE_USED_HIGH	0x0a4
+
+/* Configuration atomicity value */
+#define VIRTIO_MDEV_CONFIG_GENERATION	0x0fc
+
+/* The config space is defined by each driver as
+ * the per-driver configuration space - Read Write */
+#define VIRTIO_MDEV_CONFIG		0x100
+
+#endif
+
+
+/* Ready bit for the currently selected queue - Read Write */
-- 
2.19.1


^ permalink raw reply related

* [RFC PATCH 4/4] docs: Sample driver to demonstrate how to implement virtio-mdev framework
From: Jason Wang @ 2019-09-10  8:19 UTC (permalink / raw)
  To: mst, jasowang, kvm, virtualization, netdev
  Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
	xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>

This sample driver creates mdev device that simulate virtio net device
over virtio mdev transport. The device is implemented through vringh
and workqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 samples/Kconfig            |   7 +
 samples/vfio-mdev/Makefile |   1 +
 samples/vfio-mdev/mvnet.c  | 766 +++++++++++++++++++++++++++++++++++++
 3 files changed, 774 insertions(+)
 create mode 100644 samples/vfio-mdev/mvnet.c

diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..a1a1ca2c00b7 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -131,6 +131,13 @@ config SAMPLE_VFIO_MDEV_MDPY
 	  mediated device.  It is a simple framebuffer and supports
 	  the region display interface (VFIO_GFX_PLANE_TYPE_REGION).
 
+config SAMPLE_VIRTIO_MDEV_NET
+        tristate "Build virtio mdev net example mediated device sample code -- loadable modules only"
+	depends on VIRTIO_MDEV_DEVICE && VHOST_RING && m
+	help
+	  Build a networking sample device for use as a virtio
+	  mediated device.
+
 config SAMPLE_VFIO_MDEV_MDPY_FB
 	tristate "Build VFIO mdpy example guest fbdev driver -- loadable module only"
 	depends on FB && m
diff --git a/samples/vfio-mdev/Makefile b/samples/vfio-mdev/Makefile
index 10d179c4fdeb..f34af90ed0a0 100644
--- a/samples/vfio-mdev/Makefile
+++ b/samples/vfio-mdev/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_SAMPLE_VFIO_MDEV_MTTY) += mtty.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY) += mdpy.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB) += mdpy-fb.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MBOCHS) += mbochs.o
+obj-$(CONFIG_SAMPLE_VIRTIO_MDEV_NET) += mvnet.o
diff --git a/samples/vfio-mdev/mvnet.c b/samples/vfio-mdev/mvnet.c
new file mode 100644
index 000000000000..da295b41955e
--- /dev/null
+++ b/samples/vfio-mdev/mvnet.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Mediated virtual virtio-net device driver.
+ *
+ * Copyright (c) 2019, Red Hat Inc. All rights reserved.
+ *     Author: Jason Wang <jasowang@redhat.com>
+ *
+ * Sample driver that creates mdev device that simulates ethernet
+ * device virtio mdev transport.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uuid.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/etherdevice.h>
+#include <linux/mdev.h>
+#include <uapi/linux/virtio_mdev.h>
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+
+#define MVNET_CLASS_NAME "mvnet"
+
+#define MVNET_NAME       "mvnet"
+
+/*
+ * Global Structures
+ */
+
+static struct mvnet_dev {
+	struct class	*vd_class;
+	struct idr	vd_idr;
+	struct device	dev;
+} mvnet_dev;
+
+struct mvnet_virtqueue {
+	struct vringh vring;
+	struct vringh_kiov iov;
+	unsigned short head;
+	bool ready;
+	u32 desc_addr_lo;
+	u32 desc_addr_hi;
+	u32 device_addr_lo;
+	u32 device_addr_hi;
+	u32 driver_addr_lo;
+	u32 driver_addr_hi;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	void *private;
+	irqreturn_t (*cb)(void *);
+};
+
+#define MVNET_QUEUE_ALIGN PAGE_SIZE
+#define MVNET_QUEUE_MAX 256
+#define MVNET_MAGIC_VALUE ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)
+#define MVNET_VERSION 0x1 /* Implies virtio 1.0 */
+#define MVNET_DEVICE_ID 0x1 /* network card */
+#define MVNET_VENDOR_ID 0 /* is this correct ? */
+#define MVNET_DEVICE_FEATURES VIRTIO_F_VERSION_1
+
+u64 mvnet_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
+	             (1ULL << VIRTIO_F_VERSION_1) |
+		     (1ULL << VIRTIO_F_IOMMU_PLATFORM) ;
+
+/* State of each mdev device */
+struct mvnet_state {
+	struct mvnet_virtqueue vqs[2];
+	struct work_struct work;
+	spinlock_t lock;
+	struct mdev_device *mdev;
+	struct virtio_net_config config;
+	struct virtio_mdev_callback *cbs;
+	void *buffer;
+	u32 queue_sel;
+	u32 driver_features_sel;
+	u32 driver_features[2];
+	u32 device_features_sel;
+	u32 status;
+	u32 generation;
+	u32 num;
+	struct list_head next;
+};
+
+static struct mutex mdev_list_lock;
+static struct list_head mdev_devices_list;
+
+static void mvnet_queue_ready(struct mvnet_state *mvnet, unsigned idx)
+{
+	struct mvnet_virtqueue *vq = &mvnet->vqs[idx];
+	int ret;
+
+	vq->desc_addr = (u64)vq->desc_addr_hi << 32 | vq->desc_addr_lo;
+	vq->device_addr = (u64)vq->device_addr_hi << 32 | vq->device_addr_lo;
+	vq->driver_addr = (u64)vq->driver_addr_hi << 32 | vq->driver_addr_lo;
+
+	ret = vringh_init_kern(&vq->vring, mvnet_features, MVNET_QUEUE_MAX,
+			       false, (struct vring_desc *)vq->desc_addr,
+			       (struct vring_avail *)vq->driver_addr,
+			       (struct vring_used *)vq->device_addr);
+}
+
+static ssize_t mvnet_read_config(struct mdev_device *mdev,
+				 u32 *val, loff_t pos)
+{
+	struct mvnet_state *mvnet;
+	struct mvnet_virtqueue *vq;
+	u32 queue_sel;
+
+	if (!mdev || !val)
+		return -EINVAL;
+
+	mvnet = mdev_get_drvdata(mdev);
+	if (!mvnet) {
+		pr_err("%s mvnet not found\n", __func__);
+		return -EINVAL;
+	}
+
+	queue_sel = mvnet->queue_sel;
+	vq = &mvnet->vqs[queue_sel];
+
+	switch (pos) {
+	case VIRTIO_MDEV_MAGIC_VALUE:
+		*val = MVNET_MAGIC_VALUE;
+		break;
+	case VIRTIO_MDEV_VERSION:
+		*val = MVNET_VERSION;
+		break;
+	case VIRTIO_MDEV_DEVICE_ID:
+		*val = MVNET_DEVICE_ID;
+		break;
+	case VIRTIO_MDEV_VENDOR_ID:
+		*val = MVNET_VENDOR_ID;
+		break;
+	case VIRTIO_MDEV_DEVICE_FEATURES:
+		if (mvnet->device_features_sel)
+			*val = mvnet_features >> 32;
+		else
+			*val = mvnet_features;
+		break;
+	case VIRTIO_MDEV_QUEUE_NUM_MAX:
+		*val = MVNET_QUEUE_MAX;
+		break;
+	case VIRTIO_MDEV_QUEUE_READY:
+		*val = vq->ready;
+		break;
+	case VIRTIO_MDEV_QUEUE_ALIGN:
+		*val = MVNET_QUEUE_ALIGN;
+		break;
+	case VIRTIO_MDEV_STATUS:
+		*val = mvnet->status;
+		break;
+	case VIRTIO_MDEV_QUEUE_DESC_LOW:
+		*val = vq->desc_addr_lo;
+		break;
+	case VIRTIO_MDEV_QUEUE_DESC_HIGH:
+		*val = vq->desc_addr_hi;
+		break;
+	case VIRTIO_MDEV_QUEUE_AVAIL_LOW:
+		*val = vq->driver_addr_lo;
+		break;
+	case VIRTIO_MDEV_QUEUE_AVAIL_HIGH:
+		*val = vq->driver_addr_hi;
+		break;
+	case VIRTIO_MDEV_QUEUE_USED_LOW:
+		*val = vq->device_addr_lo;
+		break;
+	case VIRTIO_MDEV_QUEUE_USED_HIGH:
+		*val = vq->device_addr_hi;
+		break;
+	case VIRTIO_MDEV_CONFIG_GENERATION:
+		*val = 1;
+		break;
+	default:
+		pr_err("Unsupported mdev read offset at 0x%x\n", pos);
+		break;
+	}
+
+	return 4;
+}
+
+static ssize_t mvnet_read_net_config(struct mdev_device *mdev,
+				     char *buf, size_t count, loff_t pos)
+{
+	struct mvnet_state *mvnet = mdev_get_drvdata(mdev);
+
+	if (!mvnet) {
+		pr_err("%s mvnet not found\n", __func__);
+		return -EINVAL;
+	}
+
+	if (pos + count > sizeof(mvnet->config))
+		return -EINVAL;
+
+	memcpy(buf, &mvnet->config + (unsigned)pos, count);
+
+	return count;
+}
+
+static void mvnet_vq_reset(struct mvnet_virtqueue *vq)
+{
+	vq->ready = 0;
+	vq->desc_addr_lo = vq->desc_addr_hi = 0;
+	vq->device_addr_lo = vq->device_addr_hi = 0;
+	vq->driver_addr_lo = vq->driver_addr_hi = 0;
+	vq->desc_addr = 0;
+	vq->driver_addr = 0;
+	vq->device_addr = 0;
+	vringh_init_kern(&vq->vring, mvnet_features, MVNET_QUEUE_MAX,
+			false, 0, 0, 0);
+}
+
+static void mvnet_reset(struct mvnet_state *mvnet)
+{
+	int i;
+
+	for (i = 0; i < 2; i++)
+		mvnet_vq_reset(&mvnet->vqs[i]);
+
+	mvnet->queue_sel = 0;
+	mvnet->driver_features_sel = 0;
+	mvnet->device_features_sel = 0;
+	mvnet->status = 0;
+	++mvnet->generation;
+}
+
+static ssize_t mvnet_write_config(struct mdev_device *mdev,
+				  u32 *val, loff_t pos)
+{
+	struct mvnet_state *mvnet;
+	struct mvnet_virtqueue *vq;
+	u32 queue_sel;
+
+	if (!mdev || !val)
+		return -EINVAL;
+
+	mvnet = mdev_get_drvdata(mdev);
+	if (!mvnet) {
+		pr_err("%s mvnet not found\n", __func__);
+		return -EINVAL;
+	}
+
+	queue_sel = mvnet->queue_sel;
+	vq = &mvnet->vqs[queue_sel];
+
+	switch (pos) {
+	case VIRTIO_MDEV_DEVICE_FEATURES_SEL:
+		mvnet->device_features_sel = *val;
+		break;
+	case VIRTIO_MDEV_DRIVER_FEATURES:
+		mvnet->driver_features[mvnet->driver_features_sel] = *val;
+		break;
+	case VIRTIO_MDEV_DRIVER_FEATURES_SEL:
+		mvnet->driver_features_sel = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_SEL:
+		mvnet->queue_sel = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_NUM:
+		mvnet->num = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_READY:
+		vq->ready = *val;
+		if (vq->ready) {
+			spin_lock(&mvnet->lock);
+			mvnet_queue_ready(mvnet, queue_sel);
+			spin_unlock(&mvnet->lock);
+		}
+		break;
+	case VIRTIO_MDEV_QUEUE_NOTIFY:
+		if (vq->ready)
+			schedule_work(&mvnet->work);
+		break;
+	case VIRTIO_MDEV_STATUS:
+		mvnet->status = *val;
+		if (*val == 0) {
+			spin_lock(&mvnet->lock);
+			mvnet_reset(mvnet);
+			spin_unlock(&mvnet->lock);
+		}
+		break;
+	case VIRTIO_MDEV_QUEUE_DESC_LOW:
+		vq->desc_addr_lo = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_DESC_HIGH:
+		vq->desc_addr_hi = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_AVAIL_LOW:
+		vq->driver_addr_lo = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_AVAIL_HIGH:
+		vq->driver_addr_hi = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_USED_LOW:
+		vq->device_addr_lo = *val;
+		break;
+	case VIRTIO_MDEV_QUEUE_USED_HIGH:
+		vq->device_addr_hi = *val;
+		break;
+	default:
+		pr_err("Unsupported write offset! 0x%x\n", pos);
+		break;
+	}
+	spin_unlock(&mvnet->lock);
+
+	return 4;
+}
+
+static void mvnet_work(struct work_struct *work)
+{
+	struct mvnet_state *mvnet = container_of(work, struct
+						 mvnet_state, work);
+	struct mvnet_virtqueue *txq = &mvnet->vqs[1];
+	struct mvnet_virtqueue *rxq = &mvnet->vqs[0];
+	size_t read, write, total_write;
+	unsigned long flags;
+	int err;
+	int pkts = 0;
+
+	spin_lock(&mvnet->lock);
+
+	if (!txq->ready || !rxq->ready)
+		goto out;
+
+	while (true) {
+		total_write = 0;
+		err = vringh_getdesc_kern(&txq->vring, &txq->iov, NULL,
+					  &txq->head, GFP_KERNEL);
+		if (err <= 0)
+			break;
+
+		err = vringh_getdesc_kern(&rxq->vring, NULL, &rxq->iov,
+					  &rxq->head, GFP_KERNEL);
+		if (err <= 0) {
+			vringh_complete_kern(&txq->vring, txq->head, 0);
+			break;
+		}
+
+		while (true) {
+			read = vringh_iov_pull_kern(&txq->iov, mvnet->buffer,
+						    PAGE_SIZE);
+			if (read <= 0)
+				break;
+
+			write = vringh_iov_push_kern(&rxq->iov, mvnet->buffer,
+						     read);
+			if (write <= 0)
+				break;
+
+			total_write += write;
+		}
+
+		/* Make sure data is wrote before advancing index */
+		smp_wmb();
+
+		vringh_complete_kern(&txq->vring, txq->head, 0);
+		vringh_complete_kern(&rxq->vring, rxq->head, total_write);
+
+		/* Make sure used is visible before rasing the
+		   interrupt */
+		smp_wmb();
+
+		local_bh_disable();
+		if (txq->cb)
+			txq->cb(txq->private);
+		if (rxq->cb)
+			rxq->cb(rxq->private);
+		local_bh_enable();
+
+		pkts ++;
+		if (pkts > 4) {
+			schedule_work(&mvnet->work);
+			goto out;
+		}
+	}
+
+out:
+	spin_unlock(&mvnet->lock);
+}
+
+static dma_addr_t mvnet_map_page(struct device *dev, struct page *page,
+				 unsigned long offset, size_t size,
+				 enum dma_data_direction dir,
+				 unsigned long attrs)
+{
+	/* Vringh can only use VA */
+	return page_address(page) + offset;
+}
+
+static void mvnet_unmap_page(struct device *dev, dma_addr_t dma_addr,
+			     size_t size, enum dma_data_direction dir,
+			     unsigned long attrs)
+{
+	return ;
+}
+
+static void *mvnet_alloc_coherent(struct device *dev, size_t size,
+				  dma_addr_t *dma_addr, gfp_t flag,
+				  unsigned long attrs)
+{
+	void *ret = kmalloc(size, flag);
+
+	if (ret == NULL)
+		*dma_addr = DMA_MAPPING_ERROR;
+	else
+		*dma_addr = ret;
+
+	return ret;
+}
+
+static void mvnet_free_coherent(struct device *dev, size_t size,
+				void *vaddr, dma_addr_t dma_addr,
+				unsigned long attrs)
+{
+	kfree(dma_addr);
+}
+
+static const struct dma_map_ops mvnet_dma_ops = {
+	.map_page = mvnet_map_page,
+	.unmap_page = mvnet_unmap_page,
+	.alloc = mvnet_alloc_coherent,
+	.free = mvnet_free_coherent,
+};
+
+static int mvnet_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct mvnet_state *mvnet;
+	struct virtio_net_config *config;
+
+	if (!mdev)
+		return -EINVAL;
+
+	mvnet = kzalloc(sizeof(struct mvnet_state), GFP_KERNEL);
+	if (mvnet == NULL)
+		return -ENOMEM;
+
+	mvnet->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!mvnet->buffer) {
+		kfree(mvnet);
+		return -ENOMEM;
+	}
+
+	config = &mvnet->config;
+	config->mtu = 1500;
+	config->status = VIRTIO_NET_S_LINK_UP;
+	eth_random_addr(config->mac);
+
+	INIT_WORK(&mvnet->work, mvnet_work);
+
+	spin_lock_init(&mvnet->lock);
+	mvnet->mdev = mdev;
+	mdev_set_drvdata(mdev, mvnet);
+
+	mutex_lock(&mdev_list_lock);
+	list_add(&mvnet->next, &mdev_devices_list);
+	mutex_unlock(&mdev_list_lock);
+
+	mdev_set_dma_ops(mdev, &mvnet_dma_ops);
+
+	return 0;
+}
+
+static int mvnet_remove(struct mdev_device *mdev)
+{
+	struct mvnet_state *mds, *tmp_mds;
+	struct mvnet_state *mvnet = mdev_get_drvdata(mdev);
+	int ret = -EINVAL;
+
+	mutex_lock(&mdev_list_lock);
+	list_for_each_entry_safe(mds, tmp_mds, &mdev_devices_list, next) {
+		if (mvnet == mds) {
+			list_del(&mvnet->next);
+			mdev_set_drvdata(mdev, NULL);
+			kfree(mvnet->buffer);
+			kfree(mvnet);
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&mdev_list_lock);
+
+	return ret;
+}
+
+static ssize_t mvnet_read(struct mdev_device *mdev, char __user *buf,
+			  size_t count, loff_t *ppos)
+{
+	ssize_t ret;
+
+	if (*ppos < VIRTIO_MDEV_CONFIG) {
+		if (count == 4)
+			ret = mvnet_read_config(mdev, (u32 *)buf, *ppos);
+		else
+			ret = -EINVAL;
+		*ppos += 4;
+	} else if (*ppos < VIRTIO_MDEV_CONFIG + sizeof(struct virtio_net_config)) {
+		ret = mvnet_read_net_config(mdev, buf, count,
+					    *ppos - VIRTIO_MDEV_CONFIG);
+		*ppos += count;
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static ssize_t mvnet_write(struct mdev_device *mdev, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	int ret;
+
+	if (*ppos < VIRTIO_MDEV_CONFIG) {
+		if (count == 4)
+			ret = mvnet_write_config(mdev, (u32 *)buf, *ppos);
+		else
+			ret = -EINVAL;
+		*ppos += 4;
+	} else {
+		/* No writable net config */
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static long mvnet_ioctl(struct mdev_device *mdev, unsigned int cmd,
+			unsigned long arg)
+{
+	int ret = 0;
+	struct mvnet_state *mvnet;
+	struct virtio_mdev_callback *cb;
+
+	if (!mdev)
+		return -EINVAL;
+
+	mvnet = mdev_get_drvdata(mdev);
+	if (!mvnet)
+		return -ENODEV;
+
+	spin_lock(&mvnet->lock);
+
+	switch (cmd) {
+	case VIRTIO_MDEV_SET_VQ_CALLBACK:
+		cb = (struct virtio_mdev_callback *)arg;
+		mvnet->vqs[mvnet->queue_sel].cb = cb->callback;
+		mvnet->vqs[mvnet->queue_sel].private = cb->private;
+		break;
+	case VIRTIO_MDEV_SET_CONFIG_CALLBACK:
+		break;
+	default:
+		pr_err("Not supportted ioctl cmd 0x%x\n", cmd);
+		ret = -ENOTTY;
+		break;
+	}
+
+	spin_unlock(&mvnet->lock);
+
+	return ret;
+}
+
+static int mvnet_open(struct mdev_device *mdev)
+{
+	pr_info("%s\n", __func__);
+	return 0;
+}
+
+static void mvnet_close(struct mdev_device *mdev)
+{
+	pr_info("%s\n", __func__);
+}
+
+static ssize_t
+sample_mvnet_dev_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	return sprintf(buf, "This is phy device\n");
+}
+
+static DEVICE_ATTR_RO(sample_mvnet_dev);
+
+static struct attribute *mvnet_dev_attrs[] = {
+	&dev_attr_sample_mvnet_dev.attr,
+	NULL,
+};
+
+static const struct attribute_group mvnet_dev_group = {
+	.name  = "mvnet_dev",
+	.attrs = mvnet_dev_attrs,
+};
+
+static const struct attribute_group *mvnet_dev_groups[] = {
+	&mvnet_dev_group,
+	NULL,
+};
+
+static ssize_t
+sample_mdev_dev_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	if (mdev_from_dev(dev))
+		return sprintf(buf, "This is MDEV %s\n", dev_name(dev));
+
+	return sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR_RO(sample_mdev_dev);
+
+static struct attribute *mdev_dev_attrs[] = {
+	&dev_attr_sample_mdev_dev.attr,
+	NULL,
+};
+
+static const struct attribute_group mdev_dev_group = {
+	.name  = "vendor",
+	.attrs = mdev_dev_attrs,
+};
+
+static const struct attribute_group *mdev_dev_groups[] = {
+	&mdev_dev_group,
+	NULL,
+};
+
+#define MVNET_STRING_LEN 16
+
+static ssize_t
+name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	char name[MVNET_STRING_LEN];
+	const char *name_str = "virtio-net";
+
+	snprintf(name, MVNET_STRING_LEN, "%s", dev_driver_string(dev));
+	if (!strcmp(kobj->name, name))
+		return sprintf(buf, "%s\n", name_str);
+
+	return -EINVAL;
+}
+
+static MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "%d\n", INT_MAX);
+}
+
+static MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+			       char *buf)
+{
+	return sprintf(buf, "%s\n", VIRTIO_MDEV_DEVICE_API_STRING);
+}
+
+static MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *mdev_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group mdev_type_group = {
+	.name  = "",
+	.attrs = mdev_types_attrs,
+};
+
+static struct attribute_group *mdev_type_groups[] = {
+	&mdev_type_group,
+	NULL,
+};
+
+static const struct mdev_parent_ops mdev_fops = {
+	.owner                  = THIS_MODULE,
+	.dev_attr_groups        = mvnet_dev_groups,
+	.mdev_attr_groups       = mdev_dev_groups,
+	.supported_type_groups  = mdev_type_groups,
+	.create                 = mvnet_create,
+	.remove			= mvnet_remove,
+	.open                   = mvnet_open,
+	.release                = mvnet_close,
+	.read                   = mvnet_read,
+	.write                  = mvnet_write,
+	.ioctl		        = mvnet_ioctl,
+};
+
+static void mvnet_device_release(struct device *dev)
+{
+	dev_dbg(dev, "mvnet: released\n");
+}
+
+static int __init mvnet_dev_init(void)
+{
+	int ret = 0;
+
+	pr_info("mvnet_dev: %s\n", __func__);
+
+	memset(&mvnet_dev, 0, sizeof(mvnet_dev));
+
+	idr_init(&mvnet_dev.vd_idr);
+
+	mvnet_dev.vd_class = class_create(THIS_MODULE, MVNET_CLASS_NAME);
+
+	if (IS_ERR(mvnet_dev.vd_class)) {
+		pr_err("Error: failed to register mvnet_dev class\n");
+		ret = PTR_ERR(mvnet_dev.vd_class);
+		goto failed1;
+	}
+
+	mvnet_dev.dev.class = mvnet_dev.vd_class;
+	mvnet_dev.dev.release = mvnet_device_release;
+	dev_set_name(&mvnet_dev.dev, "%s", MVNET_NAME);
+
+	ret = device_register(&mvnet_dev.dev);
+	if (ret)
+		goto failed2;
+
+	ret = mdev_register_device(&mvnet_dev.dev, &mdev_fops);
+	if (ret)
+		goto failed3;
+
+	mutex_init(&mdev_list_lock);
+	INIT_LIST_HEAD(&mdev_devices_list);
+
+	goto all_done;
+
+failed3:
+
+	device_unregister(&mvnet_dev.dev);
+failed2:
+	class_destroy(mvnet_dev.vd_class);
+
+failed1:
+all_done:
+	return ret;
+}
+
+static void __exit mvnet_dev_exit(void)
+{
+	mvnet_dev.dev.bus = NULL;
+	mdev_unregister_device(&mvnet_dev.dev);
+
+	device_unregister(&mvnet_dev.dev);
+	idr_destroy(&mvnet_dev.vd_idr);
+	class_destroy(mvnet_dev.vd_class);
+	mvnet_dev.vd_class = NULL;
+	pr_info("mvnet_dev: Unloaded!\n");
+}
+
+module_init(mvnet_dev_init)
+module_exit(mvnet_dev_exit)
+
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(supported, "Test driver that simulate serial port over PCI");
+MODULE_VERSION(VERSION_STRING);
+MODULE_AUTHOR(DRIVER_AUTHOR);
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH] net/mlx4_en: ethtool: make array modes static const, makes object smaller
From: David Miller @ 2019-09-10  8:29 UTC (permalink / raw)
  To: colin.king; +Cc: tariqt, netdev, linux-rdma, kernel-janitors, linux-kernel
In-Reply-To: <20190906115348.16621-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Fri,  6 Sep 2019 12:53:48 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Don't populate the array modes on the stack but instead make it
> static const. Makes the object code smaller by 303 bytes.
> 
> Before:
>    text	   data	    bss	    dec	    hex	filename
>   51240	   5008	   1312	  57560	   e0d8 mellanox/mlx4/en_ethtool.o
> 
> After:
>    text	   data	    bss	    dec	    hex	filename
>   50937	   5008	   1312	  57257	   dfa9	mellanox/mlx4/en_ethtool.o
> 
> (gcc version 9.2.1, amd64)
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next.

^ permalink raw reply

* RE: [PATCH net-next v2 1/2] net: stmmac: Only enable enhanced addressing mode when needed
From: Jose Abreu @ 2019-09-10  8:32 UTC (permalink / raw)
  To: Thierry Reding, Jose Abreu
  Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
	linux-tegra@vger.kernel.org
In-Reply-To: <20190909191127.GA23804@mithrandir>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/09/2019, 20:11:27 (UTC+00:00)

> On Mon, Sep 09, 2019 at 04:07:04PM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 16:25:45 (UTC+00:00)
> > 
> > > @@ -92,6 +92,7 @@ struct stmmac_dma_cfg {
> > >  	int fixed_burst;
> > >  	int mixed_burst;
> > >  	bool aal;
> > > +	bool eame;
> > 
> > bools should not be used in struct's, please change to int.
> 
> Huh? Since when? "aal" right above it is also bool. Can you provide a
> specific rationale for why we shouldn't use bool in structs?

Please see https://lkml.org/lkml/2017/11/21/384.

---
Thanks,
Jose 
Miguel Abreu

^ permalink raw reply

* RE: [PATCH net-next v2 2/2] net: stmmac: Support enhanced addressing mode for DWMAC 4.10
From: Jose Abreu @ 2019-09-10  8:35 UTC (permalink / raw)
  To: Thierry Reding, Jose Abreu
  Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
	linux-tegra@vger.kernel.org
In-Reply-To: <20190909191329.GB23804@mithrandir>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/09/2019, 20:13:29 (UTC+00:00)

> On Mon, Sep 09, 2019 at 04:05:52PM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 16:25:46 (UTC+00:00)
> > 
> > > @@ -79,6 +79,10 @@ static void dwmac4_dma_init_rx_chan(void __iomem *ioaddr,
> > >  	value = value | (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
> > >  	writel(value, ioaddr + DMA_CHAN_RX_CONTROL(chan));
> > >  
> > > +	if (dma_cfg->eame)
> > 
> > There is no need for this check. If EAME is not enabled then upper 32 
> > bits will be zero.
> 
> The idea here was to potentially guard against this register not being
> available on some revisions. Having the check here would avoid access to
> the register if the device doesn't support enhanced addressing.

I see your point but I don't think there will be any problems unless you 
have some strange system that doesn't handle the write accesses to 
unimplemented features properly ...

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Yonghong Song @ 2019-09-10  8:37 UTC (permalink / raw)
  To: Sami Tolvanen, Alexei Starovoitov, Daniel Borkmann
  Cc: Kees Cook, Martin Lau, Song Liu, netdev@vger.kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190909223236.157099-1-samitolvanen@google.com>



On 9/9/19 11:32 PM, Sami Tolvanen wrote:
> With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
> generated code. This change adds basic sanity checking to ensure
> we are jumping to a valid location, which narrows down the attack
> surface on the stored pointer. This also prepares the code for future
> Control-Flow Integrity (CFI) checking, which adds indirect call
> validation to call targets that can be determined at compile-time, but
> cannot validate calls to jited functions.
> 
> In addition, this change adds a weak arch_bpf_jit_check_func function,
> which architectures that implement BPF JIT can override to perform
> additional validation, such as verifying that the pointer points to
> the correct memory region.

You did not mention BPF_BINARY_HEADER_MAGIC and added member
of `magic` in bpf_binary_header. Could you add some details
on what is the purpose for this `magic` member?

> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>   include/linux/filter.h | 26 ++++++++++++++++++++++++--
>   kernel/bpf/core.c      | 25 +++++++++++++++++++++++++
>   2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 92c6e31fb008..abfb0e1b21a8 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -511,7 +511,10 @@ struct sock_fprog_kern {
>   	struct sock_filter	*filter;
>   };
>   
> +#define BPF_BINARY_HEADER_MAGIC	0x05de0e82
> +
>   struct bpf_binary_header {
> +	u32 magic;
>   	u32 pages;
>   	/* Some arches need word alignment for their instructions */
>   	u8 image[] __aligned(4);
> @@ -553,20 +556,39 @@ struct sk_filter {
>   
>   DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>   
> +#ifdef CONFIG_BPF_JIT
> +/*
> + * With JIT, the kernel makes an indirect call to dynamically generated
> + * code. Use bpf_call_func to perform additional validation of the call
> + * target to narrow down attack surface. Architectures implementing BPF
> + * JIT can override arch_bpf_jit_check_func for arch-specific checking.
> + */
> +extern unsigned int bpf_call_func(const struct bpf_prog *prog,
> +				  const void *ctx);
> +
> +extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
> +#else
> +static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
> +					 const void *ctx)
> +{
> +	return prog->bpf_func(ctx, prog->insnsi);
> +}
> +#endif
> +
>   #define BPF_PROG_RUN(prog, ctx)	({				\
>   	u32 ret;						\
>   	cant_sleep();						\
>   	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
>   		struct bpf_prog_stats *stats;			\
>   		u64 start = sched_clock();			\
> -		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
> +		ret = bpf_call_func(prog, ctx);			\
>   		stats = this_cpu_ptr(prog->aux->stats);		\
>   		u64_stats_update_begin(&stats->syncp);		\
>   		stats->cnt++;					\
>   		stats->nsecs += sched_clock() - start;		\
>   		u64_stats_update_end(&stats->syncp);		\
>   	} else {						\
> -		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
> +		ret = bpf_call_func(prog, ctx);			\
>   	}							\
>   	ret; })
>   
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 66088a9e9b9e..7aad58f67105 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
>   	module_memfree(addr);
>   }
>   
> +#ifdef CONFIG_BPF_JIT
> +bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
> +{
> +	return true;
> +}
> +
> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> +{
> +	const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> +
> +	if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> +		return prog->bpf_func(ctx, prog->insnsi);
> +
> +	if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> +		     !arch_bpf_jit_check_func(prog))) {
> +		WARN(1, "attempt to jump to an invalid address");
> +		return 0;
> +	}
> +
> +	return prog->bpf_func(ctx, prog->insnsi);
> +}

The above can be rewritten as
	if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
	    hdr->magic != BPF_BINARY_HEADER_MAGIC ||
	    !arch_bpf_jit_check_func(prog))) {
		WARN(1, "attempt to jump to an invalid address");
		return 0;
	}

	return prog->bpf_func(ctx, prog->insnsi);

BPF_PROG_RUN() will be called during xdp fast path.
Have you measured how much slowdown the above change could
cost for the performance?

> +EXPORT_SYMBOL_GPL(bpf_call_func);
> +#endif
> +
>   struct bpf_binary_header *
>   bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>   		     unsigned int alignment,
> @@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>   	/* Fill space with illegal/arch-dep instructions. */
>   	bpf_fill_ill_insns(hdr, size);
>   
> +	hdr->magic = BPF_BINARY_HEADER_MAGIC;
>   	hdr->pages = pages;
>   	hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
>   		     PAGE_SIZE - sizeof(*hdr));
> 

^ permalink raw reply

* [PATCH net] net: sonic: replace dev_kfree_skb in sonic_send_packet
From: Mao Wenan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: tsbogend, davem; +Cc: netdev, linux-kernel, kernel-janitors, Mao Wenan

sonic_send_packet will be processed in irq or none
irq context, so it would better use dev_kfree_skb_any
instead of dev_kfree_skb.

Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 drivers/net/ethernet/natsemi/sonic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index 18fd62fbfb64..b339125b2f09 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -233,7 +233,7 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
 	laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
 	if (!laddr) {
 		pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", dev->name);
-		dev_kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
 
-- 
2.20.1


^ permalink raw reply related

* Re: ❌ FAIL: Stable queue: queue-5.2
From: Greg KH @ 2019-09-10  8:58 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: CKI Project, Linux Stable maillist, netdev, Jan Stancek,
	Xiumei Mu, David Howells, linux-afs
In-Reply-To: <20190910081956.GG22496@dhcp-12-139.nay.redhat.com>

On Tue, Sep 10, 2019 at 04:19:56PM +0800, Hangbin Liu wrote:
> On Wed, Aug 28, 2019 at 08:36:14AM -0400, CKI Project wrote:
> > 
> > Hello,
> > 
> > We ran automated tests on a patchset that was proposed for merging into this
> > kernel tree. The patches were applied to:
> > 
> >        Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> >             Commit: f7d5b3dc4792 - Linux 5.2.10
> > 
> > The results of these automated tests are provided below.
> > 
> >     Overall result: FAILED (see details below)
> >              Merge: OK
> >            Compile: OK
> >              Tests: FAILED
> > 
> > All kernel binaries, config files, and logs are available for download here:
> > 
> >   https://artifacts.cki-project.org/pipelines/128519
> > 
> > 
> > 
> > One or more kernel tests failed:
> > 
> >   x86_64:
> >     ❌ Networking socket: fuzz
> 
> Sorry, maybe the info is a little late, I just found the call traces for this
> failure.

And this is no longer failing?

What is the "fuzz" test?

greg k-h

^ permalink raw reply

* [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

This patch adds ethtool_ops.set_channels support for HNS3 VF driver,
and updates related TQP information and RSS information, to support
modification of VF TQP number, and uses current rss_size instead of
max_rss_size to initialize RSS.

Also, fixes a format error in hclgevf_get_rss().

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  1 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 87 ++++++++++++++++++++--
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index aa692b1..f5a681d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1397,6 +1397,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.set_rxfh = hns3_set_rss,
 	.get_link_ksettings = hns3_get_link_ksettings,
 	.get_channels = hns3_get_channels,
+	.set_channels = hns3_set_channels,
 	.get_coalesce = hns3_get_coalesce,
 	.set_coalesce = hns3_set_coalesce,
 	.get_regs_len = hns3_get_regs_len,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 594cae8..d77dcc2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -743,7 +743,7 @@ static int hclgevf_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key,
 }
 
 static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
-			   const  u8 *key, const  u8 hfunc)
+			   const u8 *key, const u8 hfunc)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
@@ -2060,9 +2060,10 @@ static int hclgevf_config_gro(struct hclgevf_dev *hdev, bool en)
 static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 {
 	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
-	int i, ret;
+	int ret;
+	u32 i;
 
-	rss_cfg->rss_size = hdev->rss_size_max;
+	rss_cfg->rss_size = hdev->nic.kinfo.rss_size;
 
 	if (hdev->pdev->revision >= 0x21) {
 		rss_cfg->hash_algo = HCLGEVF_RSS_HASH_ALGO_SIMPLE;
@@ -2099,13 +2100,13 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 
 	/* Initialize RSS indirect table */
 	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
-		rss_cfg->rss_indirection_tbl[i] = i % hdev->rss_size_max;
+		rss_cfg->rss_indirection_tbl[i] = i % rss_cfg->rss_size;
 
 	ret = hclgevf_set_rss_indir_table(hdev);
 	if (ret)
 		return ret;
 
-	return hclgevf_set_rss_tc_mode(hdev, hdev->rss_size_max);
+	return hclgevf_set_rss_tc_mode(hdev, rss_cfg->rss_size);
 }
 
 static int hclgevf_init_vlan_config(struct hclgevf_dev *hdev)
@@ -2835,6 +2836,81 @@ static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle,
 	*max_rss_size = hdev->rss_size_max;
 }
 
+static void hclgevf_update_rss_size(struct hnae3_handle *handle,
+				    u32 new_tqps_num)
+{
+	struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	u16 max_rss_size;
+
+	kinfo->req_rss_size = new_tqps_num;
+
+	max_rss_size = min_t(u16, hdev->rss_size_max,
+			     hdev->num_tqps / kinfo->num_tc);
+
+	/* Set to user value, no larger than max_rss_size. */
+	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
+	    kinfo->req_rss_size <= max_rss_size) {
+		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
+			 kinfo->rss_size, kinfo->req_rss_size);
+		kinfo->rss_size = kinfo->req_rss_size;
+	} else if (kinfo->rss_size > max_rss_size ||
+		   (!kinfo->req_rss_size && kinfo->rss_size < max_rss_size)) {
+		/* Set to the maximum specification value (max_rss_size). */
+		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
+			 kinfo->rss_size, max_rss_size);
+		kinfo->rss_size = max_rss_size;
+	}
+
+	kinfo->num_tqps = kinfo->num_tc * kinfo->rss_size;
+}
+
+static int hclgevf_set_channels(struct hnae3_handle *handle, u32 new_tqps_num,
+				bool rxfh_configured)
+{
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+	u16 cur_rss_size = kinfo->rss_size;
+	u16 cur_tqps = kinfo->num_tqps;
+	u32 *rss_indir;
+	unsigned int i;
+	int ret;
+
+	hclgevf_update_rss_size(handle, new_tqps_num);
+
+	ret = hclgevf_set_rss_tc_mode(hdev, kinfo->rss_size);
+	if (ret)
+		return ret;
+
+	/* RSS indirection table has been configuared by user */
+	if (rxfh_configured)
+		goto out;
+
+	/* Reinitializes the rss indirect table according to the new RSS size */
+	rss_indir = kcalloc(HCLGEVF_RSS_IND_TBL_SIZE, sizeof(u32), GFP_KERNEL);
+	if (!rss_indir)
+		return -ENOMEM;
+
+	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
+		rss_indir[i] = i % kinfo->rss_size;
+
+	ret = hclgevf_set_rss(handle, rss_indir, NULL, 0);
+	if (ret)
+		dev_err(&hdev->pdev->dev, "set rss indir table fail, ret=%d\n",
+			ret);
+
+	kfree(rss_indir);
+
+out:
+	if (!ret)
+		dev_info(&hdev->pdev->dev,
+			 "Channels changed, rss_size from %u to %u, tqps from %u to %u",
+			 cur_rss_size, kinfo->rss_size,
+			 cur_tqps, kinfo->rss_size * kinfo->num_tc);
+
+	return ret;
+}
+
 static int hclgevf_get_status(struct hnae3_handle *handle)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
@@ -3042,6 +3118,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
 	.enable_hw_strip_rxvtag = hclgevf_en_hw_strip_rxvtag,
 	.reset_event = hclgevf_reset_event,
 	.set_default_reset_request = hclgevf_set_def_reset_request,
+	.set_channels = hclgevf_set_channels,
 	.get_channels = hclgevf_get_channels,
 	.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
 	.get_regs_len = hclgevf_get_regs_len,
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 4/7] net: hns3: fix port setting handle for fibre port
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

For hardware doesn't support use specified speed and duplex
to negotiate, it's unnecessary to check and modify the port
speed and duplex for fibre port when autoneg is on.

Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index f5a681d..680c350 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
 	u8 duplex;
 	int ret;
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * unnecessary to check them when autoneg on.
+	 */
+	if (cmd->base.autoneg)
+		return 0;
+
 	if (ops->get_ksettings_an_result) {
 		ops->get_ksettings_an_result(handle, &autoneg, &speed, &duplex);
 		if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
@@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 			return ret;
 	}
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * ignore them when autoneg on.
+	 */
+	if (cmd->base.autoneg) {
+		netdev_info(netdev,
+			    "autoneg is on, ignore the speed and duplex\n");
+		return 0;
+	}
+
 	if (ops->cfg_mac_speed_dup_h)
 		ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
 					       cmd->base.duplex);
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 5/7] net: hns3: modify some logs format
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

The pfc_en and pfc_map need to be displayed in hexadecimal notation,
printing dma address should use %pad, and the end of printed string
needs to be add "\n".

This patch modifies them.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c      | 7 +++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 5cf4c1e..28961a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -166,6 +166,7 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	struct hns3_enet_ring *ring;
 	u32 tx_index, rx_index;
 	u32 q_num, value;
+	dma_addr_t addr;
 	int cnt;
 
 	cnt = sscanf(&cmd_buf[8], "%u %u", &q_num, &tx_index);
@@ -194,8 +195,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	}
 
 	tx_desc = &ring->desc[tx_index];
+	addr = le64_to_cpu(tx_desc->addr);
 	dev_info(dev, "TX Queue Num: %u, BD Index: %u\n", q_num, tx_index);
-	dev_info(dev, "(TX)addr: 0x%llx\n", tx_desc->addr);
+	dev_info(dev, "(TX)addr: %pad\n", &addr);
 	dev_info(dev, "(TX)vlan_tag: %u\n", tx_desc->tx.vlan_tag);
 	dev_info(dev, "(TX)send_size: %u\n", tx_desc->tx.send_size);
 	dev_info(dev, "(TX)vlan_tso: %u\n", tx_desc->tx.type_cs_vlan_tso);
@@ -217,8 +219,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	rx_index = (cnt == 1) ? value : tx_index;
 	rx_desc	 = &ring->desc[rx_index];
 
+	addr = le64_to_cpu(rx_desc->addr);
 	dev_info(dev, "RX Queue Num: %u, BD Index: %u\n", q_num, rx_index);
-	dev_info(dev, "(RX)addr: 0x%llx\n", rx_desc->addr);
+	dev_info(dev, "(RX)addr: %pad\n", &addr);
 	dev_info(dev, "(RX)l234_info: %u\n", rx_desc->rx.l234_info);
 	dev_info(dev, "(RX)pkt_len: %u\n", rx_desc->rx.pkt_len);
 	dev_info(dev, "(RX)size: %u\n", rx_desc->rx.size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 816f920..c063301 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -342,7 +342,7 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 	hdev->tm_info.pfc_en = pfc->pfc_en;
 
 	netif_dbg(h, drv, netdev,
-		  "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+		  "set pfc: pfc_en=%x, pfc_map=%x, num_tc=%u\n",
 		  pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
 
 	hclge_tm_pfc_info_update(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8d4dc1b..bc5bad3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3751,7 +3751,7 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
 	else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ)))
 		hdev->reset_level = HNAE3_FUNC_RESET;
 
-	dev_info(&hdev->pdev->dev, "received reset event , reset type is %d",
+	dev_info(&hdev->pdev->dev, "received reset event, reset type is %d\n",
 		 hdev->reset_level);
 
 	/* request reset & schedule reset task */
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 3/7] net: hns3: fix shaper parameter algorithm
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Yonglong Liu, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Yonglong Liu <liuyonglong@huawei.com>

Currently when hns3 driver configures the tm shaper to limit
bandwidth below 20Mbit using the parameters calculated by
hclge_shaper_para_calc(), the actual bandwidth limited by tm
hardware module is not accurate enough, for example, 1.28 Mbit
when the user is configuring 1 Mbit.

This patch adjusts the ir_calc to be closer to ir, and
always calculate the ir_b parameter when user is configuring
a small bandwidth. Also, removes an unnecessary parenthesis
when calculating denominator.

Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index e829101..9f0e35f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -81,16 +81,13 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
 		return 0;
 	} else if (ir_calc > ir) {
 		/* Increasing the denominator to select ir_s value */
-		while (ir_calc > ir) {
+		while (ir_calc >= ir && ir) {
 			ir_s_calc++;
 			ir_calc = DIVISOR_IR_B_126 / (tick * (1 << ir_s_calc));
 		}
 
-		if (ir_calc == ir)
-			*ir_b = 126;
-		else
-			*ir_b = (ir * tick * (1 << ir_s_calc) +
-				 (DIVISOR_CLK >> 1)) / DIVISOR_CLK;
+		*ir_b = (ir * tick * (1 << ir_s_calc) + (DIVISOR_CLK >> 1)) /
+			DIVISOR_CLK;
 	} else {
 		/* Increasing the numerator to select ir_u value */
 		u32 numerator;
@@ -104,7 +101,7 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
 		if (ir_calc == ir) {
 			*ir_b = 126;
 		} else {
-			u32 denominator = (DIVISOR_CLK * (1 << --ir_u_calc));
+			u32 denominator = DIVISOR_CLK * (1 << --ir_u_calc);
 			*ir_b = (ir * tick + (denominator >> 1)) / denominator;
 		}
 	}
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 2/7] net: hns3: revert to old channel when setting new channel num fail
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Peng Li, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Peng Li <lipeng321@huawei.com>

After setting new channel num, it needs free old ring memory and
allocate new ring memory. If there is no enough memory and allocate
new ring memory fail, the ring may initialize fail. To make sure
the network interface can work normally, driver should revert the
channel to the old configuration.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 51 ++++++++++++++++++-------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 9f3f8e3..8dbaf36 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4410,6 +4410,30 @@ static int hns3_reset_notify(struct hnae3_handle *handle,
 	return ret;
 }
 
+static int hns3_change_channels(struct hnae3_handle *handle, u32 new_tqp_num,
+				bool rxfh_configured)
+{
+	int ret;
+
+	ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num,
+						 rxfh_configured);
+	if (ret) {
+		dev_err(&handle->pdev->dev,
+			"Change tqp num(%u) fail.\n", new_tqp_num);
+		return ret;
+	}
+
+	ret = hns3_reset_notify(handle, HNAE3_INIT_CLIENT);
+	if (ret)
+		return ret;
+
+	ret =  hns3_reset_notify(handle, HNAE3_UP_CLIENT);
+	if (ret)
+		hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
+
+	return ret;
+}
+
 int hns3_set_channels(struct net_device *netdev,
 		      struct ethtool_channels *ch)
 {
@@ -4450,24 +4474,23 @@ int hns3_set_channels(struct net_device *netdev,
 		return ret;
 
 	org_tqp_num = h->kinfo.num_tqps;
-	ret = h->ae_algo->ops->set_channels(h, new_tqp_num, rxfh_configured);
+	ret = hns3_change_channels(h, new_tqp_num, rxfh_configured);
 	if (ret) {
-		ret = h->ae_algo->ops->set_channels(h, org_tqp_num,
-						    rxfh_configured);
-		if (ret) {
-			/* If revert to old tqp failed, fatal error occurred */
-			dev_err(&netdev->dev,
-				"Revert to old tqp num fail, ret=%d", ret);
-			return ret;
+		int ret1;
+
+		netdev_warn(netdev,
+			    "Change channels fail, revert to old value\n");
+		ret1 = hns3_change_channels(h, org_tqp_num, rxfh_configured);
+		if (ret1) {
+			netdev_err(netdev,
+				   "revert to old channel fail\n");
+			return ret1;
 		}
-		dev_info(&netdev->dev,
-			 "Change tqp num fail, Revert to old tqp num");
-	}
-	ret = hns3_reset_notify(h, HNAE3_INIT_CLIENT);
-	if (ret)
+
 		return ret;
+	}
 
-	return hns3_reset_notify(h, HNAE3_UP_CLIENT);
+	return 0;
 }
 
 static const struct hns3_hw_error_info hns3_hw_err[] = {
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 0/7] net: hns3: add a feature & bugfixes & cleanups
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Huazhong Tan

This patch-set includes a VF feature, bugfixes and cleanups for the HNS3
ethernet controller driver.

[patch 01/07] adds ethtool_ops.set_channels support for HNS3 VF driver

[patch 02/07] adds a recovery for setting channel fail.

[patch 03/07] fixes an error related to shaper parameter algorithm.

[patch 04/07] fixes an error related to ksetting.

[patch 05/07] adds cleanups for some log pinting.

[patch 06/07] adds a NULL pointer check before function calling.

[patch 07/07] adds some debugging information for reset issue.

Guangbin Huang (4):
  net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
  net: hns3: fix port setting handle for fibre port
  net: hns3: modify some logs format
  net: hns3: check NULL pointer before use

Huazhong Tan (1):
  net: hns3: add some DFX info for reset issue

Peng Li (1):
  net: hns3: revert to old channel when setting new channel num fail

Yonglong Liu (1):
  net: hns3: fix shaper parameter algorithm

 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c |  7 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 54 ++++++++++----
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 ++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 +++++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 13 ++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 11 +--
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 87 ++++++++++++++++++++--
 9 files changed, 178 insertions(+), 46 deletions(-)

-- 
2.7.4


^ permalink raw reply

* [PATCH net-next 6/7] net: hns3: check NULL pointer before use
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

This patch checks ops->set_default_reset_request whether is NULL
before using it in function hns3_slot_reset.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8dbaf36..616cad0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2006,7 +2006,8 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
 
 	ops = ae_dev->ops;
 	/* request the reset */
-	if (ops->reset_event && ops->get_reset_level) {
+	if (ops->reset_event && ops->get_reset_level &&
+	    ops->set_default_reset_request) {
 		if (ae_dev->hw_err_reset_req) {
 			reset_type = ops->get_reset_level(ae_dev,
 						&ae_dev->hw_err_reset_req);
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 7/7] net: hns3: add some DFX info for reset issue
From: Huazhong Tan @ 2019-09-10  8:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>

This patch adds more information for reset DFX. Also, adds some
cleanups to reset info, move reset_fail_cnt into struct
hclge_rst_stats, and modifies some print formats.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++++++++++++++++------
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++++----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index 6dcce48..d0128d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -931,22 +931,36 @@ static void hclge_dbg_fd_tcam(struct hclge_dev *hdev)
 
 static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 {
-	dev_info(&hdev->pdev->dev, "PF reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "PF reset count: %u\n",
 		 hdev->rst_stats.pf_rst_cnt);
-	dev_info(&hdev->pdev->dev, "FLR reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "FLR reset count: %u\n",
 		 hdev->rst_stats.flr_rst_cnt);
-	dev_info(&hdev->pdev->dev, "CORE reset count: %d\n",
-		 hdev->rst_stats.core_rst_cnt);
-	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %u\n",
 		 hdev->rst_stats.global_rst_cnt);
-	dev_info(&hdev->pdev->dev, "IMP reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "IMP reset count: %u\n",
 		 hdev->rst_stats.imp_rst_cnt);
-	dev_info(&hdev->pdev->dev, "reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset done count: %u\n",
 		 hdev->rst_stats.reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "HW reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "HW reset done count: %u\n",
 		 hdev->rst_stats.hw_reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
 		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
+		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset fail count: %u\n",
+		 hdev->rst_stats.reset_fail_cnt);
+	dev_info(&hdev->pdev->dev, "vector0 interrupt enable status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_REG_BASE));
+	dev_info(&hdev->pdev->dev, "reset interrupt source: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG));
+	dev_info(&hdev->pdev->dev, "reset interrupt status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_INT_STS));
+	dev_info(&hdev->pdev->dev, "hardware reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG));
+	dev_info(&hdev->pdev->dev, "handshake status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_NIC_CSQ_DEPTH_REG));
+	dev_info(&hdev->pdev->dev, "function reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_FUN_RST_ING));
 }
 
 static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bc5bad3..fd7f943 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3547,12 +3547,12 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
 			 "reset failed because new reset interrupt\n");
 		hclge_clear_reset_cause(hdev);
 		return false;
-	} else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
-		hdev->reset_fail_cnt++;
+	} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
+		hdev->rst_stats.reset_fail_cnt++;
 		set_bit(hdev->reset_type, &hdev->reset_pending);
 		dev_info(&hdev->pdev->dev,
 			 "re-schedule reset task(%d)\n",
-			 hdev->reset_fail_cnt);
+			 hdev->rst_stats.reset_fail_cnt);
 		return true;
 	}
 
@@ -3679,7 +3679,8 @@ static void hclge_reset(struct hclge_dev *hdev)
 	/* ignore RoCE notify error if it fails HCLGE_RESET_MAX_FAIL_CNT - 1
 	 * times
 	 */
-	if (ret && hdev->reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
+	if (ret &&
+	    hdev->rst_stats.reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
 		goto err_reset;
 
 	rtnl_lock();
@@ -3695,7 +3696,7 @@ static void hclge_reset(struct hclge_dev *hdev)
 		goto err_reset;
 
 	hdev->last_reset_time = jiffies;
-	hdev->reset_fail_cnt = 0;
+	hdev->rst_stats.reset_fail_cnt = 0;
 	hdev->rst_stats.reset_done_cnt++;
 	ae_dev->reset_type = HNAE3_NONE_RESET;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 870550f..3e9574a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -659,6 +659,7 @@ struct hclge_rst_stats {
 	u32 global_rst_cnt;	/* the number of GLOBAL */
 	u32 imp_rst_cnt;	/* the number of IMP reset */
 	u32 reset_cnt;		/* the number of reset */
+	u32 reset_fail_cnt;	/* the number of reset fail */
 };
 
 /* time and register status when mac tunnel interruption occur */
@@ -725,7 +726,6 @@ struct hclge_dev {
 	unsigned long reset_request;	/* reset has been requested */
 	unsigned long reset_pending;	/* client rst is pending to be served */
 	struct hclge_rst_stats rst_stats;
-	u32 reset_fail_cnt;
 	u32 fw_version;
 	u16 num_vmdq_vport;		/* Num vmdq vport this PF has set up */
 	u16 num_tqps;			/* Num task queue pairs of this PF */
-- 
2.7.4


^ permalink raw reply related

* Re: Default qdisc not correctly initialized with custom MTU
From: Holger Hoffstätte @ 2019-09-10  9:14 UTC (permalink / raw)
  To: Cong Wang; +Cc: Netdev
In-Reply-To: <CAM_iQpWKsSWDZ55kMO6mzDe5C7tHW-ub_eH91hRzZMdUtKJtfA@mail.gmail.com>

On 9/10/19 12:52 AM, Cong Wang wrote:
> On Mon, Sep 9, 2019 at 5:44 AM Holger Hoffstätte
> <holger@applied-asynchrony.com> wrote:
>> I can't help but feel this is a slight bug in terms of initialization order,
>> and that the default qdisc should only be created when it's first being
>> used/attached to a link, not when the sysctls are configured.
> 
> Yeah, this is because the fq_codel qdisc is initialized once and
> doesn't get any notification when the netdev's MTU get changed.

My point was that it shouldn't be created or initialized at all when
the sysctl is configured, only the name should be validated/stored and
queried when needed. If any interface is brought up before that point,
no value (yet) would just mean "trod along with the defaults" to whoever
is doing the work.

> We can "fix" this by adding a NETDEV_CHANGEMTU notifier to
> qdisc's, but I don't know if it is really worth the effort.

This is essentially the opposite of what I had in mind. The problem is
that the entity was created, not that it needs to be notified.
Also I don't think that would work for scenarios with multiple links
using different MTUs.

> Is there any reason you can't change that order?

Yes, because that wouldn't solve anything?
Like i said I can just kick the root qdisc to update itself in
a post interface-setup script, and that works fine. Since I need
that script anyway for setting several other parameters for
the device it's no big deal - just another workaround.

A brief look at the initialization in sch_mq/sch_generic unfortunately
didn't really help clear things up for me, hence I guess my real
question is whether a qdisc *must* be created early for some reason
(assuming sysctls come before link setup), or whether this is something
that could be delayed and done on-demand.

thanks,
Holger

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox