* Re: [PATCH v2 1/9] vgacon: rework Kconfig dependencies
From: Geert Uytterhoeven @ 2023-07-24 12:07 UTC
To: Arnd Bergmann
Cc: Michael Ellerman, Arnd Bergmann, linux-fbdev, Thomas Zimmermann,
Helge Deller, Javier Martinez Canillas, David S . Miller,
K. Y. Srinivasan, Ard Biesheuvel, Borislav Petkov, Brian Cain,
Catalin Marinas, Christophe Leroy, Daniel Vetter, Dave Hansen,
Dave Airlie, Deepak Rawat, Dexuan Cui, Dinh Nguyen,
Greg Kroah-Hartman, guoren, Haiyang Zhang, Huacai Chen,
Ingo Molnar, John Paul Adrian Glaubitz, Khalid Aziz,
Linus Walleij, Matt Turner, Max Filippov, Nicholas Piggin,
Palmer Dabbelt, Russell King, Thomas Bogendoerfer,
Thomas Gleixner, WANG Xuerui, Wei Liu, Will Deacon, x86,
linux-alpha, linux-kernel, linux-arm-kernel, linux-efi,
linux-csky@vger.kernel.org, linux-hexagon, linux-ia64, loongarch,
linux-mips, linuxppc-dev, linux-riscv, linux-sh, sparclinux,
linux-hyperv, dri-devel
In-Reply-To: <19631e74-415e-4dcb-b79d-33dcf03d2dfc@app.fastmail.com>
Hi Arnd,
On Fri, Jul 21, 2023 at 10:29 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Jul 21, 2023, at 06:59, Michael Ellerman wrote:
> > Arnd Bergmann <arnd@kernel.org> writes:
> >> From: Arnd Bergmann <arnd@arndb.de>
> >>
> >> The list of dependencies here is phrased as an opt-out, but this is missing
> >> a lot of architectures that don't actually support VGA consoles, and some
> >> of the entries are stale:
> >>
> >> - powerpc used to support VGA consoles in the old arch/ppc codebase, but
> >> the merged arch/powerpc never did
> >
> > Not disputing this, but how did you come to that conclusion? I grepped
> > around and couldn't convince myself whether it can work on powerpc or
> > not. ie. currently it's possible to enable CONFIG_VGA_CONSOLE and
> > powerpc does have a struct screen_info defined which seems like it would
> > allow vgacon_startup() to complete.
>
> The VGA console needs both screen_info and vga_con to work. In arch/ppc
> we had both, but in arch/powerpc we only retained the screen_info:
>
> $ git grep vga_con v2.6.26 -- arch/ppc arch/ppc64 arch/powerpc
> v2.6.26:arch/ppc/platforms/pplus.c: conswitchp = &vga_con;
> v2.6.26:arch/ppc/platforms/prep_setup.c: conswitchp = &vga_con;
>
> so after arch/ppc was removed, this became impossible to use on both
> pplus and prep. These two platforms were also (as far as I can tell)
> the only ones to support vga16fb as an alternative to vgacon, but
> both platforms were removed later on.
I did use vgacon and vga16fb on CHRP on a second video card
(initialized using Gabriel Paubert's x86 BIOS emulator), but that was
definitely before the advent of arch/powerpc/.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* RE: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
From: Haiyang Zhang @ 2023-07-24 15:46 UTC
To: Jesper Dangaard Brouer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: brouer@redhat.com, Dexuan Cui, KY Srinivasan, Paul Rosswurm,
olaf@aepfle.de, vkuznets@redhat.com, davem@davemloft.net,
wei.liu@kernel.org, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, leon@kernel.org, Long Li,
ssengar@linux.microsoft.com, linux-rdma@vger.kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com,
bpf@vger.kernel.org, ast@kernel.org, Ajay Sharma, hawk@kernel.org,
tglx@linutronix.de, shradhagupta@linux.microsoft.com,
linux-kernel@vger.kernel.org, Ilias Apalodimas,
Jesper Dangaard Brouer
In-Reply-To: <1af55bbb-7aff-e575-8dc1-8ba64b924580@redhat.com>
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Monday, July 24, 2023 7:29 AM
> Subject: Re: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
>
>
>
> On 21/07/2023 21.05, Haiyang Zhang wrote:
> > Add page pool for RX buffers for faster buffer cycle and reduce CPU
> > usage.
> >
> > The standard page pool API is used.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> > V3:
> > Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
> > V2:
> > Use the standard page pool API as suggested by Jesper Dangaard Brouer
> >
> > ---
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++++----
> > include/net/mana/mana.h | 3 +
> > 2 files changed, 78 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index a499e460594b..4307f25f8c7a 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> [...]
> > @@ -1659,6 +1679,8 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
> >
> > if (rxq->xdp_flush)
> > xdp_do_flush();
> > +
> > + page_pool_nid_changed(rxq->page_pool, numa_mem_id());
>
> I don't think this page_pool_nid_changed() call is needed, if you do
> as I suggest below (nid = NUMA_NO_NODE).
>
>
> > }
> >
> > static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> [...]
>
> > @@ -2008,6 +2041,25 @@ static int mana_push_wqe(struct mana_rxq *rxq)
> > return 0;
> > }
> >
> > +static int mana_create_page_pool(struct mana_rxq *rxq)
> > +{
> > + struct page_pool_params pprm = {};
>
> You are implicitly assigning NUMA node id zero.
>
> > + int ret;
> > +
> > + pprm.pool_size = RX_BUFFERS_PER_QUEUE;
> > + pprm.napi = &rxq->rx_cq.napi;
>
> You likely want to assign pprm.nid to NUMA_NO_NODE
>
> pprm.nid = NUMA_NO_NODE;
>
> For most drivers it is recommended to assign ``NUMA_NO_NODE`` (value -1)
> as the NUMA ID to ``pp_params.nid``. When ``CONFIG_NUMA`` is enabled
> this setting will automatically select the (preferred) NUMA node (via
> ``numa_mem_id()``) based on where NAPI RX-processing is currently
> running. The effect is that page_pool will only use recycled memory when
> NUMA node match running CPU. This assumes CPU refilling driver RX-ring
> will also run RX-NAPI.
>
> If a driver wants more control over the NUMA node memory selection,
> it can assign (``pp_params.nid``) something other than
> ``NUMA_NO_NODE`` and adjust it at runtime via the function
> ``page_pool_nid_changed()``.
Our driver uses NUMA node 0 by default, so I implicitly set the NUMA node id
to zero during pool init.
And if the IRQ/CPU affinity changes, page_pool_nid_changed()
will update the nid for the pool. Does this sound good?
Thanks,
-Haiyang
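The "implicitly zero" point in this exchange can be seen in a small user-space sketch. It assumes a hypothetical stand-in struct (`pool_params_sketch`, with only the two fields discussed here) in place of the kernel's `struct page_pool_params`, and a local `SKETCH_NUMA_NO_NODE` that mirrors the -1 value quoted above. It is not the driver's code, only an illustration of why a zero-initialized parameter block ends up on node 0 rather than on "no node".

```c
#include <assert.h>
#include <string.h>

/* Mirrors the NUMA_NO_NODE value (-1) quoted in the review; local to this sketch. */
#define SKETCH_NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct page_pool_params: only the fields discussed here. */
struct pool_params_sketch {
	unsigned int pool_size;
	int nid;
};

/* Zero-initialize the whole struct, as `pprm = {}` does in the patch, then set the size. */
struct pool_params_sketch params_zeroed(unsigned int pool_size)
{
	struct pool_params_sketch p;

	memset(&p, 0, sizeof(p));
	p.pool_size = pool_size;
	return p;	/* p.nid is 0 here, i.e. node 0, not "no node" */
}

/* Same as above, but with the node chosen explicitly by the caller. */
struct pool_params_sketch params_with_nid(unsigned int pool_size, int nid)
{
	struct pool_params_sketch p = params_zeroed(pool_size);

	/* Valid values are a node id (>= 0) or SKETCH_NUMA_NO_NODE. */
	assert(nid >= SKETCH_NUMA_NO_NODE);
	p.nid = nid;
	return p;
}
```

Calling `params_zeroed(256)` yields `nid == 0`, whereas `params_with_nid(256, SKETCH_NUMA_NO_NODE)` yields `nid == -1`; the two are different requests, which is the distinction the review comment draws.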
* Re: [PATCH RFC net-next v5 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports
From: Arseniy Krasnov @ 2023-07-24 18:11 UTC
To: Bobby Eshleman, Stefan Hajnoczi, Stefano Garzarella,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
VMware PV-Drivers Reviewers
Cc: Dan Carpenter, Simon Horman, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v5-1-581bd37fdb26@bytedance.com>
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This commit drops the transport->dgram_dequeue callback and makes
> vsock_dgram_recvmsg() generic to all transports.
>
> To make this possible, two transport-level changes are introduced:
> - implementation of the ->dgram_addr_init() callback to initialize
> the sockaddr_vm structure with data from incoming socket buffers.
> - transport implementations set the skb->data pointer to the beginning
> of the payload prior to adding the skb to the socket's receive queue.
> That is, they must use skb_pull() before enqueuing. This is an
> agreement between the transport and the socket layer that skb->data
> always points to the beginning of the payload (and not, for example,
> the packet header).
>
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 5 ---
> include/net/af_vsock.h | 3 +-
> net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++-
> net/vmw_vsock/hyperv_transport.c | 7 ----
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 9 -----
> net/vmw_vsock/vmci_transport.c | 58 ++++++---------------------------
> net/vmw_vsock/vsock_loopback.c | 1 -
> 9 files changed, 50 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6578db78f0ae..ae8891598a48 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index c58453699ee9..18cbe8d37fca 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -167,11 +167,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> size_t len,
> int type);
> int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags);
> -
> -int
> virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> struct msghdr *msg,
> size_t len);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 0e7504a42925..305d57502e89 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -120,11 +120,10 @@ struct vsock_transport {
>
> /* DGRAM. */
> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags);
> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> struct msghdr *, size_t len);
> bool (*dgram_allow)(u32 cid, u32 port);
> + void (*dgram_addr_init)(struct sk_buff *skb, struct sockaddr_vm *addr);
>
> /* STREAM. */
> /* TODO: stream_bind() */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index deb72a8c44a7..ad71e084bf2f 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1272,11 +1272,15 @@ static int vsock_dgram_connect(struct socket *sock,
> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> size_t len, int flags)
> {
> + const struct vsock_transport *transport;
> #ifdef CONFIG_BPF_SYSCALL
> const struct proto *prot;
> #endif
> struct vsock_sock *vsk;
> + struct sk_buff *skb;
> + size_t payload_len;
> struct sock *sk;
> + int err;
>
> sk = sock->sk;
> vsk = vsock_sk(sk);
> @@ -1287,7 +1291,41 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + return -EOPNOTSUPP;
> +
> + transport = vsk->transport;
> +
> + /* Retrieve the head sk_buff from the socket's receive queue. */
> + err = 0;
> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> + if (!skb)
> + return err;
> +
> + payload_len = skb->len;
> +
> + if (payload_len > len) {
> + payload_len = len;
> + msg->msg_flags |= MSG_TRUNC;
> + }
> +
> + /* Place the datagram payload in the user's iovec. */
> + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> + if (err)
> + goto out;
> +
> + if (msg->msg_name) {
> + /* Provide the address of the sender. */
> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> +
> + transport->dgram_addr_init(skb, vm_addr);
Do we need to check that dgram_addr_init != NULL? I see that not all transports set this
callback in this patch.
> + msg->msg_namelen = sizeof(*vm_addr);
> + }
> + err = payload_len;
> +
> +out:
> + skb_free_datagram(&vsk->sk, skb);
> + return err;
> }
> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 7cb1a9d2cdb4..7f1ea434656d 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> return -EOPNOTSUPP;
> }
>
> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote, struct msghdr *msg,
> size_t dgram_len)
> @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
> .shutdown = hvs_shutdown,
>
> .dgram_bind = hvs_dgram_bind,
> - .dgram_dequeue = hvs_dgram_dequeue,
> .dgram_enqueue = hvs_dgram_enqueue,
> .dgram_allow = hvs_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index e95df847176b..66edffdbf303 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -429,7 +429,6 @@ static struct virtio_transport virtio_transport = {
> .cancel_pkt = virtio_transport_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index b769fc258931..01ea1402ad40 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -583,15 +583,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
>
> -int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> -
> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> {
> struct virtio_vsock_sock *vvs = vsk->trans;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b370070194fa..0bbbdb222245 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -641,6 +641,7 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> sock_hold(sk);
> skb_put(skb, size);
> memcpy(skb->data, dg, size);
> + skb_pull(skb, VMCI_DG_HEADERSIZE);
> sk_receive_skb(sk, skb, 0);
>
> return VMCI_SUCCESS;
> @@ -1731,57 +1732,18 @@ static int vmci_transport_dgram_enqueue(
> return err - sizeof(*dg);
> }
>
> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg, size_t len,
> - int flags)
> +static void vmci_transport_dgram_addr_init(struct sk_buff *skb,
> + struct sockaddr_vm *addr)
> {
> - int err;
> struct vmci_datagram *dg;
> - size_t payload_len;
> - struct sk_buff *skb;
> -
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> - return -EOPNOTSUPP;
> -
> - /* Retrieve the head sk_buff from the socket's receive queue. */
> - err = 0;
> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> - if (!skb)
> - return err;
> -
> - dg = (struct vmci_datagram *)skb->data;
> - if (!dg)
> - /* err is 0, meaning we read zero bytes. */
> - goto out;
> -
> - payload_len = dg->payload_size;
> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> - if (payload_len != skb->len - sizeof(*dg)) {
> - err = -EINVAL;
> - goto out;
> - }
> -
> - if (payload_len > len) {
> - payload_len = len;
> - msg->msg_flags |= MSG_TRUNC;
> - }
> + unsigned int cid, port;
>
> - /* Place the datagram payload in the user's iovec. */
> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> - if (err)
> - goto out;
> -
> - if (msg->msg_name) {
> - /* Provide the address of the sender. */
> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> - msg->msg_namelen = sizeof(*vm_addr);
> - }
> - err = payload_len;
> + WARN_ONCE(skb->head == skb->data, "vmci vsock bug: bad dgram skb");
>
> -out:
> - skb_free_datagram(&vsk->sk, skb);
> - return err;
> + dg = (struct vmci_datagram *)skb->head;
> + cid = dg->src.context;
> + port = dg->src.resource;
> + vsock_addr_init(addr, cid, port);
I think we
1) can shorten this to:
vsock_addr_init(addr, dg->src.context, dg->src.resource);
2) otherwise, if the locals are kept, cid and port should be u32, since the VMCI structure's
'context' and 'resource' fields are u32 and 'vsock_addr_init()' also takes u32 for both arguments.
Thanks, Arseniy
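To make the two review points concrete, here is a minimal user-space sketch, not the patch itself. All names are hypothetical stand-ins: `addr_sketch` for `struct sockaddr_vm`, `transport_sketch` for `struct vsock_transport`, `pkt_hdr_sketch` for the VMCI datagram header. It shows a caller that refuses to invoke a missing callback and an address initializer that copies 32-bit fields directly, without intermediate variables of another type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sockaddr_vm: only cid and port. */
struct addr_sketch {
	uint32_t cid;
	uint32_t port;
};

/* Hypothetical stand-in for struct vsock_transport. */
struct transport_sketch {
	/* May be NULL for a transport that does not implement datagrams. */
	void (*dgram_addr_init)(const void *pkt, struct addr_sketch *addr);
};

/* Hypothetical packet header with 32-bit source fields. */
struct pkt_hdr_sketch {
	uint32_t src_context;
	uint32_t src_resource;
};

/* Point 1: copy the fields directly, so the 32-bit type is kept end to end. */
void sketch_addr_init(const void *pkt, struct addr_sketch *addr)
{
	const struct pkt_hdr_sketch *hdr = pkt;

	addr->cid = hdr->src_context;
	addr->port = hdr->src_resource;
}

/* NULL check: returns 0 on success, -1 if the transport has no callback. */
int fill_sender_addr(const struct transport_sketch *t, const void *pkt,
		     struct addr_sketch *addr)
{
	assert(t != NULL && addr != NULL);
	if (!t->dgram_addr_init)
		return -1;
	t->dgram_addr_init(pkt, addr);
	return 0;
}
```

Whether the real code should return an error, skip the address, or require every transport to provide the callback is a design decision for the patch author; the sketch only shows the guarded-call shape.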
> }
>
> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> @@ -2040,9 +2002,9 @@ static struct vsock_transport vmci_transport = {
> .release = vmci_transport_release,
> .connect = vmci_transport_connect,
> .dgram_bind = vmci_transport_dgram_bind,
> - .dgram_dequeue = vmci_transport_dgram_dequeue,
> .dgram_enqueue = vmci_transport_dgram_enqueue,
> .dgram_allow = vmci_transport_dgram_allow,
> + .dgram_addr_init = vmci_transport_dgram_addr_init,
> .stream_dequeue = vmci_transport_stream_dequeue,
> .stream_enqueue = vmci_transport_stream_enqueue,
> .stream_has_data = vmci_transport_stream_has_data,
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 5c6360df1f31..2a59dd177c74 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
>
* RE: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
From: Haiyang Zhang @ 2023-07-24 18:35 UTC
To: Haiyang Zhang, Jesper Dangaard Brouer,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: brouer@redhat.com, Dexuan Cui, KY Srinivasan, Paul Rosswurm,
olaf@aepfle.de, vkuznets@redhat.com, davem@davemloft.net,
wei.liu@kernel.org, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, leon@kernel.org, Long Li,
ssengar@linux.microsoft.com, linux-rdma@vger.kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com,
bpf@vger.kernel.org, ast@kernel.org, Ajay Sharma, hawk@kernel.org,
tglx@linutronix.de, shradhagupta@linux.microsoft.com,
linux-kernel@vger.kernel.org, Ilias Apalodimas,
Jesper Dangaard Brouer
In-Reply-To: <PH7PR21MB3116F8A97F3626AB04915B96CA02A@PH7PR21MB3116.namprd21.prod.outlook.com>
> -----Original Message-----
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Sent: Monday, July 24, 2023 11:46 AM
> Subject: RE: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
>
>
>
> > -----Original Message-----
> > From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> > Sent: Monday, July 24, 2023 7:29 AM
> > Subject: Re: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
> >
> >
> >
> > On 21/07/2023 21.05, Haiyang Zhang wrote:
> > > Add page pool for RX buffers for faster buffer cycle and reduce CPU
> > > usage.
> > >
> > > The standard page pool API is used.
> > >
> > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > ---
> > > V3:
> > > Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
> > > V2:
> > > Use the standard page pool API as suggested by Jesper Dangaard Brouer
> > >
> > > ---
> > > drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++++----
> > > include/net/mana/mana.h | 3 +
> > > 2 files changed, 78 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > index a499e460594b..4307f25f8c7a 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > [...]
> > > @@ -1659,6 +1679,8 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
> > >
> > > if (rxq->xdp_flush)
> > > xdp_do_flush();
> > > +
> > > + page_pool_nid_changed(rxq->page_pool, numa_mem_id());
> >
> > I don't think this page_pool_nid_changed() call is needed, if you do
> > as I suggest below (nid = NUMA_NO_NODE).
> >
> >
> > > }
> > >
> > > static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> > [...]
> >
> > > @@ -2008,6 +2041,25 @@ static int mana_push_wqe(struct mana_rxq *rxq)
> > > return 0;
> > > }
> > >
> > > +static int mana_create_page_pool(struct mana_rxq *rxq)
> > > +{
> > > + struct page_pool_params pprm = {};
> >
> > You are implicitly assigning NUMA node id zero.
> >
> > > + int ret;
> > > +
> > > + pprm.pool_size = RX_BUFFERS_PER_QUEUE;
> > > + pprm.napi = &rxq->rx_cq.napi;
> >
> > You likely want to assign pprm.nid to NUMA_NO_NODE
> >
> > pprm.nid = NUMA_NO_NODE;
> >
> > For most drivers it is recommended to assign ``NUMA_NO_NODE`` (value -1)
> > as the NUMA ID to ``pp_params.nid``. When ``CONFIG_NUMA`` is enabled
> > this setting will automatically select the (preferred) NUMA node (via
> > ``numa_mem_id()``) based on where NAPI RX-processing is currently
> > running. The effect is that page_pool will only use recycled memory when
> > NUMA node match running CPU. This assumes CPU refilling driver RX-ring
> > will also run RX-NAPI.
> >
> > If a driver wants more control over the NUMA node memory selection,
> > it can assign (``pp_params.nid``) something other than
> > ``NUMA_NO_NODE`` and adjust it at runtime via the function
> > ``page_pool_nid_changed()``.
>
> Our driver is using NUMA 0 by default, so I implicitly assign NUMA node id
> to zero during pool init.
>
> And, if the IRQ/CPU affinity is changed, the page_pool_nid_changed()
> will update the nid for the pool. Does this sound good?
>
Also, since our driver is getting the default node from here:
gc->numa_node = dev_to_node(&pdev->dev);
I will update this patch to set the default node as above, instead of implicitly
assigning it to 0.
Thanks,
- Haiyang
* Re: [PATCH v2 1/1] x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
From: Peter Zijlstra @ 2023-07-24 21:06 UTC
To: Michael Kelley
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
x86, linux-kernel, linux-hyperv, stable
In-Reply-To: <1690001476-98594-1-git-send-email-mikelley@microsoft.com>
On Fri, Jul 21, 2023 at 09:51:16PM -0700, Michael Kelley wrote:
> On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
> with ConfigVersion 9.3 or later support IBT in the guest. However,
> current versions of Hyper-V have a bug in that there's not an ENDBR64
> instruction at the beginning of the hypercall page. Since hypercalls are
> made with an indirect call to the hypercall page, all hypercall attempts
> fail with an exception and Linux panics.
>
> A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
> panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
> with ENDBR. The VM will boot and run without IBT.
>
> If future Linux 32-bit kernels were to support IBT, additional hypercall
> page hackery would be needed to make IBT work for such kernels in a
> Hyper-V VM.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>
> Changes in v2:
> * Use pr_warn() instead of pr_info() [Peter Zijlstra]
>
> arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6c04b52..5cbee24 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -14,6 +14,7 @@
> #include <asm/apic.h>
> #include <asm/desc.h>
> #include <asm/sev.h>
> +#include <asm/ibt.h>
> #include <asm/hypervisor.h>
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
> @@ -472,6 +473,26 @@ void __init hyperv_init(void)
> }
>
> /*
> + * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> + * in that there's no ENDBR64 instruction at the entry to the
> + * hypercall page. Because hypercalls are invoked via an indirect call
> + * to the hypercall page, all hypercall attempts fail when IBT is
> + * enabled, and Linux panics. For such buggy versions, disable IBT.
> + *
> + * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> + * page, so if future Linux kernel versions enable IBT for 32-bit
> + * builds, additional hypercall page hackery will be required here
> + * to provide an ENDBR32.
> + */
> +#ifdef CONFIG_X86_KERNEL_IBT
> + if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> + *(u32 *)hv_hypercall_pg != gen_endbr()) {
> + setup_clear_cpu_cap(X86_FEATURE_IBT);
> + pr_warn("Hyper-V: Disabling IBT because of Hyper-V bug\n");
> + }
> +#endif
> +
> + /*
> * hyperv_init() is called before LAPIC is initialized: see
> * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
> --
> 1.8.3.1
>
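The check described in the patch above (does the hypercall page begin with ENDBR64, and if not, turn IBT off) can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: the patch compares a 32-bit load against `gen_endbr()`, while this sketch compares the four opcode bytes of ENDBR64 (F3 0F 1E FA) directly so it does not depend on byte order. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ENDBR64 is encoded as the byte sequence F3 0F 1E FA. */
static const unsigned char endbr64_bytes[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* Returns 1 if the buffer starts with ENDBR64, 0 otherwise (including short buffers). */
int starts_with_endbr64(const unsigned char *page, size_t len)
{
	assert(page != NULL);
	if (len < sizeof(endbr64_bytes))
		return 0;
	return memcmp(page, endbr64_bytes, sizeof(endbr64_bytes)) == 0;
}

/*
 * Hypothetical policy helper: IBT stays on only if it was enabled to begin
 * with and the indirect-call target starts with ENDBR64.
 */
int ibt_allowed(int ibt_enabled, const unsigned char *page, size_t len)
{
	return ibt_enabled && starts_with_endbr64(page, len);
}
```

A page whose first bytes are anything other than F3 0F 1E FA makes `ibt_allowed()` return 0, which corresponds to the branch in the patch that clears the IBT feature and logs a warning.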
* Re: [PATCH V4 net-next] net: mana: Configure hwc timeout from hardware
From: Souradeep Chakrabarti @ 2023-07-25 9:49 UTC
To: Leon Romanovsky
Cc: Souradeep Chakrabarti, kys, haiyangz, wei.liu, decui, davem,
edumazet, kuba, pabeni, longli, sharmaajay, cai.huoqing, ssengar,
vkuznets, tglx, linux-hyperv, netdev, linux-kernel, linux-rdma,
schakrabarti
In-Reply-To: <20230719070826.GF8808@unreal>
On Wed, Jul 19, 2023 at 12:43 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jul 18, 2023 at 11:00:32AM -0700, Souradeep Chakrabarti wrote:
> > At present hwc timeout value is a fixed value. This patch sets the hwc
> > timeout from the hardware. It now uses a new hardware capability
> > GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG to query and set the value
> > in hwc_timeout.
> >
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > V3 -> V4:
> > * Changing branch to net-next.
> > * Changed the commit message to 75 chars per line.
> > ---
> > .../net/ethernet/microsoft/mana/gdma_main.c | 30 ++++++++++++++++++-
> > .../net/ethernet/microsoft/mana/hw_channel.c | 25 +++++++++++++++-
> > include/net/mana/gdma.h | 20 ++++++++++++-
> > include/net/mana/hw_channel.h | 5 ++++
> > 4 files changed, 77 insertions(+), 3 deletions(-)
>
> <...>
>
> > gc->hwc.driver_data = NULL;
> > gc->hwc.gdma_context = NULL;
> > @@ -818,6 +839,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> > dest_vrq = hwc->pf_dest_vrq_id;
> > dest_vrcq = hwc->pf_dest_vrcq_id;
> > }
> > + dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
>
> Why do you print this message every time and with error level?
> Probably you should delete it.
>
In V5 I have changed it to dev_dbg from dev_err.
> Thanks
* [PATCH -next] hv: hyperv.h: Remove unused extern declaration vmbus_ontimer()
From: YueHaibing @ 2023-07-25 13:58 UTC
To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel, YueHaibing
Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function vmbus_ontimer()"),
this declaration has been unused, so remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
include/linux/hyperv.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..3ac3974b3c78 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1239,9 +1239,6 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel *channel,
u32 *buffer_actual_len,
u64 *requestid);
-
-extern void vmbus_ontimer(unsigned long data);
-
/* Base driver object */
struct hv_driver {
const char *name;
--
2.34.1
* RE: [PATCH -next] hv: hyperv.h: Remove unused extern declaration vmbus_ontimer()
From: Michael Kelley (LINUX) @ 2023-07-25 14:12 UTC
To: YueHaibing, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230725135834.1732-1-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com> Sent: Tuesday, July 25, 2023 6:59 AM
>
I'd suggest using "Drivers: hv: vmbus:" as the prefix in the commit message Subject.
I see that "hv: hyperv.h:" has been used a few times in the past, but my suggestion
is much more commonly used and would give better overall consistency.
> Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function
> vmbus_ontimer()")
> this is not used anymore, so can remove it.
Indeed, yes!
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
> include/linux/hyperv.h | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index bfbc37ce223b..3ac3974b3c78 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1239,9 +1239,6 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel *channel,
> u32 *buffer_actual_len,
> u64 *requestid);
>
> -
> -extern void vmbus_ontimer(unsigned long data);
> -
> /* Base driver object */
> struct hv_driver {
> const char *name;
> --
> 2.34.1
* Re: [PATCH -next] hv: hyperv.h: Remove unused extern declaration vmbus_ontimer()
From: YueHaibing @ 2023-07-25 14:17 UTC
To: Michael Kelley (LINUX), KY Srinivasan, Haiyang Zhang,
wei.liu@kernel.org, Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR2101MB1693201F3C3FD2BCDD0843B3D703A@SN6PR2101MB1693.namprd21.prod.outlook.com>
On 2023/7/25 22:12, Michael Kelley (LINUX) wrote:
> From: YueHaibing <yuehaibing@huawei.com> Sent: Tuesday, July 25, 2023 6:59 AM
>>
>
> I'd suggest using "Drivers: hv: vmbus:" as the prefix in the commit message Subject.
> I see that "hv: hyperv.h:" has been used a few times in the past, but my suggestion
> is much more commonly used and would give better overall consistency.
Ok, will send v2.
>
>> Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function
>> vmbus_ontimer()")
>> this declaration is no longer used, so it can be removed.
>
> Indeed, yes!
>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
>
>>
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> ---
>> include/linux/hyperv.h | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index bfbc37ce223b..3ac3974b3c78 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -1239,9 +1239,6 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel
>> *channel,
>> u32 *buffer_actual_len,
>> u64 *requestid);
>>
>> -
>> -extern void vmbus_ontimer(unsigned long data);
>> -
>> /* Base driver object */
>> struct hv_driver {
>> const char *name;
>> --
>> 2.34.1
>
> .
>
* [PATCH v2 -next] Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()
From: YueHaibing @ 2023-07-25 14:21 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, mikelley
Cc: linux-hyperv, linux-kernel, YueHaibing
Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function vmbus_ontimer()")
this declaration is no longer used, so it can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
v2: change prefix "Drivers: hv: vmbus:"
---
include/linux/hyperv.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..3ac3974b3c78 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1239,9 +1239,6 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel *channel,
u32 *buffer_actual_len,
u64 *requestid);
-
-extern void vmbus_ontimer(unsigned long data);
-
/* Base driver object */
struct hv_driver {
const char *name;
--
2.34.1
* [PATCH] x86/hyperv: Rename hv_isolation_type_snp/en_snp() to isol_type_snp_paravisor/enlightened()
From: Tianyu Lan @ 2023-07-25 15:08 UTC (permalink / raw)
To: tglx, mingo, bp, dave.hansen, x86, hpa, kys, haiyangz, wei.liu,
decui, arnd, kirill.shutemov, rppt, nikunj, thomas.lendacky,
liam.merwick, alexandr.lobakin, michael.roth, tiala,
pasha.tatashin, peterz, jpoimboe, michael.h.kelley
Cc: linux-arch, linux-hyperv, linux-kernel, vkuznets
From: Tianyu Lan <tiala@microsoft.com>
Rename hv_isolation_type_snp() and hv_isolation_type_en_snp()
to make them more intuitive.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
This patch is based on the patchset "x86/hyperv: Add AMD sev-snp
enlightened guest support on hyperv":
https://lore.kernel.org/lkml/20230718032304.136888-3-ltykernel@gmail.com/T/
arch/x86/hyperv/hv_init.c | 6 +++---
arch/x86/hyperv/ivm.c | 17 +++++++++--------
arch/x86/include/asm/mshyperv.h | 8 ++++----
arch/x86/kernel/cpu/mshyperv.c | 12 ++++++------
drivers/hv/connection.c | 2 +-
drivers/hv/hv.c | 16 ++++++++--------
drivers/hv/hv_common.c | 10 +++++-----
include/asm-generic/mshyperv.h | 4 ++--
8 files changed, 38 insertions(+), 37 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b004370d3b01..49054dc30604 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -52,7 +52,7 @@ static int hyperv_init_ghcb(void)
void *ghcb_va;
void **ghcb_base;
- if (!hv_isolation_type_snp())
+ if (!isol_type_snp_paravisor())
return 0;
if (!hv_ghcb_pg)
@@ -116,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
* is blocked to run in Confidential VM. So only decrypt assist
* page in non-root partition here.
*/
- if (*hvp && hv_isolation_type_en_snp()) {
+ if (*hvp && isol_type_snp_enlightened()) {
WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
memset(*hvp, 0, PAGE_SIZE);
}
@@ -453,7 +453,7 @@ void __init hyperv_init(void)
goto common_free;
}
- if (hv_isolation_type_snp()) {
+ if (isol_type_snp_paravisor()) {
/* Negotiate GHCB Version. */
if (!hv_ghcb_negotiate_protocol())
hv_ghcb_terminate(SEV_TERM_SET_GEN,
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 2eda4e69849d..2911c2525ed5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -591,24 +591,25 @@ bool hv_is_isolation_supported(void)
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
}
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+DEFINE_STATIC_KEY_FALSE(isol_type_snp_paravisor_flag);
/*
- * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
+ * isol_type_snp_paravisor - Check system runs in the AMD SEV-SNP based
* isolation VM.
*/
-bool hv_isolation_type_snp(void)
+bool isol_type_snp_paravisor(void)
{
- return static_branch_unlikely(&isolation_type_snp);
+ return static_branch_unlikely(&isol_type_snp_paravisor_flag);
}
-DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
+DEFINE_STATIC_KEY_FALSE(isol_type_snp_enlightened_flag);
+
/*
- * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
+ * isol_type_snp_enlightened - Check system runs in the AMD SEV-SNP based
* isolation enlightened VM.
*/
-bool hv_isolation_type_en_snp(void)
+bool isol_type_snp_enlightened(void)
{
- return static_branch_unlikely(&isolation_type_en_snp);
+ return static_branch_unlikely(&isol_type_snp_enlightened_flag);
}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index c5a3c29fad01..51eb239d71dd 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -25,8 +25,8 @@
union hv_ghcb;
-DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
-DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
+DECLARE_STATIC_KEY_FALSE(isol_type_snp_paravisor_flag);
+DECLARE_STATIC_KEY_FALSE(isol_type_snp_enlightened_flag);
typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
@@ -46,7 +46,7 @@ extern void *hv_hypercall_pg;
extern u64 hv_current_partition_id;
-extern bool hv_isolation_type_en_snp(void);
+extern bool isol_type_snp_enlightened(void);
extern union hv_ghcb * __percpu *hv_ghcb_pg;
@@ -268,7 +268,7 @@ static inline void hv_sev_init_mem_and_cpu(void) {}
static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
#endif
-extern bool hv_isolation_type_snp(void);
+extern bool isol_type_snp_paravisor(void);
static inline bool hv_is_synic_reg(unsigned int reg)
{
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6ff0b60d30f9..d9dcee48099c 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -66,7 +66,7 @@ u64 hv_get_non_nested_register(unsigned int reg)
{
u64 value;
- if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
+ if (hv_is_synic_reg(reg) && isol_type_snp_paravisor())
hv_ghcb_msr_read(reg, &value);
else
rdmsrl(reg, value);
@@ -76,7 +76,7 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
void hv_set_non_nested_register(unsigned int reg, u64 value)
{
- if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
+ if (hv_is_synic_reg(reg) && isol_type_snp_paravisor()) {
hv_ghcb_msr_write(reg, value);
/* Write proxy bit via wrmsl instruction */
@@ -300,7 +300,7 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
* Override wakeup_secondary_cpu_64 callback for SEV-SNP
* enlightened guest.
*/
- if (hv_isolation_type_en_snp())
+ if (isol_type_snp_enlightened())
apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
if (!hv_root_partition)
@@ -421,9 +421,9 @@ static void __init ms_hyperv_init_platform(void)
if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
- static_branch_enable(&isolation_type_en_snp);
+ static_branch_enable(&isol_type_snp_enlightened_flag);
} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
- static_branch_enable(&isolation_type_snp);
+ static_branch_enable(&isol_type_snp_paravisor_flag);
}
}
@@ -545,7 +545,7 @@ static void __init ms_hyperv_init_platform(void)
if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
mark_tsc_unstable("running on Hyper-V");
- if (hv_isolation_type_en_snp())
+ if (isol_type_snp_enlightened())
hv_sev_init_mem_and_cpu();
hardlockup_detector_disable();
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 02b54f85dc60..8659d18a55fe 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
++channel->sig_events;
- if (hv_isolation_type_snp())
+ if (isol_type_snp_paravisor())
hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
NULL, sizeof(channel->sig_event));
else
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index ec6e35a0d9bf..7651d79205da 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -64,7 +64,7 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
- if (hv_isolation_type_snp())
+ if (isol_type_snp_paravisor())
status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
(void *)aligned_msg, NULL,
sizeof(*aligned_msg));
@@ -109,7 +109,7 @@ int hv_synic_alloc(void)
* Synic message and event pages are allocated by paravisor.
* Skip these pages allocation here.
*/
- if (!hv_isolation_type_snp() && !hv_root_partition) {
+ if (!isol_type_snp_paravisor() && !hv_root_partition) {
hv_cpu->synic_message_page =
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_message_page == NULL) {
@@ -125,7 +125,7 @@ int hv_synic_alloc(void)
}
}
- if (hv_isolation_type_en_snp()) {
+ if (isol_type_snp_enlightened()) {
ret = set_memory_decrypted((unsigned long)
hv_cpu->synic_message_page, 1);
if (ret) {
@@ -174,7 +174,7 @@ void hv_synic_free(void)
= per_cpu_ptr(hv_context.cpu_context, cpu);
/* It's better to leak the page if the encryption fails. */
- if (hv_isolation_type_en_snp()) {
+ if (isol_type_snp_enlightened()) {
if (hv_cpu->synic_message_page) {
ret = set_memory_encrypted((unsigned long)
hv_cpu->synic_message_page, 1);
@@ -221,7 +221,7 @@ void hv_synic_enable_regs(unsigned int cpu)
simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
simp.simp_enabled = 1;
- if (hv_isolation_type_snp() || hv_root_partition) {
+ if (isol_type_snp_paravisor() || hv_root_partition) {
/* Mask out vTOM bit. ioremap_cache() maps decrypted */
u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
~ms_hyperv.shared_gpa_boundary;
@@ -240,7 +240,7 @@ void hv_synic_enable_regs(unsigned int cpu)
siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
siefp.siefp_enabled = 1;
- if (hv_isolation_type_snp() || hv_root_partition) {
+ if (isol_type_snp_paravisor() || hv_root_partition) {
/* Mask out vTOM bit. ioremap_cache() maps decrypted */
u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
~ms_hyperv.shared_gpa_boundary;
@@ -323,7 +323,7 @@ void hv_synic_disable_regs(unsigned int cpu)
* addresses.
*/
simp.simp_enabled = 0;
- if (hv_isolation_type_snp() || hv_root_partition) {
+ if (isol_type_snp_paravisor() || hv_root_partition) {
iounmap(hv_cpu->synic_message_page);
hv_cpu->synic_message_page = NULL;
} else {
@@ -335,7 +335,7 @@ void hv_synic_disable_regs(unsigned int cpu)
siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
siefp.siefp_enabled = 0;
- if (hv_isolation_type_snp() || hv_root_partition) {
+ if (isol_type_snp_paravisor() || hv_root_partition) {
iounmap(hv_cpu->synic_event_page);
hv_cpu->synic_event_page = NULL;
} else {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 2d43ba2bc925..527e91409ef7 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -381,7 +381,7 @@ int hv_common_cpu_init(unsigned int cpu)
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
}
- if (hv_isolation_type_en_snp()) {
+ if (isol_type_snp_enlightened()) {
ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
if (ret) {
kfree(*inputarg);
@@ -509,17 +509,17 @@ bool __weak hv_is_isolation_supported(void)
}
EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
-bool __weak hv_isolation_type_snp(void)
+bool __weak isol_type_snp_paravisor(void)
{
return false;
}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+EXPORT_SYMBOL_GPL(isol_type_snp_paravisor);
-bool __weak hv_isolation_type_en_snp(void)
+bool __weak isol_type_snp_enlightened(void)
{
return false;
}
-EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
+EXPORT_SYMBOL_GPL(isol_type_snp_enlightened);
void __weak hv_setup_vmbus_handler(void (*handler)(void))
{
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index f73a044ecaa7..d60a9306c0cc 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -64,7 +64,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
-extern bool hv_isolation_type_snp(void);
+extern bool isol_type_snp_paravisor(void);
/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
@@ -279,7 +279,7 @@ bool hv_is_hyperv_initialized(void);
bool hv_is_hibernation_supported(void);
enum hv_isolation_type hv_get_isolation_type(void);
bool hv_is_isolation_supported(void);
-bool hv_isolation_type_snp(void);
+bool isol_type_snp_paravisor(void);
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
void hyperv_cleanup(void);
bool hv_query_ext_cap(u64 cap_query);
--
2.25.1
* Re: [PATCH] x86/hyperv: Rename hv_isolation_type_snp/en_snp() to isol_type_snp_paravisor/enlightened()
From: Vitaly Kuznetsov @ 2023-07-25 15:22 UTC (permalink / raw)
To: Tianyu Lan
Cc: linux-arch, linux-hyperv, linux-kernel, tglx, mingo, bp,
dave.hansen, x86, hpa, kys, haiyangz, wei.liu, decui, arnd,
kirill.shutemov, rppt, nikunj, thomas.lendacky, liam.merwick,
alexandr.lobakin, michael.roth, tiala, pasha.tatashin, peterz,
jpoimboe, michael.h.kelley
In-Reply-To: <20230725150825.283891-1-ltykernel@gmail.com>
Tianyu Lan <ltykernel@gmail.com> writes:
> From: Tianyu Lan <tiala@microsoft.com>
>
> Rename hv_isolation_type_snp() and hv_isolation_type_en_snp()
> to make them more intuitive.
>
> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Thanks for the patch! A few comments below ...
> ---
> This patch is based on the patchset "x86/hyperv: Add AMD sev-snp
> enlightened guest support on hyperv":
> https://lore.kernel.org/lkml/20230718032304.136888-3-ltykernel@gmail.com/T/
>
> arch/x86/hyperv/hv_init.c | 6 +++---
> arch/x86/hyperv/ivm.c | 17 +++++++++--------
> arch/x86/include/asm/mshyperv.h | 8 ++++----
> arch/x86/kernel/cpu/mshyperv.c | 12 ++++++------
> drivers/hv/connection.c | 2 +-
> drivers/hv/hv.c | 16 ++++++++--------
> drivers/hv/hv_common.c | 10 +++++-----
> include/asm-generic/mshyperv.h | 4 ++--
> 8 files changed, 38 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b004370d3b01..49054dc30604 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -52,7 +52,7 @@ static int hyperv_init_ghcb(void)
> void *ghcb_va;
> void **ghcb_base;
>
> - if (!hv_isolation_type_snp())
> + if (!isol_type_snp_paravisor())
> return 0;
>
> if (!hv_ghcb_pg)
> @@ -116,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
> * is blocked to run in Confidential VM. So only decrypt assist
> * page in non-root partition here.
> */
> - if (*hvp && hv_isolation_type_en_snp()) {
> + if (*hvp && isol_type_snp_enlightened()) {
> WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
> memset(*hvp, 0, PAGE_SIZE);
> }
> @@ -453,7 +453,7 @@ void __init hyperv_init(void)
> goto common_free;
> }
>
> - if (hv_isolation_type_snp()) {
> + if (isol_type_snp_paravisor()) {
> /* Negotiate GHCB Version. */
> if (!hv_ghcb_negotiate_protocol())
> hv_ghcb_terminate(SEV_TERM_SET_GEN,
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 2eda4e69849d..2911c2525ed5 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -591,24 +591,25 @@ bool hv_is_isolation_supported(void)
> return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> }
>
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +DEFINE_STATIC_KEY_FALSE(isol_type_snp_paravisor_flag);
>
> /*
> - * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
> + * isol_type_snp_paravisor - Check system runs in the AMD SEV-SNP based
> * isolation VM.
> */
> -bool hv_isolation_type_snp(void)
> +bool isol_type_snp_paravisor(void)
I think that it would be better to keep 'hv_' prefix here for two reasons:
...
> {
> - return static_branch_unlikely(&isolation_type_snp);
> + return static_branch_unlikely(&isol_type_snp_paravisor_flag);
...
First reason is that it would be possible to drop '_flag' suffix here.
...
> }
>
> -DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
> +DEFINE_STATIC_KEY_FALSE(isol_type_snp_enlightened_flag);
> +
> /*
> - * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
> + * isol_type_snp_enlightened - Check system runs in the AMD SEV-SNP based
> * isolation enlightened VM.
> */
> -bool hv_isolation_type_en_snp(void)
> +bool isol_type_snp_enlightened(void)
> {
> - return static_branch_unlikely(&isolation_type_en_snp);
> + return static_branch_unlikely(&isol_type_snp_enlightened_flag);
> }
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index c5a3c29fad01..51eb239d71dd 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -25,8 +25,8 @@
>
> union hv_ghcb;
>
> -DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> -DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
> +DECLARE_STATIC_KEY_FALSE(isol_type_snp_paravisor_flag);
> +DECLARE_STATIC_KEY_FALSE(isol_type_snp_enlightened_flag);
>
> typedef int (*hyperv_fill_flush_list_func)(
> struct hv_guest_mapping_flush_list *flush,
> @@ -46,7 +46,7 @@ extern void *hv_hypercall_pg;
>
> extern u64 hv_current_partition_id;
>
> -extern bool hv_isolation_type_en_snp(void);
> +extern bool isol_type_snp_enlightened(void);
>
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> @@ -268,7 +268,7 @@ static inline void hv_sev_init_mem_and_cpu(void) {}
> static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
> #endif
>
> -extern bool hv_isolation_type_snp(void);
> +extern bool isol_type_snp_paravisor(void);
>
> static inline bool hv_is_synic_reg(unsigned int reg)
> {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 6ff0b60d30f9..d9dcee48099c 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -66,7 +66,7 @@ u64 hv_get_non_nested_register(unsigned int reg)
> {
> u64 value;
>
> - if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> + if (hv_is_synic_reg(reg) && isol_type_snp_paravisor())
> hv_ghcb_msr_read(reg, &value);
> else
> rdmsrl(reg, value);
> @@ -76,7 +76,7 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
>
> void hv_set_non_nested_register(unsigned int reg, u64 value)
> {
> - if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> + if (hv_is_synic_reg(reg) && isol_type_snp_paravisor()) {
> hv_ghcb_msr_write(reg, value);
>
> /* Write proxy bit via wrmsl instruction */
> @@ -300,7 +300,7 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> * Override wakeup_secondary_cpu_64 callback for SEV-SNP
> * enlightened guest.
> */
> - if (hv_isolation_type_en_snp())
> + if (isol_type_snp_enlightened())
> apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
>
> if (!hv_root_partition)
> @@ -421,9 +421,9 @@ static void __init ms_hyperv_init_platform(void)
>
>
> if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> - static_branch_enable(&isolation_type_en_snp);
> + static_branch_enable(&isol_type_snp_enlightened_flag);
> } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> - static_branch_enable(&isolation_type_snp);
> + static_branch_enable(&isol_type_snp_paravisor_flag);
> }
> }
>
> @@ -545,7 +545,7 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> - if (hv_isolation_type_en_snp())
> + if (isol_type_snp_enlightened())
> hv_sev_init_mem_and_cpu();
>
> hardlockup_detector_disable();
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 02b54f85dc60..8659d18a55fe 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
>
> ++channel->sig_events;
>
> - if (hv_isolation_type_snp())
> + if (isol_type_snp_paravisor())
> hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> NULL, sizeof(channel->sig_event));
> else
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index ec6e35a0d9bf..7651d79205da 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -64,7 +64,7 @@ int hv_post_message(union hv_connection_id connection_id,
> aligned_msg->payload_size = payload_size;
> memcpy((void *)aligned_msg->payload, payload, payload_size);
>
> - if (hv_isolation_type_snp())
> + if (isol_type_snp_paravisor())
> status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
> (void *)aligned_msg, NULL,
> sizeof(*aligned_msg));
> @@ -109,7 +109,7 @@ int hv_synic_alloc(void)
> * Synic message and event pages are allocated by paravisor.
> * Skip these pages allocation here.
> */
> - if (!hv_isolation_type_snp() && !hv_root_partition) {
> + if (!isol_type_snp_paravisor() && !hv_root_partition) {
> hv_cpu->synic_message_page =
> (void *)get_zeroed_page(GFP_ATOMIC);
> if (hv_cpu->synic_message_page == NULL) {
> @@ -125,7 +125,7 @@ int hv_synic_alloc(void)
> }
> }
>
> - if (hv_isolation_type_en_snp()) {
> + if (isol_type_snp_enlightened()) {
> ret = set_memory_decrypted((unsigned long)
> hv_cpu->synic_message_page, 1);
> if (ret) {
> @@ -174,7 +174,7 @@ void hv_synic_free(void)
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> /* It's better to leak the page if the encryption fails. */
> - if (hv_isolation_type_en_snp()) {
> + if (isol_type_snp_enlightened()) {
> if (hv_cpu->synic_message_page) {
> ret = set_memory_encrypted((unsigned long)
> hv_cpu->synic_message_page, 1);
> @@ -221,7 +221,7 @@ void hv_synic_enable_regs(unsigned int cpu)
> simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
> simp.simp_enabled = 1;
>
> - if (hv_isolation_type_snp() || hv_root_partition) {
> + if (isol_type_snp_paravisor() || hv_root_partition) {
> /* Mask out vTOM bit. ioremap_cache() maps decrypted */
> u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
> ~ms_hyperv.shared_gpa_boundary;
> @@ -240,7 +240,7 @@ void hv_synic_enable_regs(unsigned int cpu)
> siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> siefp.siefp_enabled = 1;
>
> - if (hv_isolation_type_snp() || hv_root_partition) {
> + if (isol_type_snp_paravisor() || hv_root_partition) {
> /* Mask out vTOM bit. ioremap_cache() maps decrypted */
> u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
> ~ms_hyperv.shared_gpa_boundary;
> @@ -323,7 +323,7 @@ void hv_synic_disable_regs(unsigned int cpu)
> * addresses.
> */
> simp.simp_enabled = 0;
> - if (hv_isolation_type_snp() || hv_root_partition) {
> + if (isol_type_snp_paravisor() || hv_root_partition) {
> iounmap(hv_cpu->synic_message_page);
> hv_cpu->synic_message_page = NULL;
> } else {
> @@ -335,7 +335,7 @@ void hv_synic_disable_regs(unsigned int cpu)
> siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
> siefp.siefp_enabled = 0;
>
> - if (hv_isolation_type_snp() || hv_root_partition) {
> + if (isol_type_snp_paravisor() || hv_root_partition) {
> iounmap(hv_cpu->synic_event_page);
> hv_cpu->synic_event_page = NULL;
> } else {
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 2d43ba2bc925..527e91409ef7 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -381,7 +381,7 @@ int hv_common_cpu_init(unsigned int cpu)
> *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> }
>
> - if (hv_isolation_type_en_snp()) {
> + if (isol_type_snp_enlightened()) {
> ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> if (ret) {
> kfree(*inputarg);
> @@ -509,17 +509,17 @@ bool __weak hv_is_isolation_supported(void)
> }
> EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
>
> -bool __weak hv_isolation_type_snp(void)
> +bool __weak isol_type_snp_paravisor(void)
> {
> return false;
> }
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> +EXPORT_SYMBOL_GPL(isol_type_snp_paravisor);
>
> -bool __weak hv_isolation_type_en_snp(void)
> +bool __weak isol_type_snp_enlightened(void)
> {
> return false;
> }
> -EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
> +EXPORT_SYMBOL_GPL(isol_type_snp_enlightened);
>
> void __weak hv_setup_vmbus_handler(void (*handler)(void))
> {
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index f73a044ecaa7..d60a9306c0cc 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -64,7 +64,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
>
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> -extern bool hv_isolation_type_snp(void);
> +extern bool isol_type_snp_paravisor(void);
>
> /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
> static inline int hv_result(u64 status)
> @@ -279,7 +279,7 @@ bool hv_is_hyperv_initialized(void);
> bool hv_is_hibernation_supported(void);
> enum hv_isolation_type hv_get_isolation_type(void);
> bool hv_is_isolation_supported(void);
> -bool hv_isolation_type_snp(void);
> +bool isol_type_snp_paravisor(void);
> u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
> void hyperv_cleanup(void);
> bool hv_query_ext_cap(u64 cap_query);
...
Second reason is that 'isol_type_snp_paravisor'/'isol_type_snp_enlightened'
are exported and present in headers, and without the prefix it is unclear
that these are Hyper-V related.
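To illustrate the two reasons, here is a minimal userspace sketch, not kernel code. A plain bool stands in for the static key, and the names hv_isol_type_snp_paravisor() and model_enable_snp_paravisor() are hypothetical, chosen only to show the naming scheme:

```c
#include <stdbool.h>

/*
 * Userspace model only: a plain bool stands in for the kernel static key.
 * In C, an object and a function share one namespace, so they cannot both
 * be called isol_type_snp_paravisor. That is why the patch needed '_flag'.
 */

/* Flag object: no '_flag' suffix needed once the accessor is prefixed. */
static bool isol_type_snp_paravisor;

/* Accessor keeps 'hv_', so an exported symbol is recognisably Hyper-V. */
bool hv_isol_type_snp_paravisor(void)
{
	return isol_type_snp_paravisor;
}

/* Hypothetical stand-in for enabling the key during platform init. */
void model_enable_snp_paravisor(void)
{
	isol_type_snp_paravisor = true;
}
```

The accessor returns false until model_enable_snp_paravisor() has run, which matches how a key defined as "false" behaves before it is enabled.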
--
Vitaly
* Re: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
From: Jesper Dangaard Brouer @ 2023-07-25 18:01 UTC (permalink / raw)
To: Haiyang Zhang, Jesper Dangaard Brouer,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: brouer, Dexuan Cui, KY Srinivasan, Paul Rosswurm, olaf@aepfle.de,
vkuznets@redhat.com, davem@davemloft.net, wei.liu@kernel.org,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
leon@kernel.org, Long Li, ssengar@linux.microsoft.com,
linux-rdma@vger.kernel.org, daniel@iogearbox.net,
john.fastabend@gmail.com, bpf@vger.kernel.org, ast@kernel.org,
Ajay Sharma, hawk@kernel.org, tglx@linutronix.de,
shradhagupta@linux.microsoft.com, linux-kernel@vger.kernel.org,
Ilias Apalodimas
In-Reply-To: <PH7PR21MB311675E57B81B49577ADE98FCA02A@PH7PR21MB3116.namprd21.prod.outlook.com>
On 24/07/2023 20.35, Haiyang Zhang wrote:
>
[...]
>>> On 21/07/2023 21.05, Haiyang Zhang wrote:
>>>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
>>>> usage.
>>>>
>>>> The standard page pool API is used.
>>>>
>>>> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
>>>> ---
>>>> V3:
>>>> Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
>>>> V2:
>>>> Use the standard page pool API as suggested by Jesper Dangaard Brouer
>>>>
>>>> ---
>>>> drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++++--
>> --
>>>> include/net/mana/mana.h | 3 +
>>>> 2 files changed, 78 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
>>> b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>> index a499e460594b..4307f25f8c7a 100644
>>>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>> [...]
>>>> @@ -1659,6 +1679,8 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
>>>>
>>>> if (rxq->xdp_flush)
>>>> xdp_do_flush();
>>>> +
>>>> + page_pool_nid_changed(rxq->page_pool, numa_mem_id());
>>>
>>> I don't think this page_pool_nid_changed() call is needed if you do
>>> as I suggest below (nid = NUMA_NO_NODE).
>>>
>>>
>>>> }
>>>>
>>>> static int mana_cq_handler(void *context, struct gdma_queue
>>> *gdma_queue)
>>> [...]
>>>
>>>> @@ -2008,6 +2041,25 @@ static int mana_push_wqe(struct mana_rxq
>> *rxq)
>>>> return 0;
>>>> }
>>>>
>>>> +static int mana_create_page_pool(struct mana_rxq *rxq)
>>>> +{
>>>> + struct page_pool_params pprm = {};
>>>
>>> You are implicitly assigning NUMA node id zero.
>>>
>>>> + int ret;
>>>> +
>>>> + pprm.pool_size = RX_BUFFERS_PER_QUEUE;
>>>> + pprm.napi = &rxq->rx_cq.napi;
>>>
>>> You likely want to assign pprm.nid to NUMA_NO_NODE
>>>
>>> pprm.nid = NUMA_NO_NODE;
>>>
>>> For most drivers it is recommended to assign ``NUMA_NO_NODE`` (value -1)
>>> as the NUMA ID to ``pp_params.nid``. When ``CONFIG_NUMA`` is enabled
>>> this setting will automatically select the (preferred) NUMA node (via
>>> ``numa_mem_id()``) based on where NAPI RX-processing is currently
>>> running. The effect is that page_pool will only use recycled memory when
>>> the NUMA node matches the running CPU. This assumes the CPU refilling the
>>> driver RX-ring will also run RX-NAPI.
>>>
>>> If a driver wants more control over the NUMA node memory selection,
>>> it can assign (``pp_params.nid``) something other than
>>> ``NUMA_NO_NODE`` and adjust at runtime via the function
>>> ``page_pool_nid_changed()``.
>>
>> Our driver is using NUMA 0 by default, so I implicitly assign NUMA node id
>> to zero during pool init.
>>
>> And, if the IRQ/CPU affinity is changed, the page_pool_nid_changed()
>> will update the nid for the pool. Does this sound good?
>>
>
> Also, since our driver is getting the default node from here:
> gc->numa_node = dev_to_node(&pdev->dev);
> I will update this patch to set the default node as above, instead of implicitly
> assigning it to 0.
>
In that case, I agree that it makes sense to use dev_to_node(&pdev->dev),
like:
pprm.nid = dev_to_node(&pdev->dev);
The driver must have a reason for assigning gc->numa_node for this hardware,
which is okay. That is why the page_pool API allows the driver to control this.
But then I don't think you should call page_pool_nid_changed() like
page_pool_nid_changed(rxq->page_pool, numa_mem_id());
Because then you will (at first packet processing event) revert the
dev_to_node() setting to use numa_mem_id() of processing/running CPU.
(In effect this will be the same as setting NUMA_NO_NODE).
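To make the point above concrete, here is a minimal userspace sketch, not driver code. The struct and the model_* helpers are hypothetical stand-ins. Only the "update the pool's nid when it differs" behaviour is meant to mirror page_pool_nid_changed(), and the node numbers are made up:

```c
/* Hypothetical stand-in for the pool's NUMA state; not struct page_pool. */
struct model_pool {
	int nid;	/* node the pool currently recycles pages for */
};

/* Models pool creation with an explicit node, e.g. from dev_to_node(). */
void model_pool_init(struct model_pool *pool, int nid)
{
	pool->nid = nid;
}

/* Models page_pool_nid_changed(): the pool follows the node passed in. */
void model_pool_nid_changed(struct model_pool *pool, int new_nid)
{
	if (pool->nid != new_nid)
		pool->nid = new_nid;
}
```

If the pool is created with model_pool_init(&pool, 1) for a device on node 1, a single model_pool_nid_changed(&pool, 0) from a poll loop running on node 0 leaves pool.nid at 0, so the dev_to_node() choice is gone after the first poll.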
I know mlx5 does call page_pool_nid_changed(), but they showed benchmark
numbers that this was the preferred action, even when the sysadmin had
"misconfigured" the default smp_affinity so that RX-processing happens on a
remote NUMA node. AFAIK mlx5 keeps the descriptor rings on the
originally configured NUMA node that corresponds to the NIC PCIe slot.
--Jesper
* RE: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
From: Haiyang Zhang @ 2023-07-25 19:02 UTC (permalink / raw)
To: Jesper Dangaard Brouer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: brouer@redhat.com, Dexuan Cui, KY Srinivasan, Paul Rosswurm,
olaf@aepfle.de, vkuznets@redhat.com, davem@davemloft.net,
wei.liu@kernel.org, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, leon@kernel.org, Long Li,
ssengar@linux.microsoft.com, linux-rdma@vger.kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com,
bpf@vger.kernel.org, ast@kernel.org, Ajay Sharma, hawk@kernel.org,
tglx@linutronix.de, shradhagupta@linux.microsoft.com,
linux-kernel@vger.kernel.org, Ilias Apalodimas
In-Reply-To: <729b360c-4d79-1025-f5be-384b17f132d3@redhat.com>
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Tuesday, July 25, 2023 2:01 PM
> >>
> >> Our driver is using NUMA 0 by default, so I implicitly assign NUMA node id
> >> to zero during pool init.
> >>
> >> And, if the IRQ/CPU affinity is changed, the page_pool_nid_changed()
> >> will update the nid for the pool. Does this sound good?
> >>
> >
> > Also, since our driver is getting the default node from here:
> > gc->numa_node = dev_to_node(&pdev->dev);
> > I will update this patch to set the default node as above, instead of implicitly
> > assigning it to 0.
> >
>
> In that case, I agree that it make sense to use dev_to_node(&pdev->dev),
> like:
> pprm.nid = dev_to_node(&pdev->dev);
>
> Driver must have a reason for assigning gc->numa_node for this hardware,
> which is okay. That is why page_pool API allows driver to control this.
>
> But then I don't think you should call page_pool_nid_changed() like
>
> page_pool_nid_changed(rxq->page_pool, numa_mem_id());
>
> Because then you will (at first packet processing event) revert the
> dev_to_node() setting to use numa_mem_id() of processing/running CPU.
> (In effect this will be the same as setting NUMA_NO_NODE).
>
> I know, mlx5 do call page_pool_nid_changed(), but they showed benchmark
> numbers that this was preferred action, even-when sysadm had
> "misconfigured" the default smp_affinity RX-processing to happen on a
> remote NUMA node. AFAIK mlx5 keeps the descriptor rings on the
> originally configured NUMA node that corresponds to the NIC PCIe slot.
In mana_gd_setup_irqs(), we set the default IRQ/CPU affinity to gc->numa_node
too, so it won't revert the initial nid setting.
Currently, the Azure hypervisor always indicates NUMA node 0 as the default. (In
the future, it will start to provide the accurate default dev node.) When a
user manually changes the IRQ/CPU affinity for perf tuning, we want to
allow page_pool_nid_changed() to update the pool. Is this OK?
Thanks,
- Haiyang
* [PATCH] hv_balloon: Update the balloon driver to use the SBRM API
From: Mitchell Levy via B4 Relay @ 2023-07-26 0:23 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
Cc: linux-hyperv, linux-kernel, mikelly, peterz, Boqun Feng,
Mitchell Levy (Microsoft)
From: Mitchell Levy <levymitchell0@gmail.com>
---
This patch is intended as a proof-of-concept for the new SBRM
machinery[1]. For some brief background, the idea behind SBRM is using
the __cleanup__ attribute to automatically unlock locks (or otherwise
release resources) when they go out of scope, similar to C++ style RAII.
This promises some benefits such as making code simpler (particularly
where you have lots of goto fail; type constructs) as well as reducing
the surface area for certain kinds of bugs.
The changes in this patch should not result in any difference in how the
code actually runs (i.e., it's purely an exercise in this new syntactic
sugar). In one instance SBRM was not appropriate, so I left that part
alone, but all other locking/unlocking is handled automatically in this
patch.
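For readers unfamiliar with the mechanism, here is a minimal userspace
sketch of the idea, assuming a GCC- or Clang-compatible compiler. It is not
the kernel's guard()/scoped_guard() implementation from linux/cleanup.h;
struct demo_lock, DEMO_GUARD and the other demo_* names are made up for
illustration, and the "lock" is just a counter so the example runs without
threads:

```c
#include <assert.h>

/* Hypothetical stand-in for a lock: 'held' counts outstanding acquisitions. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *lock)
{
	lock->held++;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void demo_lock_release(struct demo_lock **lockp)
{
	(*lockp)->held--;
}

/*
 * Declare a scope-bound guard: the lock is taken here and
 * demo_lock_release() runs automatically when the enclosing scope is left,
 * whether by falling off the end or by an early return.
 */
#define DEMO_GUARD(lock)						\
	struct demo_lock *demo_guard					\
		__attribute__((cleanup(demo_lock_release))) = (lock);	\
	demo_lock_acquire(demo_guard)

/* Subtract 'count' from '*total' under the lock, refusing to wrap below 0. */
static int demo_sub_clamped(struct demo_lock *lock, unsigned long *total,
			    unsigned long count)
{
	DEMO_GUARD(lock);

	assert(lock->held == 1);	/* held for the whole function body */

	if (count > *total) {
		*total = 0;
		return -1;		/* early return, no explicit unlock */
	}

	*total -= count;
	return 0;
}
```

Both exit paths of demo_sub_clamped() leave the lock released without an
explicit unlock call, which is the property the conversions in this patch
rely on.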
Link: https://lore.kernel.org/all/20230626125726.GU4253@hirez.programming.kicks-ass.net/ [1]
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: "Mitchell Levy (Microsoft)" <levymitchell0@gmail.com>
---
drivers/hv/hv_balloon.c | 82 +++++++++++++++++++++++--------------------------
1 file changed, 38 insertions(+), 44 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index dffcc894f117..2812601e84da 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -8,6 +8,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/cleanup.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/mman.h>
@@ -646,7 +647,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
void *v)
{
struct memory_notify *mem = (struct memory_notify *)v;
- unsigned long flags, pfn_count;
+ unsigned long pfn_count;
switch (val) {
case MEM_ONLINE:
@@ -655,21 +656,22 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
break;
case MEM_OFFLINE:
- spin_lock_irqsave(&dm_device.ha_lock, flags);
- pfn_count = hv_page_offline_check(mem->start_pfn,
- mem->nr_pages);
- if (pfn_count <= dm_device.num_pages_onlined) {
- dm_device.num_pages_onlined -= pfn_count;
- } else {
- /*
- * We're offlining more pages than we managed to online.
- * This is unexpected. In any case don't let
- * num_pages_onlined wrap around zero.
- */
- WARN_ON_ONCE(1);
- dm_device.num_pages_onlined = 0;
+ scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+ pfn_count = hv_page_offline_check(mem->start_pfn,
+ mem->nr_pages);
+ if (pfn_count <= dm_device.num_pages_onlined) {
+ dm_device.num_pages_onlined -= pfn_count;
+ } else {
+ /*
+ * We're offlining more pages than we
+ * managed to online. This is
+ * unexpected. In any case don't let
+ * num_pages_onlined wrap around zero.
+ */
+ WARN_ON_ONCE(1);
+ dm_device.num_pages_onlined = 0;
+ }
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
break;
case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
@@ -721,24 +723,23 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
unsigned long start_pfn;
unsigned long processed_pfn;
unsigned long total_pfn = pfn_count;
- unsigned long flags;
for (i = 0; i < (size/HA_CHUNK); i++) {
start_pfn = start + (i * HA_CHUNK);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
- has->ha_end_pfn += HA_CHUNK;
+ scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+ has->ha_end_pfn += HA_CHUNK;
- if (total_pfn > HA_CHUNK) {
- processed_pfn = HA_CHUNK;
- total_pfn -= HA_CHUNK;
- } else {
- processed_pfn = total_pfn;
- total_pfn = 0;
- }
+ if (total_pfn > HA_CHUNK) {
+ processed_pfn = HA_CHUNK;
+ total_pfn -= HA_CHUNK;
+ } else {
+ processed_pfn = total_pfn;
+ total_pfn = 0;
+ }
- has->covered_end_pfn += processed_pfn;
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ has->covered_end_pfn += processed_pfn;
+ }
reinit_completion(&dm_device.ol_waitevent);
@@ -758,10 +759,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
*/
do_hot_add = false;
}
- spin_lock_irqsave(&dm_device.ha_lock, flags);
- has->ha_end_pfn -= HA_CHUNK;
- has->covered_end_pfn -= processed_pfn;
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+ has->ha_end_pfn -= HA_CHUNK;
+ has->covered_end_pfn -= processed_pfn;
+ }
break;
}
@@ -781,10 +782,9 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
static void hv_online_page(struct page *pg, unsigned int order)
{
struct hv_hotadd_state *has;
- unsigned long flags;
unsigned long pfn = page_to_pfn(pg);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ guard(spinlock_irqsave)(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/* The page belongs to a different HAS. */
if ((pfn < has->start_pfn) ||
@@ -794,7 +794,6 @@ static void hv_online_page(struct page *pg, unsigned int order)
hv_bring_pgs_online(has, pfn, 1UL << order);
break;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
}
static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
@@ -803,9 +802,8 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
struct hv_hotadd_gap *gap;
unsigned long residual, new_inc;
int ret = 0;
- unsigned long flags;
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ guard(spinlock_irqsave)(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/*
* If the pfn range we are dealing with is not in the current
@@ -852,7 +850,6 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
ret = 1;
break;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
return ret;
}
@@ -947,7 +944,6 @@ static unsigned long process_hot_add(unsigned long pg_start,
{
struct hv_hotadd_state *ha_region = NULL;
int covered;
- unsigned long flags;
if (pfn_cnt == 0)
return 0;
@@ -979,9 +975,9 @@ static unsigned long process_hot_add(unsigned long pg_start,
ha_region->covered_end_pfn = pg_start;
ha_region->end_pfn = rg_start + rg_size;
- spin_lock_irqsave(&dm_device.ha_lock, flags);
- list_add_tail(&ha_region->list, &dm_device.ha_region_list);
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+ list_add_tail(&ha_region->list, &dm_device.ha_region_list);
+ }
}
do_pg_range:
@@ -2047,7 +2043,6 @@ static void balloon_remove(struct hv_device *dev)
struct hv_dynmem_device *dm = hv_get_drvdata(dev);
struct hv_hotadd_state *has, *tmp;
struct hv_hotadd_gap *gap, *tmp_gap;
- unsigned long flags;
if (dm->num_pages_ballooned != 0)
pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -2073,7 +2068,7 @@ static void balloon_remove(struct hv_device *dev)
#endif
}
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ guard(spinlock_irqsave)(&dm_device.ha_lock);
list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
list_del(&gap->list);
@@ -2082,7 +2077,6 @@ static void balloon_remove(struct hv_device *dev)
list_del(&has->list);
kfree(has);
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
}
static int balloon_suspend(struct hv_device *hv_dev)
---
base-commit: 3f01e9fed8454dcd89727016c3e5b2fbb8f8e50c
change-id: 20230725-master-bbcd9205758b
Best regards,
--
Mitchell Levy <levymitchell0@gmail.com>
* Re: [PATCH V5 net-next] net: mana: Configure hwc timeout from hardware
From: Jakub Kicinski @ 2023-07-26 2:56 UTC (permalink / raw)
To: Souradeep Chakrabarti
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, pabeni, longli,
sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma, schakrabarti
In-Reply-To: <1690177120-20938-1-git-send-email-schakrabarti@linux.microsoft.com>
On Sun, 23 Jul 2023 22:38:40 -0700 Souradeep Chakrabarti wrote:
> @@ -825,7 +847,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> goto out;
> }
>
> - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> + if (!wait_for_completion_timeout(&ctx->comp_event,
> + (hwc->hwc_timeout / 1000) * HZ)) {
> dev_err(hwc->dev, "HWC: Request timed out!\n");
> err = -ETIMEDOUT;
> goto out;
msecs_to_jiffies()
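To spell out why: the open-coded conversion divides by 1000 first, so any
timeout that is not a whole number of seconds is truncated, and anything
under one second becomes zero jiffies. A small userspace model of the
arithmetic, assuming HZ=250 purely for illustration;
demo_msecs_to_jiffies() is a simplified round-up stand-in, not the kernel's
msecs_to_jiffies() implementation:

```c
#include <assert.h>

#define DEMO_HZ 250UL	/* assumed tick rate, for illustration only */

/* The conversion as written in the patch: whole seconds only. */
static unsigned long demo_open_coded(unsigned long ms)
{
	return (ms / 1000) * DEMO_HZ;
}

/* Simplified model of a millisecond-to-tick conversion that rounds up. */
static unsigned long demo_msecs_to_jiffies(unsigned long ms)
{
	return (ms * DEMO_HZ + 999) / 1000;
}
```

In this model a 500 ms timeout yields 0 ticks with the open-coded form and
125 with the round-up form, while a 30000 ms timeout gives 7500 either way.
In the driver this would presumably become
msecs_to_jiffies(hwc->hwc_timeout), assuming hwc_timeout is in milliseconds
as the division by 1000 suggests.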
--
pw-bot: cr
* Re: [PATCH] x86/hyperv: Rename hv_isolation_type_snp/en_snp() to isol_type_snp_paravisor/enlightened()
From: Tianyu Lan @ 2023-07-26 2:57 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: linux-arch, linux-hyperv, linux-kernel, tglx, mingo, bp,
dave.hansen, x86, hpa, kys, haiyangz, wei.liu, decui, arnd,
kirill.shutemov, rppt, nikunj, thomas.lendacky, liam.merwick,
alexandr.lobakin, michael.roth, tiala, pasha.tatashin, peterz,
jpoimboe, michael.h.kelley
In-Reply-To: <871qgwow1q.fsf@redhat.com>
On 7/25/2023 11:22 PM, Vitaly Kuznetsov wrote:
> Tianyu Lan <ltykernel@gmail.com> writes:
>
>> From: Tianyu Lan <tiala@microsoft.com>
>>
>> Rename hv_isolation_type_snp and hv_isolation_type_en_snp()
>> to make them much intuitiver.
>>
>> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
>
> Thanks for the patch! A few comments below ...
>
>> ---
>> This patch is based on the patchset "x86/hyperv: Add AMD sev-snp
>> enlightened guest support on hyperv" https://lore.kernel.org/lkml/
>> 20230718032304.136888-3-ltykernel@gmail.com/T/.
>>
>> arch/x86/hyperv/hv_init.c | 6 +++---
>> arch/x86/hyperv/ivm.c | 17 +++++++++--------
>> arch/x86/include/asm/mshyperv.h | 8 ++++----
>> arch/x86/kernel/cpu/mshyperv.c | 12 ++++++------
>> drivers/hv/connection.c | 2 +-
>> drivers/hv/hv.c | 16 ++++++++--------
>> drivers/hv/hv_common.c | 10 +++++-----
>> include/asm-generic/mshyperv.h | 4 ++--
>> 8 files changed, 38 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
>> index 2eda4e69849d..2911c2525ed5 100644
>> --- a/arch/x86/hyperv/ivm.c
>> +++ b/arch/x86/hyperv/ivm.c
>> @@ -591,24 +591,25 @@ bool hv_is_isolation_supported(void)
>> return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>> }
>>
>> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
>> +DEFINE_STATIC_KEY_FALSE(isol_type_snp_paravisor_flag);
>>
>> /*
>> - * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
>> + * isol_type_snp_paravisor - Check system runs in the AMD SEV-SNP based
>> * isolation VM.
>> */
>> -bool hv_isolation_type_snp(void)
>> +bool isol_type_snp_paravisor(void)
>
>
> I think that it would be better to keep 'hv_' prefix here for two reasons:
> ...
>
Agree. Will update.
Thanks.
* RE: [PATCH V3 1/9] x86/hyperv: Add sev-snp enlightened guest static key
From: Michael Kelley (LINUX) @ 2023-07-26 3:26 UTC (permalink / raw)
To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
daniel.lezcano@linaro.org, arnd@arndb.de
Cc: Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
In-Reply-To: <20230718032304.136888-2-ltykernel@gmail.com>
From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, July 17, 2023 8:23 PM
>
> Introduce static key isolation_type_en_snp for enlightened
> sev-snp guest check.
>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> arch/x86/hyperv/ivm.c | 11 +++++++++++
> arch/x86/include/asm/mshyperv.h | 3 +++
> arch/x86/kernel/cpu/mshyperv.c | 9 +++++++--
> drivers/hv/hv_common.c | 6 ++++++
> include/asm-generic/mshyperv.h | 12 +++++++++---
> 5 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 14f46ad2ca64..b2b5cb19fac9 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -413,3 +413,14 @@ bool hv_isolation_type_snp(void)
> {
> return static_branch_unlikely(&isolation_type_snp);
> }
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
> +/*
> + * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
> + * isolation enlightened VM.
> + */
> +bool hv_isolation_type_en_snp(void)
> +{
> + return static_branch_unlikely(&isolation_type_en_snp);
> +}
> +
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 88d9ef98e087..2fa38e9f6207 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -26,6 +26,7 @@
> union hv_ghcb;
>
> DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
>
> typedef int (*hyperv_fill_flush_list_func)(
> struct hv_guest_mapping_flush_list *flush,
> @@ -45,6 +46,8 @@ extern void *hv_hypercall_pg;
>
> extern u64 hv_current_partition_id;
>
> +extern bool hv_isolation_type_en_snp(void);
> +
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index c7969e806c64..5398fb2f4d39 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -402,8 +402,12 @@ static void __init ms_hyperv_init_platform(void)
> pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>
> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> +
> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> + static_branch_enable(&isolation_type_en_snp);
> + } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> static_branch_enable(&isolation_type_snp);
> + }
> }
>
> if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> @@ -473,7 +477,8 @@ static void __init ms_hyperv_init_platform(void)
>
> #if IS_ENABLED(CONFIG_HYPERV)
> if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
> - (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP))
> + ((hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) &&
> + ms_hyperv.paravisor_present))
> hv_vtom_init();
> /*
> * Setup the hook to get control post apic initialization.
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 542a1d53b303..4b4aa53c34c2 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -502,6 +502,12 @@ bool __weak hv_isolation_type_snp(void)
> }
> EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
>
> +bool __weak hv_isolation_type_en_snp(void)
> +{
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
> +
> void __weak hv_setup_vmbus_handler(void (*handler)(void))
> {
> }
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 402a8c1c202d..6b5c41f90398 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -36,15 +36,21 @@ struct ms_hyperv_info {
> u32 nested_features;
> u32 max_vp_index;
> u32 max_lp_index;
> - u32 isolation_config_a;
> + union {
> + u32 isolation_config_a;
> + struct {
> + u32 paravisor_present : 1;
> + u32 reserved_a1 : 31;
> + };
> + };
> union {
> u32 isolation_config_b;
> struct {
> u32 cvm_type : 4;
> - u32 reserved1 : 1;
> + u32 reserved_b1 : 1;
> u32 shared_gpa_boundary_active : 1;
> u32 shared_gpa_boundary_bits : 6;
> - u32 reserved2 : 20;
> + u32 reserved_b2 : 20;
> };
> };
> u64 shared_gpa_boundary;
> --
> 2.25.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH V3 2/9] x86/hyperv: Set Virtual Trust Level in VMBus init message
From: Michael Kelley (LINUX) @ 2023-07-26 3:28 UTC (permalink / raw)
To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
daniel.lezcano@linaro.org, arnd@arndb.de
Cc: Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
In-Reply-To: <20230718032304.136888-3-ltykernel@gmail.com>
From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, July 17, 2023 8:23 PM
>
> SEV-SNP guests on Hyper-V can run at multiple Virtual Trust
> Levels (VTL). During boot, get the VTL at which we're running
> using the GET_VP_REGISTERs hypercall, and save the value
> for future use. Then during VMBus initialization, set the VTL
> with the saved value as required in the VMBus init message.
>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> * Change since v2:
> Update the change log.
> ---
> arch/x86/hyperv/hv_init.c | 36 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/hyperv-tlfs.h | 7 ++++++
> drivers/hv/connection.c | 1 +
> include/asm-generic/mshyperv.h | 1 +
> include/linux/hyperv.h | 4 ++--
> 5 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6c04b52f139b..1ba367a9686e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -378,6 +378,40 @@ static void __init hv_get_partition_id(void)
> local_irq_restore(flags);
> }
>
> +static u8 __init get_vtl(void)
> +{
> + u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
> + struct hv_get_vp_registers_input *input;
> + struct hv_get_vp_registers_output *output;
> + u64 vtl = 0;
> + u64 ret;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = (struct hv_get_vp_registers_output *)input;
> + if (!input) {
> + local_irq_restore(flags);
> + goto done;
> + }
> +
> + memset(input, 0, struct_size(input, element, 1));
> + input->header.partitionid = HV_PARTITION_ID_SELF;
> + input->header.vpindex = HV_VP_INDEX_SELF;
> + input->header.inputvtl = 0;
> + input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
> +
> + ret = hv_do_hypercall(control, input, output);
> + if (hv_result_success(ret))
> + vtl = output->as64.low & HV_X64_VTL_MASK;
> + else
> + pr_err("Hyper-V: failed to get VTL! %lld", ret);
> + local_irq_restore(flags);
> +
> +done:
> + return vtl;
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -506,6 +540,8 @@ void __init hyperv_init(void)
> /* Query the VMs extended capability once, so that it can be cached. */
> hv_query_ext_cap(0);
>
> + /* Find the VTL */
> + ms_hyperv.vtl = get_vtl();
> return;
>
> clean_guest_os_id:
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index cea95dcd27c2..4bf0b315b0ce 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -301,6 +301,13 @@ enum hv_isolation_type {
> #define HV_X64_MSR_TIME_REF_COUNT HV_REGISTER_TIME_REF_COUNT
> #define HV_X64_MSR_REFERENCE_TSC HV_REGISTER_REFERENCE_TSC
>
> +/*
> + * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
> + * there is not associated MSR address.
> + */
> +#define HV_X64_REGISTER_VSM_VP_STATUS 0x000D0003
> +#define HV_X64_VTL_MASK GENMASK(3, 0)
> +
> /* Hyper-V memory host visibility */
> enum hv_mem_host_visibility {
> VMBUS_PAGE_NOT_VISIBLE = 0,
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5978e9dbc286..02b54f85dc60 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo
> *msginfo, u32 version)
> */
> if (version >= VERSION_WIN10_V5) {
> msg->msg_sint = VMBUS_MESSAGE_SINT;
> + msg->msg_vtl = ms_hyperv.vtl;
> vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
> } else {
> msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 6b5c41f90398..f73a044ecaa7 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -54,6 +54,7 @@ struct ms_hyperv_info {
> };
> };
> u64 shared_gpa_boundary;
> + u8 vtl;
> };
> extern struct ms_hyperv_info ms_hyperv;
> extern bool hv_nested;
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index bfbc37ce223b..1f2bfec4abde 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
> u64 interrupt_page;
> struct {
> u8 msg_sint;
> - u8 padding1[3];
> - u32 padding2;
> + u8 msg_vtl;
> + u8 reserved[6];
> };
> };
> u64 monitor_page1;
> --
> 2.25.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
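As a side note on the get_vtl() hunk above: the VTL is taken from the low
four bits of the returned register value, since HV_X64_VTL_MASK is
GENMASK(3, 0). A minimal userspace sketch of that extraction;
DEMO_GENMASK64 and demo_extract_vtl() are illustrative names, and the macro
is a simplified re-creation rather than the kernel's GENMASK():

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 64-bit bit-range mask covering bits h..l inclusive (h >= l). */
#define DEMO_GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))

#define DEMO_VTL_MASK DEMO_GENMASK64(3, 0)	/* mirrors GENMASK(3, 0) */

/* Keep only the VTL field of a raw register value. */
static uint8_t demo_extract_vtl(uint64_t reg_low)
{
	return (uint8_t)(reg_low & DEMO_VTL_MASK);
}
```

Any bits above bit 3 in the register value are ignored, so the result
always fits the u8 vtl field added to struct ms_hyperv_info.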
* RE: [PATCH V3 5/9] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
From: Michael Kelley (LINUX) @ 2023-07-26 3:44 UTC (permalink / raw)
To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
daniel.lezcano@linaro.org, arnd@arndb.de
Cc: Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
In-Reply-To: <20230718032304.136888-6-ltykernel@gmail.com>
From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, July 17, 2023 8:23 PM
>
> In sev-snp enlightened guest, Hyper-V hypercall needs
> to use vmmcall to trigger vmexit and notify hypervisor
> to handle hypercall request.
>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> arch/x86/include/asm/mshyperv.h | 27 ++++++++++++++-------------
> 1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 2fa38e9f6207..025eda129d99 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -64,12 +64,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> if (!hv_hypercall_pg)
> return U64_MAX;
>
> - __asm__ __volatile__("mov %4, %%r8\n"
> - CALL_NOSPEC
> + __asm__ __volatile__("mov %[output], %%r8\n"
> + ALTERNATIVE("vmmcall", CALL_NOSPEC, X86_FEATURE_SEV_ES)
Since this code is for SEV-SNP, what's the thinking behind using
X86_FEATURE_SEV_ES in the ALTERNATIVE statements? Don't you need
to use X86_FEATURE_SEV_SNP (which is being added in another patch set that
Boris Petkov pointed out)?
Also, does this patch depend on Peter Zijlstra's patch to support nested
ALTERNATIVE statements? If so, that needs to be called out, probably in
the cover letter. Peter's patch doesn't yet appear in linux-next.
Michael
> : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input_address)
> - : "r" (output_address),
> - THUNK_TARGET(hv_hypercall_pg)
> + "+c" (control), "+d" (input_address)
> + : [output] "r" (output_address),
> + THUNK_TARGET(hv_hypercall_pg)
> : "cc", "memory", "r8", "r9", "r10", "r11");
> #else
> u32 input_address_hi = upper_32_bits(input_address);
> @@ -105,7 +105,8 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
>
> #ifdef CONFIG_X86_64
> {
> - __asm__ __volatile__(CALL_NOSPEC
> + __asm__ __volatile__("mov %[thunk_target], %%r8\n"
> + ALTERNATIVE("vmmcall", CALL_NOSPEC, X86_FEATURE_SEV_ES)
> : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> "+c" (control), "+d" (input1)
> : THUNK_TARGET(hv_hypercall_pg)
> @@ -150,13 +151,13 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
>
> #ifdef CONFIG_X86_64
> {
> - __asm__ __volatile__("mov %4, %%r8\n"
> - CALL_NOSPEC
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input1)
> - : "r" (input2),
> - THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "r8", "r9", "r10", "r11");
> + __asm__ __volatile__("mov %[output], %%r8\n"
> + ALTERNATIVE("vmmcall", CALL_NOSPEC, X86_FEATURE_SEV_ES)
> + : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> + "+c" (control), "+d" (input1)
> + : [output] "r" (input2),
> + THUNK_TARGET(hv_hypercall_pg)
> + : "cc", "r8", "r9", "r10", "r11");
> }
> #else
> {
> --
> 2.25.1
* [Patch v2 0/5] RDMA/mana_ib Read Capabilities
From: sharmaajay @ 2023-07-26 3:56 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Dexuan Cui, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Ajay Sharma
From: Ajay Sharma <sharmaajay@microsoft.com>
This version has style and formatting corrections.
V1 introduced changes to control resource
allocation. The resources are managed by the
management software, and this series allows reading
the limits to prevent overflow.
Ajay Sharma (5):
RDMA/mana-ib : Rename all mana_ib_dev type variables to mib_dev
RDMA/mana_ib : Register Mana IB device with Management SW
RDMA/mana_ib : Add error eq and notification from SoC
RDMA/mana_ib : Create Adapter - each vf bound to adapter object
RDMA/mana_ib : Query adapter capabilities
drivers/infiniband/hw/mana/cq.c | 12 +-
drivers/infiniband/hw/mana/device.c | 72 +++--
drivers/infiniband/hw/mana/main.c | 258 ++++++++++++------
drivers/infiniband/hw/mana/mana_ib.h | 96 ++++++-
drivers/infiniband/hw/mana/mr.c | 42 ++-
drivers/infiniband/hw/mana/qp.c | 82 +++---
drivers/infiniband/hw/mana/wq.c | 21 +-
.../net/ethernet/microsoft/mana/gdma_main.c | 151 +++++-----
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 +
include/net/mana/gdma.h | 16 +-
10 files changed, 505 insertions(+), 248 deletions(-)
--
2.25.1
* [Patch v2 2/5] RDMA/mana_ib : Register Mana IB device with Management SW
From: sharmaajay @ 2023-07-26 3:56 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Dexuan Cui, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Ajay Sharma
In-Reply-To: <1690343820-20188-1-git-send-email-sharmaajay@linuxonhyperv.com>
From: Ajay Sharma <sharmaajay@microsoft.com>
Register the Mana IB device with the management
software, which later helps obtain an adapter handle.
Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
---
drivers/infiniband/hw/mana/device.c | 20 +++++--
drivers/infiniband/hw/mana/main.c | 58 ++++++-------------
drivers/infiniband/hw/mana/mana_ib.h | 1 +
drivers/infiniband/hw/mana/mr.c | 17 ++----
drivers/infiniband/hw/mana/qp.c | 10 ++--
.../net/ethernet/microsoft/mana/gdma_main.c | 5 ++
include/net/mana/gdma.h | 3 +
7 files changed, 55 insertions(+), 59 deletions(-)
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index 083f27246ba8..ea4c8c8fc10d 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -78,22 +78,34 @@ static int mana_ib_probe(struct auxiliary_device *adev,
mib_dev->ib_dev.num_comp_vectors = 1;
mib_dev->ib_dev.dev.parent = mdev->gdma_context->dev;
- ret = ib_register_device(&mib_dev->ib_dev, "mana_%d",
- mdev->gdma_context->dev);
+ ret = mana_gd_register_device(&mib_dev->gc->mana_ib);
if (ret) {
- ib_dealloc_device(&mib_dev->ib_dev);
- return ret;
+ ibdev_err(&mib_dev->ib_dev, "Failed to register device, ret %d",
+ ret);
+ goto free_ib_device;
}
+ ret = ib_register_device(&mib_dev->ib_dev, "mana_%d",
+ mdev->gdma_context->dev);
+ if (ret)
+ goto deregister_device;
+
dev_set_drvdata(&adev->dev, mib_dev);
return 0;
+
+deregister_device:
+ mana_gd_deregister_device(&mib_dev->gc->mana_ib);
+free_ib_device:
+ ib_dealloc_device(&mib_dev->ib_dev);
+ return ret;
}
static void mana_ib_remove(struct auxiliary_device *adev)
{
struct mana_ib_dev *mib_dev = dev_get_drvdata(&adev->dev);
+ mana_gd_deregister_device(&mib_dev->gc->mana_ib);
ib_unregister_device(&mib_dev->ib_dev);
ib_dealloc_device(&mib_dev->ib_dev);
}
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 189e774cdab6..2c4e3c496644 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -8,7 +8,7 @@
void mana_ib_uncfg_vport(struct mana_ib_dev *mib_dev, struct mana_ib_pd *pd,
u32 port)
{
- struct gdma_dev *gd = mib_dev->gdma_dev;
+ struct gdma_dev *gd = &mib_dev->gc->mana;
struct mana_port_context *mpc;
struct net_device *ndev;
struct mana_context *mc;
@@ -32,7 +32,7 @@ int mana_ib_cfg_vport(struct mana_ib_dev *mib_dev, u32 port,
struct mana_ib_pd *pd,
u32 doorbell_id)
{
- struct gdma_dev *mdev = mib_dev->gdma_dev;
+ struct gdma_dev *mdev = &mib_dev->gc->mana;
struct mana_port_context *mpc;
struct mana_context *mc;
struct net_device *ndev;
@@ -81,17 +81,16 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
struct gdma_create_pd_req req = {};
enum gdma_pd_flags flags = 0;
struct mana_ib_dev *mib_dev;
- struct gdma_dev *mdev;
+
int err;
mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- mdev = mib_dev->gdma_dev;
mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
sizeof(resp));
req.flags = flags;
- err = mana_gd_send_request(mdev->gdma_context, sizeof(req), &req,
+ err = mana_gd_send_request(mib_dev->gc, sizeof(req), &req,
sizeof(resp), &resp);
if (err || resp.hdr.status) {
@@ -121,17 +120,15 @@ int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
struct gdma_destory_pd_resp resp = {};
struct gdma_destroy_pd_req req = {};
struct mana_ib_dev *mib_dev;
- struct gdma_dev *mdev;
int err;
mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- mdev = mib_dev->gdma_dev;
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_PD, sizeof(req),
sizeof(resp));
req.pd_handle = pd->pd_handle;
- err = mana_gd_send_request(mdev->gdma_context, sizeof(req), &req,
+ err = mana_gd_send_request(mib_dev->gc, sizeof(req), &req,
sizeof(resp), &resp);
if (err || resp.hdr.status) {
@@ -207,17 +204,13 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
struct mana_ib_dev *mib_dev;
- struct gdma_context *gc;
- struct gdma_dev *dev;
int doorbell_page;
int ret;
mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- dev = mib_dev->gdma_dev;
- gc = dev->gdma_context;
/* Allocate a doorbell page index */
- ret = mana_gd_allocate_doorbell_page(gc, &doorbell_page);
+ ret = mana_gd_allocate_doorbell_page(mib_dev->gc, &doorbell_page);
if (ret) {
ibdev_dbg(ibdev, "Failed to allocate doorbell page %d\n", ret);
return ret;
@@ -236,20 +229,17 @@ void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
struct mana_ib_dev *mib_dev;
- struct gdma_context *gc;
int ret;
mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- gc = mib_dev->gdma_dev->gdma_context;
- ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
+ ret = mana_gd_destroy_doorbell_page(mib_dev->gc, mana_ucontext->doorbell);
if (ret)
ibdev_dbg(ibdev, "Failed to destroy doorbell page %d\n", ret);
}
static int
mana_ib_gd_first_dma_region(struct mana_ib_dev *mib_dev,
- struct gdma_context *gc,
struct gdma_create_dma_region_req *create_req,
size_t num_pages, mana_handle_t *gdma_region,
u32 expected_status)
@@ -262,7 +252,7 @@ mana_ib_gd_first_dma_region(struct mana_ib_dev *mib_dev,
struct_size(create_req, page_addr_list, num_pages);
create_req->page_addr_list_len = num_pages;
- err = mana_gd_send_request(gc, create_req_msg_size, create_req,
+ err = mana_gd_send_request(mib_dev->gc, create_req_msg_size, create_req,
sizeof(create_resp), &create_resp);
if (err || create_resp.hdr.status != expected_status) {
ibdev_dbg(&mib_dev->ib_dev,
@@ -282,7 +272,7 @@ mana_ib_gd_first_dma_region(struct mana_ib_dev *mib_dev,
}
static int
-mana_ib_gd_add_dma_region(struct mana_ib_dev *mib_dev, struct gdma_context *gc,
+mana_ib_gd_add_dma_region(struct mana_ib_dev *mib_dev,
struct gdma_dma_region_add_pages_req *add_req,
unsigned int num_pages, u32 expected_status)
{
@@ -295,7 +285,7 @@ mana_ib_gd_add_dma_region(struct mana_ib_dev *mib_dev, struct gdma_context *gc,
add_req_msg_size, sizeof(add_resp));
add_req->page_addr_list_len = num_pages;
- err = mana_gd_send_request(gc, add_req_msg_size, add_req,
+ err = mana_gd_send_request(mib_dev->gc, add_req_msg_size, add_req,
sizeof(add_resp), &add_resp);
if (err || add_resp.hdr.status != expected_status) {
ibdev_dbg(&mib_dev->ib_dev,
@@ -323,18 +313,14 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
struct ib_block_iter biter;
size_t max_pgs_add_cmd = 0;
size_t max_pgs_create_cmd;
- struct gdma_context *gc;
size_t num_pages_total;
- struct gdma_dev *mdev;
unsigned long page_sz;
unsigned int tail = 0;
u64 *page_addr_list;
void *request_buf;
int err;
- mdev = mib_dev->gdma_dev;
- gc = mdev->gdma_context;
- hwc = gc->hwc.driver_data;
+ hwc = mib_dev->gc->hwc.driver_data;
/* Hardware requires dma region to align to chosen page size */
page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, 0);
@@ -388,7 +374,7 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
if (!num_pages_processed) {
/* First create message */
- err = mana_ib_gd_first_dma_region(mib_dev, gc, create_req,
+ err = mana_ib_gd_first_dma_region(mib_dev, create_req,
tail, gdma_region,
expected_status);
if (err)
@@ -403,7 +389,7 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
page_addr_list = add_req->page_addr_list;
} else {
/* Subsequent create messages */
- err = mana_ib_gd_add_dma_region(mib_dev, gc, add_req, tail,
+ err = mana_ib_gd_add_dma_region(mib_dev, add_req, tail,
expected_status);
if (err)
break;
@@ -429,13 +415,9 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *mib_dev, u64 gdma_region)
{
- struct gdma_dev *mdev = mib_dev->gdma_dev;
- struct gdma_context *gc;
-
- gc = mdev->gdma_context;
ibdev_dbg(&mib_dev->ib_dev, "destroy dma region 0x%llx\n", gdma_region);
- return mana_gd_destroy_dma_region(gc, gdma_region);
+ return mana_gd_destroy_dma_region(mib_dev->gc, gdma_region);
}
int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
@@ -444,13 +426,11 @@ int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
struct mana_ib_dev *mib_dev;
- struct gdma_context *gc;
phys_addr_t pfn;
pgprot_t prot;
int ret;
mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- gc = mib_dev->gdma_dev->gdma_context;
if (vma->vm_pgoff != 0) {
ibdev_dbg(ibdev, "Unexpected vm_pgoff %lu\n", vma->vm_pgoff);
@@ -458,18 +438,18 @@ int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
}
/* Map to the page indexed by ucontext->doorbell */
- pfn = (gc->phys_db_page_base +
- gc->db_page_size * mana_ucontext->doorbell) >>
+ pfn = (mib_dev->gc->phys_db_page_base +
+ mib_dev->gc->db_page_size * mana_ucontext->doorbell) >>
PAGE_SHIFT;
prot = pgprot_writecombine(vma->vm_page_prot);
- ret = rdma_user_mmap_io(ibcontext, vma, pfn, gc->db_page_size, prot,
- NULL);
+ ret = rdma_user_mmap_io(ibcontext, vma, pfn, mib_dev->gc->db_page_size,
+ prot, NULL);
if (ret)
ibdev_dbg(ibdev, "can't rdma_user_mmap_io ret %d\n", ret);
else
ibdev_dbg(ibdev, "mapped I/O pfn 0x%llx page_size %u, ret %d\n",
- pfn, gc->db_page_size, ret);
+ pfn, mib_dev->gc->db_page_size, ret);
return ret;
}
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index ee4efd0af278..3a2ba6b96f15 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -30,6 +30,7 @@
struct mana_ib_dev {
struct ib_device ib_dev;
struct gdma_dev *gdma_dev;
+ struct gdma_context *gc;
};
struct mana_ib_wq {
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
index f6a53906204d..3106d1bce837 100644
--- a/drivers/infiniband/hw/mana/mr.c
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -29,13 +29,10 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *mib_dev,
struct mana_ib_mr *mr,
struct gdma_create_mr_params *mr_params)
{
- struct gdma_dev *mdev = mib_dev->gdma_dev;
struct gdma_create_mr_response resp = {};
struct gdma_create_mr_request req = {};
- struct gdma_context *gc;
int err;
- gc = mdev->gdma_context;
mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_MR, sizeof(req),
sizeof(resp));
@@ -56,7 +53,8 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *mib_dev,
return -EINVAL;
}
- err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+ err = mana_gd_send_request(mib_dev->gc, sizeof(req), &req,
+ sizeof(resp), &resp);
if (err || resp.hdr.status) {
ibdev_dbg(&mib_dev->ib_dev, "Failed to create mr %d, %u", err,
@@ -77,22 +75,19 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *mib_dev,
static int mana_ib_gd_destroy_mr(struct mana_ib_dev *mib_dev, u64 mr_handle)
{
struct gdma_destroy_mr_response resp = {};
- struct gdma_dev *mdev = mib_dev->gdma_dev;
struct gdma_destroy_mr_request req = {};
- struct gdma_context *gc;
int err;
- gc = mdev->gdma_context;
-
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_MR, sizeof(req),
sizeof(resp));
req.mr_handle = mr_handle;
- err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+ err = mana_gd_send_request(mib_dev->gc, sizeof(req), &req,
+ sizeof(resp), &resp);
if (err || resp.hdr.status) {
- dev_err(gc->dev, "Failed to destroy MR: %d, 0x%x\n", err,
- resp.hdr.status);
+ dev_err(mib_dev->gc->dev, "Failed to destroy MR: %d, 0x%x\n",
+ err, resp.hdr.status);
if (!err)
err = -EPROTO;
return err;
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 2e3a57123ed7..874cfd794825 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -21,7 +21,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *mib_dev,
u32 req_buf_size;
int i, err;
- mdev = mib_dev->gdma_dev;
+ mdev = &mib_dev->gc->mana;
gc = mdev->gdma_context;
req_buf_size =
@@ -102,7 +102,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl;
struct mana_ib_create_qp_rss_resp resp = {};
struct mana_ib_create_qp_rss ucmd = {};
- struct gdma_dev *gd = mib_dev->gdma_dev;
+ struct gdma_dev *gd = &mib_dev->gc->mana;
mana_handle_t *mana_ind_table;
struct mana_port_context *mpc;
struct mana_context *mc;
@@ -267,7 +267,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
rdma_udata_to_drv_context(udata, struct mana_ib_ucontext,
ibucontext);
struct mana_ib_create_qp_resp resp = {};
- struct gdma_dev *gd = mib_dev->gdma_dev;
+ struct gdma_dev *gd = &mib_dev->gc->mana;
struct mana_ib_create_qp ucmd = {};
struct mana_obj_spec wq_spec = {};
struct mana_obj_spec cq_spec = {};
@@ -437,7 +437,7 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
{
struct mana_ib_dev *mib_dev =
container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
- struct gdma_dev *gd = mib_dev->gdma_dev;
+ struct gdma_dev *gd = &mib_dev->gc->mana;
struct mana_port_context *mpc;
struct mana_context *mc;
struct net_device *ndev;
@@ -464,7 +464,7 @@ static int mana_ib_destroy_qp_raw(struct mana_ib_qp *qp, struct ib_udata *udata)
{
struct mana_ib_dev *mib_dev =
container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
- struct gdma_dev *gd = mib_dev->gdma_dev;
+ struct gdma_dev *gd = &mib_dev->gc->mana;
struct ib_pd *ibpd = qp->ibqp.pd;
struct mana_port_context *mpc;
struct mana_context *mc;
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..9fa7a2d6c2b2 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -139,6 +139,9 @@ static int mana_gd_detect_devices(struct pci_dev *pdev)
if (dev_type == GDMA_DEVICE_MANA) {
gc->mana.gdma_context = gc;
gc->mana.dev_id = dev;
+ } else if (dev_type == GDMA_DEVICE_MANA_IB) {
+ gc->mana_ib.dev_id = dev;
+ gc->mana_ib.gdma_context = gc;
}
}
@@ -940,6 +943,7 @@ int mana_gd_register_device(struct gdma_dev *gd)
return 0;
}
+EXPORT_SYMBOL(mana_gd_register_device);
int mana_gd_deregister_device(struct gdma_dev *gd)
{
@@ -970,6 +974,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
return err;
}
+EXPORT_SYMBOL(mana_gd_deregister_device);
u32 mana_gd_wq_avail_space(struct gdma_queue *wq)
{
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 96c120160f15..e2b212dd722b 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -63,6 +63,7 @@ enum {
GDMA_DEVICE_NONE = 0,
GDMA_DEVICE_HWC = 1,
GDMA_DEVICE_MANA = 2,
+ GDMA_DEVICE_MANA_IB = 3,
};
struct gdma_resource {
@@ -384,6 +385,8 @@ struct gdma_context {
/* Azure network adapter */
struct gdma_dev mana;
+ /* rdma device */
+ struct gdma_dev mana_ib;
};
#define MAX_NUM_GDMA_DEVICES 4
--
2.25.1
* [Patch v2 1/5] RDMA/mana-ib : Rename all mana_ib_dev type variables to mib_dev
From: sharmaajay @ 2023-07-26 3:56 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Dexuan Cui, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Ajay Sharma
In-Reply-To: <1690343820-20188-1-git-send-email-sharmaajay@linuxonhyperv.com>
From: Ajay Sharma <sharmaajay@microsoft.com>
Rename all variables of type struct mana_ib_dev to mib_dev to clearly
separate the Ethernet device variables from the IB device variables.
Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
---
drivers/infiniband/hw/mana/cq.c | 12 ++--
drivers/infiniband/hw/mana/device.c | 34 +++++------
drivers/infiniband/hw/mana/main.c | 87 ++++++++++++++--------------
drivers/infiniband/hw/mana/mana_ib.h | 9 +--
drivers/infiniband/hw/mana/mr.c | 29 +++++-----
drivers/infiniband/hw/mana/qp.c | 82 +++++++++++++-------------
drivers/infiniband/hw/mana/wq.c | 21 +++----
7 files changed, 140 insertions(+), 134 deletions(-)
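For readers skimming the rename, the naming convention it establishes can be sketched in a small userspace mock. This is only an illustration under stated assumptions: the struct definitions are trimmed-down stand-ins rather than the real kernel types, container_of() is re-implemented locally with offsetof(), and ctx_from_ibdev() is a hypothetical helper that does not exist in the driver. The point is that mib_dev always names the struct mana_ib_dev wrapper, while mdev stays reserved for the struct gdma_dev it points to.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down mock types; not the real kernel definitions. */
struct gdma_context { int id; };
struct gdma_dev { struct gdma_context *gdma_context; };
struct ib_device { int phys_port_cnt; };

struct mana_ib_dev {
	struct ib_device ib_dev;	/* embedded IB device */
	struct gdma_dev *gdma_dev;	/* GDMA device behind it */
};

/*
 * Hypothetical helper, for illustration only.
 * mib_dev: the IB-side wrapper; mdev: the GDMA (Ethernet-side) device.
 */
static struct gdma_context *ctx_from_ibdev(struct ib_device *ibdev)
{
	struct mana_ib_dev *mib_dev;
	struct gdma_dev *mdev;

	/* Recover the wrapper from the embedded ib_device member. */
	mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
	mdev = mib_dev->gdma_dev;
	return mdev->gdma_context;
}
```

With the two names kept distinct, a line such as mdev = mib_dev->gdma_dev reads unambiguously, which was not the case when mdev or dev could refer to either type depending on the function.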
diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
index d141cab8a1e6..1aed4e6360ba 100644
--- a/drivers/infiniband/hw/mana/cq.c
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -11,10 +11,10 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
struct ib_device *ibdev = ibcq->device;
struct mana_ib_create_cq ucmd = {};
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
int err;
- mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
if (udata->inlen < sizeof(ucmd))
return -EINVAL;
@@ -41,7 +41,7 @@ int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
return err;
}
- err = mana_ib_gd_create_dma_region(mdev, cq->umem, &cq->gdma_region);
+ err = mana_ib_gd_create_dma_region(mib_dev, cq->umem, &cq->gdma_region);
if (err) {
ibdev_dbg(ibdev,
"Failed to create dma region for create cq, %d\n",
@@ -68,11 +68,11 @@ int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
{
struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
struct ib_device *ibdev = ibcq->device;
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
- mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- mana_ib_gd_destroy_dma_region(mdev, cq->gdma_region);
+ mana_ib_gd_destroy_dma_region(mib_dev, cq->gdma_region);
ib_umem_release(cq->umem);
return 0;
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index d4541b8707e4..083f27246ba8 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -51,51 +51,51 @@ static int mana_ib_probe(struct auxiliary_device *adev,
{
struct mana_adev *madev = container_of(adev, struct mana_adev, adev);
struct gdma_dev *mdev = madev->mdev;
+ struct mana_ib_dev *mib_dev;
struct mana_context *mc;
- struct mana_ib_dev *dev;
int ret;
mc = mdev->driver_data;
- dev = ib_alloc_device(mana_ib_dev, ib_dev);
- if (!dev)
+ mib_dev = ib_alloc_device(mana_ib_dev, ib_dev);
+ if (!mib_dev)
return -ENOMEM;
- ib_set_device_ops(&dev->ib_dev, &mana_ib_dev_ops);
+ ib_set_device_ops(&mib_dev->ib_dev, &mana_ib_dev_ops);
- dev->ib_dev.phys_port_cnt = mc->num_ports;
+ mib_dev->ib_dev.phys_port_cnt = mc->num_ports;
- ibdev_dbg(&dev->ib_dev, "mdev=%p id=%d num_ports=%d\n", mdev,
- mdev->dev_id.as_uint32, dev->ib_dev.phys_port_cnt);
+ ibdev_dbg(&mib_dev->ib_dev, "mdev=%p id=%d num_ports=%d\n", mdev,
+ mdev->dev_id.as_uint32, mib_dev->ib_dev.phys_port_cnt);
- dev->gdma_dev = mdev;
- dev->ib_dev.node_type = RDMA_NODE_IB_CA;
+ mib_dev->gdma_dev = mdev;
+ mib_dev->ib_dev.node_type = RDMA_NODE_IB_CA;
/*
* num_comp_vectors needs to set to the max MSIX index
* when interrupts and event queues are implemented
*/
- dev->ib_dev.num_comp_vectors = 1;
- dev->ib_dev.dev.parent = mdev->gdma_context->dev;
+ mib_dev->ib_dev.num_comp_vectors = 1;
+ mib_dev->ib_dev.dev.parent = mdev->gdma_context->dev;
- ret = ib_register_device(&dev->ib_dev, "mana_%d",
+ ret = ib_register_device(&mib_dev->ib_dev, "mana_%d",
mdev->gdma_context->dev);
if (ret) {
- ib_dealloc_device(&dev->ib_dev);
+ ib_dealloc_device(&mib_dev->ib_dev);
return ret;
}
- dev_set_drvdata(&adev->dev, dev);
+ dev_set_drvdata(&adev->dev, mib_dev);
return 0;
}
static void mana_ib_remove(struct auxiliary_device *adev)
{
- struct mana_ib_dev *dev = dev_get_drvdata(&adev->dev);
+ struct mana_ib_dev *mib_dev = dev_get_drvdata(&adev->dev);
- ib_unregister_device(&dev->ib_dev);
- ib_dealloc_device(&dev->ib_dev);
+ ib_unregister_device(&mib_dev->ib_dev);
+ ib_dealloc_device(&mib_dev->ib_dev);
}
static const struct auxiliary_device_id mana_id_table[] = {
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 7be4c3adb4e2..189e774cdab6 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -5,10 +5,10 @@
#include "mana_ib.h"
-void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
+void mana_ib_uncfg_vport(struct mana_ib_dev *mib_dev, struct mana_ib_pd *pd,
u32 port)
{
- struct gdma_dev *gd = dev->gdma_dev;
+ struct gdma_dev *gd = mib_dev->gdma_dev;
struct mana_port_context *mpc;
struct net_device *ndev;
struct mana_context *mc;
@@ -28,10 +28,11 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
mutex_unlock(&pd->vport_mutex);
}
-int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
+int mana_ib_cfg_vport(struct mana_ib_dev *mib_dev, u32 port,
+ struct mana_ib_pd *pd,
u32 doorbell_id)
{
- struct gdma_dev *mdev = dev->gdma_dev;
+ struct gdma_dev *mdev = mib_dev->gdma_dev;
struct mana_port_context *mpc;
struct mana_context *mc;
struct net_device *ndev;
@@ -45,7 +46,7 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
pd->vport_use_count++;
if (pd->vport_use_count > 1) {
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Skip as this PD is already configured vport\n");
mutex_unlock(&pd->vport_mutex);
return 0;
@@ -56,7 +57,8 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
pd->vport_use_count--;
mutex_unlock(&pd->vport_mutex);
- ibdev_dbg(&dev->ib_dev, "Failed to configure vPort %d\n", err);
+ ibdev_dbg(&mib_dev->ib_dev, "Failed to configure vPort %d\n",
+ err);
return err;
}
@@ -65,7 +67,7 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
pd->tx_vp_offset = mpc->tx_vp_offset;
- ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
+ ibdev_dbg(&mib_dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
mpc->port_handle, pd->pdn, doorbell_id);
return 0;
@@ -78,12 +80,12 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
struct gdma_create_pd_resp resp = {};
struct gdma_create_pd_req req = {};
enum gdma_pd_flags flags = 0;
- struct mana_ib_dev *dev;
+ struct mana_ib_dev *mib_dev;
struct gdma_dev *mdev;
int err;
- dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- mdev = dev->gdma_dev;
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mdev = mib_dev->gdma_dev;
mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
sizeof(resp));
@@ -93,7 +95,7 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
sizeof(resp), &resp);
if (err || resp.hdr.status) {
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to get pd_id err %d status %u\n", err,
resp.hdr.status);
if (!err)
@@ -104,7 +106,7 @@ int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
pd->pd_handle = resp.pd_handle;
pd->pdn = resp.pd_id;
- ibdev_dbg(&dev->ib_dev, "pd_handle 0x%llx pd_id %d\n",
+ ibdev_dbg(&mib_dev->ib_dev, "pd_handle 0x%llx pd_id %d\n",
pd->pd_handle, pd->pdn);
mutex_init(&pd->vport_mutex);
@@ -118,12 +120,12 @@ int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
struct ib_device *ibdev = ibpd->device;
struct gdma_destory_pd_resp resp = {};
struct gdma_destroy_pd_req req = {};
- struct mana_ib_dev *dev;
+ struct mana_ib_dev *mib_dev;
struct gdma_dev *mdev;
int err;
- dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- mdev = dev->gdma_dev;
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mdev = mib_dev->gdma_dev;
mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_PD, sizeof(req),
sizeof(resp));
@@ -133,7 +135,7 @@ int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
sizeof(resp), &resp);
if (err || resp.hdr.status) {
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to destroy pd_handle 0x%llx err %d status %u",
pd->pd_handle, err, resp.hdr.status);
if (!err)
@@ -204,14 +206,14 @@ int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
struct mana_ib_ucontext *ucontext =
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
struct gdma_context *gc;
struct gdma_dev *dev;
int doorbell_page;
int ret;
- mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- dev = mdev->gdma_dev;
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ dev = mib_dev->gdma_dev;
gc = dev->gdma_context;
/* Allocate a doorbell page index */
@@ -233,12 +235,12 @@ void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
struct mana_ib_ucontext *mana_ucontext =
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
struct gdma_context *gc;
int ret;
- mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- gc = mdev->gdma_dev->gdma_context;
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ gc = mib_dev->gdma_dev->gdma_context;
ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
if (ret)
@@ -246,7 +248,7 @@ void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
}
static int
-mana_ib_gd_first_dma_region(struct mana_ib_dev *dev,
+mana_ib_gd_first_dma_region(struct mana_ib_dev *mib_dev,
struct gdma_context *gc,
struct gdma_create_dma_region_req *create_req,
size_t num_pages, mana_handle_t *gdma_region,
@@ -263,7 +265,7 @@ mana_ib_gd_first_dma_region(struct mana_ib_dev *dev,
err = mana_gd_send_request(gc, create_req_msg_size, create_req,
sizeof(create_resp), &create_resp);
if (err || create_resp.hdr.status != expected_status) {
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to create DMA region: %d, 0x%x\n",
err, create_resp.hdr.status);
if (!err)
@@ -273,14 +275,14 @@ mana_ib_gd_first_dma_region(struct mana_ib_dev *dev,
}
*gdma_region = create_resp.dma_region_handle;
- ibdev_dbg(&dev->ib_dev, "Created DMA region handle 0x%llx\n",
+ ibdev_dbg(&mib_dev->ib_dev, "Created DMA region handle 0x%llx\n",
*gdma_region);
return 0;
}
static int
-mana_ib_gd_add_dma_region(struct mana_ib_dev *dev, struct gdma_context *gc,
+mana_ib_gd_add_dma_region(struct mana_ib_dev *mib_dev, struct gdma_context *gc,
struct gdma_dma_region_add_pages_req *add_req,
unsigned int num_pages, u32 expected_status)
{
@@ -296,7 +298,7 @@ mana_ib_gd_add_dma_region(struct mana_ib_dev *dev, struct gdma_context *gc,
err = mana_gd_send_request(gc, add_req_msg_size, add_req,
sizeof(add_resp), &add_resp);
if (err || add_resp.hdr.status != expected_status) {
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to create DMA region: %d, 0x%x\n",
err, add_resp.hdr.status);
@@ -309,7 +311,8 @@ mana_ib_gd_add_dma_region(struct mana_ib_dev *dev, struct gdma_context *gc,
return 0;
}
-int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
+int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
+ struct ib_umem *umem,
mana_handle_t *gdma_region)
{
struct gdma_dma_region_add_pages_req *add_req = NULL;
@@ -329,14 +332,14 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
void *request_buf;
int err;
- mdev = dev->gdma_dev;
+ mdev = mib_dev->gdma_dev;
gc = mdev->gdma_context;
hwc = gc->hwc.driver_data;
/* Hardware requires dma region to align to chosen page size */
page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, 0);
if (!page_sz) {
- ibdev_dbg(&dev->ib_dev, "failed to find page size.\n");
+ ibdev_dbg(&mib_dev->ib_dev, "failed to find page size.\n");
return -ENOMEM;
}
num_pages_total = ib_umem_num_dma_blocks(umem, page_sz);
@@ -362,13 +365,13 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
create_req->gdma_page_type = order_base_2(page_sz) - PAGE_SHIFT;
create_req->page_count = num_pages_total;
- ibdev_dbg(&dev->ib_dev, "size_dma_region %lu num_pages_total %lu\n",
+ ibdev_dbg(&mib_dev->ib_dev, "size_dma_region %lu num_pages_total %lu\n",
umem->length, num_pages_total);
- ibdev_dbg(&dev->ib_dev, "page_sz %lu offset_in_page %u\n",
+ ibdev_dbg(&mib_dev->ib_dev, "page_sz %lu offset_in_page %u\n",
page_sz, create_req->offset_in_page);
- ibdev_dbg(&dev->ib_dev, "num_pages_to_handle %lu, gdma_page_type %u",
+ ibdev_dbg(&mib_dev->ib_dev, "num_pages_to_handle %lu, gdma_page_type %u",
num_pages_to_handle, create_req->gdma_page_type);
page_addr_list = create_req->page_addr_list;
@@ -385,7 +388,7 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
if (!num_pages_processed) {
/* First create message */
- err = mana_ib_gd_first_dma_region(dev, gc, create_req,
+ err = mana_ib_gd_first_dma_region(mib_dev, gc, create_req,
tail, gdma_region,
expected_status);
if (err)
@@ -400,7 +403,7 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
page_addr_list = add_req->page_addr_list;
} else {
/* Subsequent create messages */
- err = mana_ib_gd_add_dma_region(dev, gc, add_req, tail,
+ err = mana_ib_gd_add_dma_region(mib_dev, gc, add_req, tail,
expected_status);
if (err)
break;
@@ -417,20 +420,20 @@ int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
}
if (err)
- mana_ib_gd_destroy_dma_region(dev, *gdma_region);
+ mana_ib_gd_destroy_dma_region(mib_dev, *gdma_region);
out:
kfree(request_buf);
return err;
}
-int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev, u64 gdma_region)
+int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *mib_dev, u64 gdma_region)
{
- struct gdma_dev *mdev = dev->gdma_dev;
+ struct gdma_dev *mdev = mib_dev->gdma_dev;
struct gdma_context *gc;
gc = mdev->gdma_context;
- ibdev_dbg(&dev->ib_dev, "destroy dma region 0x%llx\n", gdma_region);
+ ibdev_dbg(&mib_dev->ib_dev, "destroy dma region 0x%llx\n", gdma_region);
return mana_gd_destroy_dma_region(gc, gdma_region);
}
@@ -440,14 +443,14 @@ int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
struct mana_ib_ucontext *mana_ucontext =
container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
struct ib_device *ibdev = ibcontext->device;
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
struct gdma_context *gc;
phys_addr_t pfn;
pgprot_t prot;
int ret;
- mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- gc = mdev->gdma_dev->gdma_context;
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ gc = mib_dev->gdma_dev->gdma_context;
if (vma->vm_pgoff != 0) {
ibdev_dbg(ibdev, "Unexpected vm_pgoff %lu\n", vma->vm_pgoff);
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 502cc8672eef..ee4efd0af278 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -92,10 +92,11 @@ struct mana_ib_rwq_ind_table {
struct ib_rwq_ind_table ib_ind_table;
};
-int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
+int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
+ struct ib_umem *umem,
mana_handle_t *gdma_region);
-int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev,
+int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *mib_dev,
mana_handle_t gdma_region);
struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
@@ -129,9 +130,9 @@ int mana_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
-int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port_id,
+int mana_ib_cfg_vport(struct mana_ib_dev *mib_dev, u32 port_id,
struct mana_ib_pd *pd, u32 doorbell_id);
-void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
+void mana_ib_uncfg_vport(struct mana_ib_dev *mib_dev, struct mana_ib_pd *pd,
u32 port);
int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
index 351207c60eb6..f6a53906204d 100644
--- a/drivers/infiniband/hw/mana/mr.c
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -25,12 +25,13 @@ mana_ib_verbs_to_gdma_access_flags(int access_flags)
return flags;
}
-static int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
+static int mana_ib_gd_create_mr(struct mana_ib_dev *mib_dev,
+ struct mana_ib_mr *mr,
struct gdma_create_mr_params *mr_params)
{
+ struct gdma_dev *mdev = mib_dev->gdma_dev;
struct gdma_create_mr_response resp = {};
struct gdma_create_mr_request req = {};
- struct gdma_dev *mdev = dev->gdma_dev;
struct gdma_context *gc;
int err;
@@ -49,7 +50,7 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
break;
default:
- ibdev_dbg(&dev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"invalid param (GDMA_MR_TYPE) passed, type %d\n",
req.mr_type);
return -EINVAL;
@@ -58,7 +59,7 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
if (err || resp.hdr.status) {
- ibdev_dbg(&dev->ib_dev, "Failed to create mr %d, %u", err,
+ ibdev_dbg(&mib_dev->ib_dev, "Failed to create mr %d, %u", err,
resp.hdr.status);
if (!err)
err = -EPROTO;
@@ -73,11 +74,11 @@ static int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
return 0;
}
-static int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, u64 mr_handle)
+static int mana_ib_gd_destroy_mr(struct mana_ib_dev *mib_dev, u64 mr_handle)
{
struct gdma_destroy_mr_response resp = {};
+ struct gdma_dev *mdev = mib_dev->gdma_dev;
struct gdma_destroy_mr_request req = {};
- struct gdma_dev *mdev = dev->gdma_dev;
struct gdma_context *gc;
int err;
@@ -107,12 +108,12 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
struct gdma_create_mr_params mr_params = {};
struct ib_device *ibdev = ibpd->device;
- struct mana_ib_dev *dev;
+ struct mana_ib_dev *mib_dev;
struct mana_ib_mr *mr;
u64 dma_region_handle;
int err;
- dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
ibdev_dbg(ibdev,
"start 0x%llx, iova 0x%llx length 0x%llx access_flags 0x%x",
@@ -133,7 +134,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
goto err_free;
}
- err = mana_ib_gd_create_dma_region(dev, mr->umem, &dma_region_handle);
+ err = mana_ib_gd_create_dma_region(mib_dev, mr->umem, &dma_region_handle);
if (err) {
ibdev_dbg(ibdev, "Failed create dma region for user-mr, %d\n",
err);
@@ -151,7 +152,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
mr_params.gva.access_flags =
mana_ib_verbs_to_gdma_access_flags(access_flags);
- err = mana_ib_gd_create_mr(dev, mr, &mr_params);
+ err = mana_ib_gd_create_mr(mib_dev, mr, &mr_params);
if (err)
goto err_dma_region;
@@ -164,7 +165,7 @@ struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
return &mr->ibmr;
err_dma_region:
- mana_gd_destroy_dma_region(dev->gdma_dev->gdma_context,
+ mana_gd_destroy_dma_region(mib_dev->gdma_dev->gdma_context,
dma_region_handle);
err_umem:
@@ -179,12 +180,12 @@ int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
{
struct mana_ib_mr *mr = container_of(ibmr, struct mana_ib_mr, ibmr);
struct ib_device *ibdev = ibmr->device;
- struct mana_ib_dev *dev;
+ struct mana_ib_dev *mib_dev;
int err;
- dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+ mib_dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
- err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
+ err = mana_ib_gd_destroy_mr(mib_dev, mr->mr_handle);
if (err)
return err;
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 4b3b5b274e84..2e3a57123ed7 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -5,7 +5,7 @@
#include "mana_ib.h"
-static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
+static int mana_ib_cfg_vport_steering(struct mana_ib_dev *mib_dev,
struct net_device *ndev,
mana_handle_t default_rxobj,
mana_handle_t ind_table[],
@@ -21,7 +21,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
u32 req_buf_size;
int i, err;
- mdev = dev->gdma_dev;
+ mdev = mib_dev->gdma_dev;
gc = mdev->gdma_context;
req_buf_size =
@@ -55,10 +55,10 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
* MANA_INDIRECT_TABLE_SIZE entries. Adjust the verb
* ind_table to MANA_INDIRECT_TABLE_SIZE if required
*/
- ibdev_dbg(&dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
+ ibdev_dbg(&mib_dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
req_indir_tab[i] = ind_table[i % (1 << log_ind_tbl_size)];
- ibdev_dbg(&dev->ib_dev, "index %u handle 0x%llx\n", i,
+ ibdev_dbg(&mib_dev->ib_dev, "index %u handle 0x%llx\n", i,
req_indir_tab[i]);
}
@@ -68,7 +68,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
else
netdev_rss_key_fill(req->hashkey, MANA_HASH_KEY_SIZE);
- ibdev_dbg(&dev->ib_dev, "vport handle %llu default_rxobj 0x%llx\n",
+ ibdev_dbg(&mib_dev->ib_dev, "vport handle %llu default_rxobj 0x%llx\n",
req->vport, default_rxobj);
err = mana_gd_send_request(gc, req_buf_size, req, sizeof(resp), &resp);
@@ -97,12 +97,12 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
struct ib_udata *udata)
{
struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
- struct mana_ib_dev *mdev =
+ struct mana_ib_dev *mib_dev =
container_of(pd->device, struct mana_ib_dev, ib_dev);
struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl;
struct mana_ib_create_qp_rss_resp resp = {};
struct mana_ib_create_qp_rss ucmd = {};
- struct gdma_dev *gd = mdev->gdma_dev;
+ struct gdma_dev *gd = mib_dev->gdma_dev;
mana_handle_t *mana_ind_table;
struct mana_port_context *mpc;
struct mana_context *mc;
@@ -123,21 +123,21 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
if (ret) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed copy from udata for create rss-qp, err %d\n",
ret);
return ret;
}
if (attr->cap.max_recv_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Requested max_recv_wr %d exceeding limit\n",
attr->cap.max_recv_wr);
return -EINVAL;
}
if (attr->cap.max_recv_sge > MAX_RX_WQE_SGL_ENTRIES) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Requested max_recv_sge %d exceeding limit\n",
attr->cap.max_recv_sge);
return -EINVAL;
@@ -145,14 +145,14 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
ind_tbl_size = 1 << ind_tbl->log_ind_tbl_size;
if (ind_tbl_size > MANA_INDIRECT_TABLE_SIZE) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Indirect table size %d exceeding limit\n",
ind_tbl_size);
return -EINVAL;
}
if (ucmd.rx_hash_function != MANA_IB_RX_HASH_FUNC_TOEPLITZ) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"RX Hash function is not supported, %d\n",
ucmd.rx_hash_function);
return -EINVAL;
@@ -161,14 +161,14 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
/* IB ports start with 1, MANA start with 0 */
port = ucmd.port;
if (port < 1 || port > mc->num_ports) {
- ibdev_dbg(&mdev->ib_dev, "Invalid port %u in creating qp\n",
+ ibdev_dbg(&mib_dev->ib_dev, "Invalid port %u in creating qp\n",
port);
return -EINVAL;
}
ndev = mc->ports[port - 1];
mpc = netdev_priv(ndev);
- ibdev_dbg(&mdev->ib_dev, "rx_hash_function %d port %d\n",
+ ibdev_dbg(&mib_dev->ib_dev, "rx_hash_function %d port %d\n",
ucmd.rx_hash_function, port);
mana_ind_table = kcalloc(ind_tbl_size, sizeof(mana_handle_t),
@@ -210,7 +210,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
wq->id = wq_spec.queue_index;
cq->id = cq_spec.queue_index;
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"ret %d rx_object 0x%llx wq id %llu cq id %llu\n",
ret, wq->rx_object, wq->id, cq->id);
@@ -221,7 +221,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
}
resp.num_entries = i;
- ret = mana_ib_cfg_vport_steering(mdev, ndev, wq->rx_object,
+ ret = mana_ib_cfg_vport_steering(mib_dev, ndev, wq->rx_object,
mana_ind_table,
ind_tbl->log_ind_tbl_size,
ucmd.rx_hash_key_len,
@@ -231,7 +231,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
ret = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (ret) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to copy to udata create rss-qp, %d\n",
ret);
goto fail;
@@ -259,7 +259,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
{
struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
- struct mana_ib_dev *mdev =
+ struct mana_ib_dev *mib_dev =
container_of(ibpd->device, struct mana_ib_dev, ib_dev);
struct mana_ib_cq *send_cq =
container_of(attr->send_cq, struct mana_ib_cq, ibcq);
@@ -267,7 +267,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
rdma_udata_to_drv_context(udata, struct mana_ib_ucontext,
ibucontext);
struct mana_ib_create_qp_resp resp = {};
- struct gdma_dev *gd = mdev->gdma_dev;
+ struct gdma_dev *gd = mib_dev->gdma_dev;
struct mana_ib_create_qp ucmd = {};
struct mana_obj_spec wq_spec = {};
struct mana_obj_spec cq_spec = {};
@@ -285,7 +285,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to copy from udata create qp-raw, %d\n", err);
return err;
}
@@ -296,14 +296,14 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
return -EINVAL;
if (attr->cap.max_send_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Requested max_send_wr %d exceeding limit\n",
attr->cap.max_send_wr);
return -EINVAL;
}
if (attr->cap.max_send_sge > MAX_TX_WQE_SGL_ENTRIES) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Requested max_send_sge %d exceeding limit\n",
attr->cap.max_send_sge);
return -EINVAL;
@@ -311,38 +311,38 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
ndev = mc->ports[port - 1];
mpc = netdev_priv(ndev);
- ibdev_dbg(&mdev->ib_dev, "port %u ndev %p mpc %p\n", port, ndev, mpc);
+ ibdev_dbg(&mib_dev->ib_dev, "port %u ndev %p mpc %p\n", port, ndev, mpc);
- err = mana_ib_cfg_vport(mdev, port - 1, pd, mana_ucontext->doorbell);
+ err = mana_ib_cfg_vport(mib_dev, port - 1, pd, mana_ucontext->doorbell);
if (err)
return -ENODEV;
qp->port = port;
- ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
+ ibdev_dbg(&mib_dev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
ucmd.sq_buf_addr, ucmd.port);
umem = ib_umem_get(ibpd->device, ucmd.sq_buf_addr, ucmd.sq_buf_size,
IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(umem)) {
err = PTR_ERR(umem);
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to get umem for create qp-raw, err %d\n",
err);
goto err_free_vport;
}
qp->sq_umem = umem;
- err = mana_ib_gd_create_dma_region(mdev, qp->sq_umem,
+ err = mana_ib_gd_create_dma_region(mib_dev, qp->sq_umem,
&qp->sq_gdma_region);
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to create dma region for create qp-raw, %d\n",
err);
goto err_release_umem;
}
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
err, qp->sq_gdma_region);
@@ -358,7 +358,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
&cq_spec, &qp->tx_object);
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to create wq for create raw-qp, err %d\n",
err);
goto err_destroy_dma_region;
@@ -371,7 +371,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
qp->sq_id = wq_spec.queue_index;
send_cq->id = cq_spec.queue_index;
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"ret %d qp->tx_object 0x%llx sq id %llu cq id %llu\n", err,
qp->tx_object, qp->sq_id, send_cq->id);
@@ -381,7 +381,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
err = ib_copy_to_udata(udata, &resp, sizeof(resp));
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed copy udata for create qp-raw, %d\n",
err);
goto err_destroy_wq_obj;
@@ -393,13 +393,13 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
mana_destroy_wq_obj(mpc, GDMA_SQ, qp->tx_object);
err_destroy_dma_region:
- mana_ib_gd_destroy_dma_region(mdev, qp->sq_gdma_region);
+ mana_ib_gd_destroy_dma_region(mib_dev, qp->sq_gdma_region);
err_release_umem:
ib_umem_release(umem);
err_free_vport:
- mana_ib_uncfg_vport(mdev, pd, port - 1);
+ mana_ib_uncfg_vport(mib_dev, pd, port - 1);
return err;
}
@@ -435,9 +435,9 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
struct ib_rwq_ind_table *ind_tbl,
struct ib_udata *udata)
{
- struct mana_ib_dev *mdev =
+ struct mana_ib_dev *mib_dev =
container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
- struct gdma_dev *gd = mdev->gdma_dev;
+ struct gdma_dev *gd = mib_dev->gdma_dev;
struct mana_port_context *mpc;
struct mana_context *mc;
struct net_device *ndev;
@@ -452,7 +452,7 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
ibwq = ind_tbl->ind_tbl[i];
wq = container_of(ibwq, struct mana_ib_wq, ibwq);
- ibdev_dbg(&mdev->ib_dev, "destroying wq->rx_object %llu\n",
+ ibdev_dbg(&mib_dev->ib_dev, "destroying wq->rx_object %llu\n",
wq->rx_object);
mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
}
@@ -462,9 +462,9 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
static int mana_ib_destroy_qp_raw(struct mana_ib_qp *qp, struct ib_udata *udata)
{
- struct mana_ib_dev *mdev =
+ struct mana_ib_dev *mib_dev =
container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
- struct gdma_dev *gd = mdev->gdma_dev;
+ struct gdma_dev *gd = mib_dev->gdma_dev;
struct ib_pd *ibpd = qp->ibqp.pd;
struct mana_port_context *mpc;
struct mana_context *mc;
@@ -479,11 +479,11 @@ static int mana_ib_destroy_qp_raw(struct mana_ib_qp *qp, struct ib_udata *udata)
mana_destroy_wq_obj(mpc, GDMA_SQ, qp->tx_object);
if (qp->sq_umem) {
- mana_ib_gd_destroy_dma_region(mdev, qp->sq_gdma_region);
+ mana_ib_gd_destroy_dma_region(mib_dev, qp->sq_gdma_region);
ib_umem_release(qp->sq_umem);
}
- mana_ib_uncfg_vport(mdev, pd, qp->port - 1);
+ mana_ib_uncfg_vport(mib_dev, pd, qp->port - 1);
return 0;
}
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
index 372d361510e0..56bc2b8b6690 100644
--- a/drivers/infiniband/hw/mana/wq.c
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -9,7 +9,7 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
struct ib_wq_init_attr *init_attr,
struct ib_udata *udata)
{
- struct mana_ib_dev *mdev =
+ struct mana_ib_dev *mib_dev =
container_of(pd->device, struct mana_ib_dev, ib_dev);
struct mana_ib_create_wq ucmd = {};
struct mana_ib_wq *wq;
@@ -21,7 +21,7 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to copy from udata for create wq, %d\n", err);
return ERR_PTR(err);
}
@@ -30,13 +30,14 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
if (!wq)
return ERR_PTR(-ENOMEM);
- ibdev_dbg(&mdev->ib_dev, "ucmd wq_buf_addr 0x%llx\n", ucmd.wq_buf_addr);
+ ibdev_dbg(&mib_dev->ib_dev, "ucmd wq_buf_addr 0x%llx\n",
+ ucmd.wq_buf_addr);
umem = ib_umem_get(pd->device, ucmd.wq_buf_addr, ucmd.wq_buf_size,
IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(umem)) {
err = PTR_ERR(umem);
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to get umem for create wq, err %d\n", err);
goto err_free_wq;
}
@@ -46,15 +47,15 @@ struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
wq->wq_buf_size = ucmd.wq_buf_size;
wq->rx_object = INVALID_MANA_HANDLE;
- err = mana_ib_gd_create_dma_region(mdev, wq->umem, &wq->gdma_region);
+ err = mana_ib_gd_create_dma_region(mib_dev, wq->umem, &wq->gdma_region);
if (err) {
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"Failed to create dma region for create wq, %d\n",
err);
goto err_release_umem;
}
- ibdev_dbg(&mdev->ib_dev,
+ ibdev_dbg(&mib_dev->ib_dev,
"mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
err, wq->gdma_region);
@@ -82,11 +83,11 @@ int mana_ib_destroy_wq(struct ib_wq *ibwq, struct ib_udata *udata)
{
struct mana_ib_wq *wq = container_of(ibwq, struct mana_ib_wq, ibwq);
struct ib_device *ib_dev = ibwq->device;
- struct mana_ib_dev *mdev;
+ struct mana_ib_dev *mib_dev;
- mdev = container_of(ib_dev, struct mana_ib_dev, ib_dev);
+ mib_dev = container_of(ib_dev, struct mana_ib_dev, ib_dev);
- mana_ib_gd_destroy_dma_region(mdev, wq->gdma_region);
+ mana_ib_gd_destroy_dma_region(mib_dev, wq->gdma_region);
ib_umem_release(wq->umem);
kfree(wq);
--
2.25.1
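
A note for readers following the rename: the hunks above repeatedly recover the driver-private struct mana_ib_dev from the embedded struct ib_device with container_of(). The following is a minimal userspace sketch of that pattern, not the kernel's definition. The structs demo_ib_device and demo_mana_ib_dev, the field adapter_id, and the macro demo_container_of are hypothetical stand-ins introduced only for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for struct ib_device; the real one has many more fields. */
struct demo_ib_device {
	int phys_port_cnt;
};

/* Simplified stand-in for struct mana_ib_dev with the ib_device embedded. */
struct demo_mana_ib_dev {
	int adapter_id;               /* hypothetical private field */
	struct demo_ib_device ib_dev; /* embedded member, as in the driver */
};

/* Userspace approximation of container_of(), without the kernel's type checks. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a pointer to the embedded ib_dev back to the enclosing device struct. */
static struct demo_mana_ib_dev *to_mib_dev(struct demo_ib_device *ibdev)
{
	return demo_container_of(ibdev, struct demo_mana_ib_dev, ib_dev);
}
```

Because the lookup is pure pointer arithmetic on the member offset, renaming the local variable from dev or mdev to mib_dev, as this patch does, should not change behavior.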
* [Patch v2 3/5] RDMA/mana_ib : Add error eq and notification from SoC
From: sharmaajay @ 2023-07-26 3:56 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Dexuan Cui, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Ajay Sharma
In-Reply-To: <1690343820-20188-1-git-send-email-sharmaajay@linuxonhyperv.com>
From: Ajay Sharma <sharmaajay@microsoft.com>
Add the error EQ that adapter creation requires;
it is later used for notifications from the
management software.
Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
---
drivers/infiniband/hw/mana/device.c | 13 +-
drivers/infiniband/hw/mana/main.c | 44 ++++++
drivers/infiniband/hw/mana/mana_ib.h | 3 +
.../net/ethernet/microsoft/mana/gdma_main.c | 146 ++++++++++--------
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 +
include/net/mana/gdma.h | 13 +-
6 files changed, 152 insertions(+), 70 deletions(-)
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index ea4c8c8fc10d..3ab4e69705df 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -68,7 +68,7 @@ static int mana_ib_probe(struct auxiliary_device *adev,
ibdev_dbg(&mib_dev->ib_dev, "mdev=%p id=%d num_ports=%d\n", mdev,
mdev->dev_id.as_uint32, mib_dev->ib_dev.phys_port_cnt);
- mib_dev->gdma_dev = mdev;
+ mib_dev->gc = mdev->gdma_context;
mib_dev->ib_dev.node_type = RDMA_NODE_IB_CA;
/*
@@ -85,15 +85,23 @@ static int mana_ib_probe(struct auxiliary_device *adev,
goto free_ib_device;
}
+ ret = mana_ib_create_error_eq(mib_dev);
+ if (ret) {
+ ibdev_err(&mib_dev->ib_dev, "Failed to allocate err eq");
+ goto deregister_device;
+ }
+
ret = ib_register_device(&mib_dev->ib_dev, "mana_%d",
mdev->gdma_context->dev);
if (ret)
- goto deregister_device;
+ goto free_error_eq;
dev_set_drvdata(&adev->dev, mib_dev);
return 0;
+free_error_eq:
+ mana_gd_destroy_queue(mib_dev->gc, mib_dev->fatal_err_eq);
deregister_device:
mana_gd_deregister_device(&mib_dev->gc->mana_ib);
free_ib_device:
@@ -105,6 +113,7 @@ static void mana_ib_remove(struct auxiliary_device *adev)
{
struct mana_ib_dev *mib_dev = dev_get_drvdata(&adev->dev);
+ mana_gd_destroy_queue(mib_dev->gc, mib_dev->fatal_err_eq);
mana_gd_deregister_device(&mib_dev->gc->mana_ib);
ib_unregister_device(&mib_dev->ib_dev);
ib_dealloc_device(&mib_dev->ib_dev);
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 2c4e3c496644..2ea24ba3065f 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -504,3 +504,47 @@ int mana_ib_query_gid(struct ib_device *ibdev, u32 port, int index,
void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
{
}
+
+void mana_ib_soc_event_handler(void *ctx, struct gdma_queue *queue,
+ struct gdma_event *event)
+{
+ struct mana_ib_dev *mib_dev = (struct mana_ib_dev *)ctx;
+
+ switch (event->type) {
+ case GDMA_EQE_SOC_EVENT_NOTIFICATION:
+ ibdev_info(&mib_dev->ib_dev, "Received SOC Notification");
+ break;
+ case GDMA_EQE_SOC_EVENT_TEST:
+ ibdev_info(&mib_dev->ib_dev, "Received SoC Test");
+ break;
+ default:
+ ibdev_err(&mib_dev->ib_dev, "Received unsolicited evt %d",
+ event->type);
+ }
+}
+
+int mana_ib_create_error_eq(struct mana_ib_dev *mib_dev)
+{
+ struct gdma_queue_spec spec = {};
+ int err;
+
+ spec.type = GDMA_EQ;
+ spec.monitor_avl_buf = false;
+ spec.queue_size = EQ_SIZE;
+ spec.eq.callback = mana_ib_soc_event_handler;
+ spec.eq.context = mib_dev;
+ spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
+ spec.eq.msix_allocated = true;
+ spec.eq.msix_index = 0;
+ spec.doorbell = mib_dev->gc->mana_ib.doorbell;
+ spec.pdid = mib_dev->gc->mana_ib.pdid;
+
+ err = mana_gd_create_mana_eq(&mib_dev->gc->mana_ib, &spec,
+ &mib_dev->fatal_err_eq);
+ if (err)
+ return err;
+
+ mib_dev->fatal_err_eq->eq.disable_needed = true;
+
+ return 0;
+}
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 3a2ba6b96f15..4383777354d3 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -31,6 +31,7 @@ struct mana_ib_dev {
struct ib_device ib_dev;
struct gdma_dev *gdma_dev;
struct gdma_context *gc;
+ struct gdma_queue *fatal_err_eq;
};
struct mana_ib_wq {
@@ -161,4 +162,6 @@ int mana_ib_query_gid(struct ib_device *ibdev, u32 port, int index,
void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext);
+int mana_ib_create_error_eq(struct mana_ib_dev *mib_dev);
+
#endif
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 9fa7a2d6c2b2..84faf4efcb75 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -185,7 +185,8 @@ void mana_gd_free_memory(struct gdma_mem_info *gmi)
}
static int mana_gd_create_hw_eq(struct gdma_context *gc,
- struct gdma_queue *queue)
+ struct gdma_queue *queue,
+ u32 doorbell, u32 pdid)
{
struct gdma_create_queue_resp resp = {};
struct gdma_create_queue_req req = {};
@@ -199,8 +200,8 @@ static int mana_gd_create_hw_eq(struct gdma_context *gc,
req.hdr.dev_id = queue->gdma_dev->dev_id;
req.type = queue->type;
- req.pdid = queue->gdma_dev->pdid;
- req.doolbell_id = queue->gdma_dev->doorbell;
+ req.pdid = pdid;
+ req.doolbell_id = doorbell;
req.gdma_region = queue->mem_info.dma_region_handle;
req.queue_size = queue->queue_size;
req.log2_throttle_limit = queue->eq.log2_throttle_limit;
@@ -371,53 +372,51 @@ static void mana_gd_process_eqe(struct gdma_queue *eq)
}
}
-static void mana_gd_process_eq_events(void *arg)
+static void mana_gd_process_eq_events(struct list_head *eq_list)
{
u32 owner_bits, new_bits, old_bits;
union gdma_eqe_info eqe_info;
struct gdma_eqe *eq_eqe_ptr;
- struct gdma_queue *eq = arg;
+ struct gdma_queue *eq;
struct gdma_context *gc;
struct gdma_eqe *eqe;
u32 head, num_eqe;
int i;
- gc = eq->gdma_dev->gdma_context;
-
- num_eqe = eq->queue_size / GDMA_EQE_SIZE;
- eq_eqe_ptr = eq->queue_mem_ptr;
-
- /* Process up to 5 EQEs at a time, and update the HW head. */
- for (i = 0; i < 5; i++) {
- eqe = &eq_eqe_ptr[eq->head % num_eqe];
- eqe_info.as_uint32 = eqe->eqe_info;
- owner_bits = eqe_info.owner_bits;
-
- old_bits = (eq->head / num_eqe - 1) & GDMA_EQE_OWNER_MASK;
- /* No more entries */
- if (owner_bits == old_bits)
- break;
-
- new_bits = (eq->head / num_eqe) & GDMA_EQE_OWNER_MASK;
- if (owner_bits != new_bits) {
- dev_err(gc->dev, "EQ %d: overflow detected\n", eq->id);
- break;
+ list_for_each_entry_rcu(eq, eq_list, entry) {
+ gc = eq->gdma_dev->gdma_context;
+
+ num_eqe = eq->queue_size / GDMA_EQE_SIZE;
+ eq_eqe_ptr = eq->queue_mem_ptr;
+ /* Process up to 5 EQEs at a time, and update the HW head. */
+ for (i = 0; i < 5; i++) {
+ eqe = &eq_eqe_ptr[eq->head % num_eqe];
+ eqe_info.as_uint32 = eqe->eqe_info;
+ owner_bits = eqe_info.owner_bits;
+
+ old_bits = (eq->head / num_eqe - 1) & GDMA_EQE_OWNER_MASK;
+ /* No more entries */
+ if (owner_bits == old_bits)
+ break;
+
+ new_bits = (eq->head / num_eqe) & GDMA_EQE_OWNER_MASK;
+ if (owner_bits != new_bits) {
+ dev_err(gc->dev, "EQ %d: overflow detected\n",
+ eq->id);
+ break;
+ }
+ /* Per GDMA spec, rmb is necessary after checking owner_bits, before
+ * reading eqe.
+ */
+ rmb();
+ mana_gd_process_eqe(eq);
+ eq->head++;
}
- /* Per GDMA spec, rmb is necessary after checking owner_bits, before
- * reading eqe.
- */
- rmb();
-
- mana_gd_process_eqe(eq);
-
- eq->head++;
+ head = eq->head % (num_eqe << GDMA_EQE_OWNER_BITS);
+ mana_gd_ring_doorbell(gc, eq->gdma_dev->doorbell, eq->type,
+ eq->id, head, SET_ARM_BIT);
}
-
- head = eq->head % (num_eqe << GDMA_EQE_OWNER_BITS);
-
- mana_gd_ring_doorbell(gc, eq->gdma_dev->doorbell, eq->type, eq->id,
- head, SET_ARM_BIT);
}
static int mana_gd_register_irq(struct gdma_queue *queue,
@@ -435,44 +434,47 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
gc = gd->gdma_context;
r = &gc->msix_resource;
dev = gc->dev;
+ msi_index = spec->eq.msix_index;
spin_lock_irqsave(&r->lock, flags);
- msi_index = find_first_zero_bit(r->map, r->size);
- if (msi_index >= r->size || msi_index >= gc->num_msix_usable) {
- err = -ENOSPC;
- } else {
- bitmap_set(r->map, msi_index, 1);
- queue->eq.msix_index = msi_index;
- }
-
- spin_unlock_irqrestore(&r->lock, flags);
+ if (!spec->eq.msix_allocated) {
+ msi_index = find_first_zero_bit(r->map, r->size);
- if (err) {
- dev_err(dev, "Register IRQ err:%d, msi:%u rsize:%u, nMSI:%u",
- err, msi_index, r->size, gc->num_msix_usable);
+ if (msi_index >= r->size ||
+ msi_index >= gc->num_msix_usable)
+ err = -ENOSPC;
+ else
+ bitmap_set(r->map, msi_index, 1);
- return err;
+ if (err) {
+ dev_err(dev, "Register IRQ err:%d, msi:%u rsize:%u, nMSI:%u",
+ err, msi_index, r->size, gc->num_msix_usable);
+ goto out;
+ }
}
+ queue->eq.msix_index = msi_index;
gic = &gc->irq_contexts[msi_index];
- WARN_ON(gic->handler || gic->arg);
-
- gic->arg = queue;
+ list_add_rcu(&queue->entry, &gic->eq_list);
gic->handler = mana_gd_process_eq_events;
- return 0;
+out:
+ spin_unlock_irqrestore(&r->lock, flags);
+ return err;
}
-static void mana_gd_deregiser_irq(struct gdma_queue *queue)
+static void mana_gd_deregister_irq(struct gdma_queue *queue)
{
struct gdma_dev *gd = queue->gdma_dev;
struct gdma_irq_context *gic;
struct gdma_context *gc;
struct gdma_resource *r;
unsigned int msix_index;
+ struct list_head *p, *n;
+ struct gdma_queue *eq;
unsigned long flags;
gc = gd->gdma_context;
@@ -483,14 +485,23 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
if (WARN_ON(msix_index >= gc->num_msix_usable))
return;
+ spin_lock_irqsave(&r->lock, flags);
+
gic = &gc->irq_contexts[msix_index];
- gic->handler = NULL;
- gic->arg = NULL;
+ list_for_each_safe(p, n, &gic->eq_list) {
+ eq = list_entry(p, struct gdma_queue, entry);
+ if (queue == eq) {
+ list_del(&eq->entry);
+ break;
+ }
+ }
- spin_lock_irqsave(&r->lock, flags);
- bitmap_clear(r->map, msix_index, 1);
- spin_unlock_irqrestore(&r->lock, flags);
+ if (list_empty(&gic->eq_list)) {
+ gic->handler = NULL;
+ bitmap_clear(r->map, msix_index, 1);
+ }
+ spin_unlock_irqrestore(&r->lock, flags);
queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
}
@@ -553,7 +564,7 @@ static void mana_gd_destroy_eq(struct gdma_context *gc, bool flush_evenets,
dev_warn(gc->dev, "Failed to flush EQ: %d\n", err);
}
- mana_gd_deregiser_irq(queue);
+ mana_gd_deregister_irq(queue);
if (queue->eq.disable_needed)
mana_gd_disable_queue(queue);
@@ -568,7 +579,7 @@ static int mana_gd_create_eq(struct gdma_dev *gd,
u32 log2_num_entries;
int err;
- queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
+ queue->eq.msix_index = spec->eq.msix_index;
log2_num_entries = ilog2(queue->queue_size / GDMA_EQE_SIZE);
@@ -590,7 +601,8 @@ static int mana_gd_create_eq(struct gdma_dev *gd,
queue->eq.log2_throttle_limit = spec->eq.log2_throttle_limit ?: 1;
if (create_hwq) {
- err = mana_gd_create_hw_eq(gc, queue);
+ err = mana_gd_create_hw_eq(gc, queue,
+ spec->doorbell, spec->pdid);
if (err)
goto out;
@@ -800,6 +812,7 @@ int mana_gd_create_mana_eq(struct gdma_dev *gd,
kfree(queue);
return err;
}
+EXPORT_SYMBOL(mana_gd_create_mana_eq);
int mana_gd_create_mana_wq_cq(struct gdma_dev *gd,
const struct gdma_queue_spec *spec,
@@ -876,6 +889,7 @@ void mana_gd_destroy_queue(struct gdma_context *gc, struct gdma_queue *queue)
mana_gd_free_memory(gmi);
kfree(queue);
}
+EXPORT_SYMBOL(mana_gd_destroy_queue);
int mana_gd_verify_vf_version(struct pci_dev *pdev)
{
@@ -1193,7 +1207,7 @@ static irqreturn_t mana_gd_intr(int irq, void *arg)
struct gdma_irq_context *gic = arg;
if (gic->handler)
- gic->handler(gic->arg);
+ gic->handler(&gic->eq_list);
return IRQ_HANDLED;
}
@@ -1246,7 +1260,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
for (i = 0; i < nvec; i++) {
gic = &gc->irq_contexts[i];
gic->handler = NULL;
- gic->arg = NULL;
+ INIT_LIST_HEAD(&gic->eq_list);
if (!i)
snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a499e460594b..d2ba7de8b512 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1167,6 +1167,9 @@ static int mana_create_eq(struct mana_context *ac)
spec.eq.callback = NULL;
spec.eq.context = ac->eqs;
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
+ spec.eq.msix_allocated = false;
+ spec.doorbell = gd->doorbell;
+ spec.pdid = gd->pdid;
for (i = 0; i < gc->max_num_queues; i++) {
err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index e2b212dd722b..aee8e8fa1ea6 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -57,6 +57,10 @@ enum gdma_eqe_type {
GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
GDMA_EQE_HWC_INIT_DATA = 130,
GDMA_EQE_HWC_INIT_DONE = 131,
+
+ /* IB NIC events start at 176 */
+ GDMA_EQE_SOC_EVENT_NOTIFICATION = 176,
+ GDMA_EQE_SOC_EVENT_TEST,
};
enum {
@@ -291,6 +295,7 @@ struct gdma_queue {
u32 head;
u32 tail;
+ struct list_head entry;
/* Extra fields specific to EQ/CQ. */
union {
@@ -318,6 +323,8 @@ struct gdma_queue_spec {
enum gdma_queue_type type;
bool monitor_avl_buf;
unsigned int queue_size;
+ u32 doorbell;
+ u32 pdid;
/* Extra fields specific to EQ/CQ. */
union {
@@ -326,6 +333,8 @@ struct gdma_queue_spec {
void *context;
unsigned long log2_throttle_limit;
+ bool msix_allocated;
+ unsigned int msix_index;
} eq;
struct {
@@ -341,8 +350,8 @@ struct gdma_queue_spec {
#define MANA_IRQ_NAME_SZ 32
struct gdma_irq_context {
- void (*handler)(void *arg);
- void *arg;
+ void (*handler)(struct list_head *arg);
+ struct list_head eq_list;
char name[MANA_IRQ_NAME_SZ];
};
--
2.25.1
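
The central change in the gdma_main.c hunks above is that one interrupt context can now serve several EQs: registration appends the queue to a per-vector list, the handler walks that list, and deregistration clears the handler only once the list is empty. The userspace sketch below illustrates that idea under simplifying assumptions. It uses a fixed-size array instead of the kernel's RCU-protected list_head, takes no locks, and passes the whole context to the handler rather than the list; all demo_* names are hypothetical.

```c
#include <stddef.h>

#define DEMO_MAX_EQS 4

/* Stand-in for a GDMA event queue; events_seen counts handler visits. */
struct demo_eq {
	int id;
	int events_seen;
};

/* One interrupt vector; several EQs may be attached to it. */
struct demo_irq_context {
	void (*handler)(struct demo_irq_context *gic);
	struct demo_eq *eqs[DEMO_MAX_EQS];
	size_t num_eqs;
};

/* Handler: visit every EQ attached to this vector. */
static void demo_process_eqs(struct demo_irq_context *gic)
{
	for (size_t i = 0; i < gic->num_eqs; i++)
		gic->eqs[i]->events_seen++;
}

/* Attach an EQ to the vector and make sure a handler is installed. */
static int demo_register_eq(struct demo_irq_context *gic, struct demo_eq *eq)
{
	if (gic->num_eqs == DEMO_MAX_EQS)
		return -1;
	gic->eqs[gic->num_eqs++] = eq;
	gic->handler = demo_process_eqs;
	return 0;
}

/* Detach an EQ; drop the handler only when no EQ remains on the vector. */
static void demo_deregister_eq(struct demo_irq_context *gic, struct demo_eq *eq)
{
	for (size_t i = 0; i < gic->num_eqs; i++) {
		if (gic->eqs[i] != eq)
			continue;
		/* Swap-remove: move the last entry into the freed slot. */
		gic->eqs[i] = gic->eqs[--gic->num_eqs];
		break;
	}
	if (gic->num_eqs == 0)
		gic->handler = NULL;
}

/* Stand-in for the top-level interrupt entry point. */
static void demo_interrupt(struct demo_irq_context *gic)
{
	if (gic->handler)
		gic->handler(gic);
}
```

The point of the design is that removing one EQ leaves the vector live for the others, which is what lets the IB error EQ share MSI-X index 0 with an existing user.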
* [Patch v2 4/5] RDMA/mana_ib : Create Adapter - each vf bound to adapter object
From: sharmaajay @ 2023-07-26 3:56 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Dexuan Cui, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Ajay Sharma
In-Reply-To: <1690343820-20188-1-git-send-email-sharmaajay@linuxonhyperv.com>
From: Ajay Sharma <sharmaajay@microsoft.com>
Create an adapter object to serve as a container
for VF resources.
Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
---
drivers/infiniband/hw/mana/device.c | 11 +++++-
drivers/infiniband/hw/mana/main.c | 50 ++++++++++++++++++++++++++++
drivers/infiniband/hw/mana/mana_ib.h | 30 +++++++++++++++++
3 files changed, 90 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
index 3ab4e69705df..4077e440657a 100644
--- a/drivers/infiniband/hw/mana/device.c
+++ b/drivers/infiniband/hw/mana/device.c
@@ -91,15 +91,23 @@ static int mana_ib_probe(struct auxiliary_device *adev,
goto deregister_device;
}
+ ret = mana_ib_create_adapter(mib_dev);
+ if (ret) {
+ ibdev_err(&mib_dev->ib_dev, "Failed to create adapter");
+ goto free_error_eq;
+ }
+
ret = ib_register_device(&mib_dev->ib_dev, "mana_%d",
mdev->gdma_context->dev);
if (ret)
- goto free_error_eq;
+ goto destroy_adapter;
dev_set_drvdata(&adev->dev, mib_dev);
return 0;
+destroy_adapter:
+ mana_ib_destroy_adapter(mib_dev);
free_error_eq:
mana_gd_destroy_queue(mib_dev->gc, mib_dev->fatal_err_eq);
deregister_device:
@@ -114,6 +122,7 @@ static void mana_ib_remove(struct auxiliary_device *adev)
struct mana_ib_dev *mib_dev = dev_get_drvdata(&adev->dev);
mana_gd_destroy_queue(mib_dev->gc, mib_dev->fatal_err_eq);
+ mana_ib_destroy_adapter(mib_dev);
mana_gd_deregister_device(&mib_dev->gc->mana_ib);
ib_unregister_device(&mib_dev->ib_dev);
ib_dealloc_device(&mib_dev->ib_dev);
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 2ea24ba3065f..aab1cc096824 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -505,6 +505,56 @@ void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
{
}
+int mana_ib_destroy_adapter(struct mana_ib_dev *mib_dev)
+{
+ struct mana_ib_destroy_adapter_resp resp = {};
+ struct mana_ib_destroy_adapter_req req = {};
+ struct gdma_context *gc;
+ int err;
+
+ gc = mib_dev->gc;
+
+ mana_gd_init_req_hdr(&req.hdr, MANA_IB_DESTROY_ADAPTER, sizeof(req),
+ sizeof(resp));
+ req.adapter = mib_dev->adapter_handle;
+ req.hdr.dev_id = gc->mana_ib.dev_id;
+
+ err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+
+ if (err) {
+ ibdev_err(&mib_dev->ib_dev, "Failed to destroy adapter err %d", err);
+ return err;
+ }
+
+ return 0;
+}
+
+int mana_ib_create_adapter(struct mana_ib_dev *mib_dev)
+{
+ struct mana_ib_create_adapter_resp resp = {};
+ struct mana_ib_create_adapter_req req = {};
+ struct gdma_context *gc;
+ int err;
+
+ gc = mib_dev->gc;
+
+ mana_gd_init_req_hdr(&req.hdr, MANA_IB_CREATE_ADAPTER, sizeof(req),
+ sizeof(resp));
+ req.notify_eq_id = mib_dev->fatal_err_eq->id;
+ req.hdr.dev_id = gc->mana_ib.dev_id;
+
+ err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+
+ if (err) {
+ ibdev_err(&mib_dev->ib_dev, "Failed to create adapter err %d", err);
+ return err;
+ }
+
+ mib_dev->adapter_handle = resp.adapter;
+
+ return 0;
+}
+
void mana_ib_soc_event_handler(void *ctx, struct gdma_queue *queue,
struct gdma_event *event)
{
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
index 4383777354d3..8a652bccd978 100644
--- a/drivers/infiniband/hw/mana/mana_ib.h
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -32,6 +32,7 @@ struct mana_ib_dev {
struct gdma_dev *gdma_dev;
struct gdma_context *gc;
struct gdma_queue *fatal_err_eq;
+ mana_handle_t adapter_handle;
};
struct mana_ib_wq {
@@ -94,6 +95,31 @@ struct mana_ib_rwq_ind_table {
struct ib_rwq_ind_table ib_ind_table;
};
+enum mana_ib_command_code {
+ MANA_IB_CREATE_ADAPTER = 0x30002,
+ MANA_IB_DESTROY_ADAPTER = 0x30003,
+};
+
+struct mana_ib_create_adapter_req {
+ struct gdma_req_hdr hdr;
+ u32 notify_eq_id;
+ u32 reserved;
+}; /* HW Data */
+
+struct mana_ib_create_adapter_resp {
+ struct gdma_resp_hdr hdr;
+ mana_handle_t adapter;
+}; /* HW Data */
+
+struct mana_ib_destroy_adapter_req {
+ struct gdma_req_hdr hdr;
+ mana_handle_t adapter;
+}; /* HW Data */
+
+struct mana_ib_destroy_adapter_resp {
+ struct gdma_resp_hdr hdr;
+}; /* HW Data */
+
int mana_ib_gd_create_dma_region(struct mana_ib_dev *mib_dev,
struct ib_umem *umem,
mana_handle_t *gdma_region);
@@ -164,4 +190,8 @@ void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext);
int mana_ib_create_error_eq(struct mana_ib_dev *mib_dev);
+int mana_ib_create_adapter(struct mana_ib_dev *mib_dev);
+
+int mana_ib_destroy_adapter(struct mana_ib_dev *mib_dev);
+
#endif
--
2.25.1
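
The probe path in the patch above creates the error EQ, then the adapter (whose create request carries the EQ id in notify_eq_id), then registers the IB device, and unwinds through goto labels in reverse order on failure. A minimal userspace sketch of that ladder follows. The demo_probe_state struct, the fail_at fault-injection field, and the demo_* functions are hypothetical and stand in for the real driver calls.

```c
#include <stdbool.h>

/* Hypothetical state: fail_at selects which step fails (0 means none). */
struct demo_probe_state {
	int fail_at;
	bool eq_created;
	bool adapter_created;
	bool registered;
};

/* Run one setup step; fail if fault injection targets it. */
static int demo_step(struct demo_probe_state *s, int step, bool *flag)
{
	if (s->fail_at == step)
		return -1;
	*flag = true;
	return 0;
}

/* Setup in dependency order, teardown in reverse on any failure. */
static int demo_probe(struct demo_probe_state *s)
{
	int ret;

	ret = demo_step(s, 1, &s->eq_created);      /* error EQ first */
	if (ret)
		goto out;

	ret = demo_step(s, 2, &s->adapter_created); /* adapter needs the EQ id */
	if (ret)
		goto free_error_eq;

	ret = demo_step(s, 3, &s->registered);      /* registration last */
	if (ret)
		goto destroy_adapter;

	return 0;

destroy_adapter:
	s->adapter_created = false;
free_error_eq:
	s->eq_created = false;
out:
	return ret;
}
```

In this sketch each label undoes exactly the steps that succeeded before the failure. By the same reasoning, the remove path would be expected to mirror the probe order in reverse; the patch's mana_ib_remove() destroys the error EQ before the adapter, which may be worth a second look.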