From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 78732D1356F for ; Sun, 27 Oct 2024 15:30:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=zqvfSXA47GF7YeUS5geOGS1Di2D4vIEM/P6G8yafBlY=; b=B4LC4IYIawHD6loKtgqTPyjsOh 6LppcDKxuny7zpNts8k//S1G/XQGbZpk1AAwmogY7E4iqhS+eQ8T2AK3a/AXusXLBINztqac4ZNO/ QFsyfRRgN9tZpjKmKXnu+233mZJDgIZxq1AUtDUKxsIRCaooKD2xehoQ0+2WWagrgf/xTfmEoo6rI BLuOd2G6UXk9h8YhN2GqOHuoR/m6JnIE0NHNO6EtYQN1aSUdfvH9M5TohEptLvOhYJmRx+ea0oH0l jM/4YlAtlwZJlHBySsC+Z2TodyKcl7sumw/X7ufGrKfJUyy2tz4aQUDylyAWr96ZYnUtPtUY2Sk4t JZseu2Og==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1t55Dc-00000008TRM-3Ps2; Sun, 27 Oct 2024 15:30:16 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1t54AL-00000008KlN-1KHx for linux-nvme@bombadil.infradead.org; Sun, 27 Oct 2024 14:22:49 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=zqvfSXA47GF7YeUS5geOGS1Di2D4vIEM/P6G8yafBlY=; b=Ci/8e1ckkv6aZNkkInFMSwpwEN LjACTb5VUxWuQfOurmVPjzvCcowyjgQLx18YoSvxvGC+uU7I9OMUI/QbfiKTaUq/QdbdB36SSJdyF M64gKhkgsZ/3EsCCzQqEWQuZJRq11/+CisdYvRdOf58uA4DkciyGLRbky/LmkeW+6cDOriDk/e8uX mxpTSf85zHosJLy/qsLYq/fcrXNs+yodCfCWHmh/ecuPgLqjyecv+DFEkb6Kvo3axgHNOOS1vRVwQ R3rX0z8ZZKnn1Kw2XDP5Nyjyqv+LfXEYoqZ/RC5BAlSgUUKTFFfltckxvZOEu9sq9w85t7Xx3D4U2 504sfjoQ==; Received: from nyc.source.kernel.org ([147.75.193.91]) by desiato.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1t54AF-00000009QPR-3KLT for linux-nvme@lists.infradead.org; Sun, 27 Oct 2024 14:22:47 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by nyc.source.kernel.org (Postfix) with ESMTP id 09BF0A40FA0; Sun, 27 Oct 2024 14:20:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0CCEAC4CEC3; Sun, 27 Oct 2024 14:22:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1730038960; bh=9Y8s1sn+jRVmq+XduWzJ+4BCR70epcg9p+8pEEFM2cI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=D6nBNy7KRpHMQDBD/hIk8id0BwPrCi1QrX6RrVsB2mIixEfPurh4i3M7Q+NV8Pe+G O1UJBOKRQl1UIMFjxgpdFl264u3K378hbDGEWKGy0KIWwt5SdFzvoUsGeCbwF46Eb0 sVEBYsZ2HVsm83TcrJXjk7/XZP96tbjWUizHwVcMnAmKJm6J5d29yR8sT2IhN4v+oo sYQo22P1Bkwsv65FV96jsyaj8FU7QvlpcUQ4UHHGclNVimMdEHYeG+cD3hoMzUy8IE zW/kkqQxodZssIgCBi7yBhnU8LumK9wCldrWqn+g3ObKUe3DGjGVKJMSMy9Z7Xmgc0 C/UIHvlE0yU7g== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Christoph Hellwig , Sagi Grimberg Cc: Leon Romanovsky , Keith Busch , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Andrew Morton , Jonathan Corbet , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 18/18] vfio/mlx5: Convert vfio to use DMA link API Date: Sun, 27 Oct 2024 16:21:18 +0200 Message-ID: <0a517ddff099c14fac1ceb0e75f2f50ed183d09c.1730037276.git.leon@kernel.org> X-Mailer: git-send-email 2.46.2 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20241027_142245_109550_694FF733 X-CRM114-Status: GOOD ( 23.22 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org From: Leon Romanovsky Remove intermediate scatter-gather table as it is not needed if DMA link API is used. This conversion reduces drastically the memory used to manage that table. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 211 ++++++++++++++++++----------------- drivers/vfio/pci/mlx5/cmd.h | 9 +- drivers/vfio/pci/mlx5/main.c | 31 +---- 3 files changed, 114 insertions(+), 137 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 34ae3e299a9e..58c490222be7 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -345,25 +345,81 @@ static u32 *alloc_mkey_in(u32 npages, u32 pdn) return in; } -static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, - struct mlx5_vhca_data_buffer *buf, u32 *mkey_in, +static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, u32 *mkey_in, u32 *mkey) { + int inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(__be64) * round_up(npages, 2); + + return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); +} + +static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + u32 *mkey_in, struct dma_iova_state *state, + enum dma_data_direction dir) +{ + dma_addr_t addr; __be64 *mtt; - int inlen; + int i; - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - if (buf) { - struct sg_dma_page_iter dma_iter; + WARN_ON_ONCE(dir == DMA_NONE); - for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) - *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + if (dma_use_iova(state)) { + dma_iova_destroy(mdev->device, state, dir, 0); + } else { + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, + klm_pas_mtt); + for (i = npages - 1; i >= 0; i--) { + addr = be64_to_cpu(mtt[i]); + dma_unmap_page(mdev->device, addr, PAGE_SIZE, dir); + } } +} - inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + - sizeof(__be64) * round_up(npages, 2); +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + struct page **page_list, u32 *mkey_in, + struct dma_iova_state *state, + enum dma_data_direction dir) +{ + dma_addr_t addr; + size_t mapped = 0; + __be64 *mtt; + int i, err; - return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); + WARN_ON_ONCE(dir == DMA_NONE); + + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + + if (dma_iova_try_alloc(mdev->device, state, 0, npages * PAGE_SIZE)) { + addr = state->addr; + for (i = 0; i < npages; i++) { + err = dma_iova_link(mdev->device, state, + page_to_phys(page_list[i]), mapped, + PAGE_SIZE, dir, 0); + if (err) + break; + *mtt++ = cpu_to_be64(addr); + addr += PAGE_SIZE; + mapped += PAGE_SIZE; + } + err = dma_iova_sync(mdev->device, state, 0, mapped, err); + if (err) + goto error; + } else { + for (i = 0; i < npages; i++) { + addr = dma_map_page(mdev->device, page_list[i], 0, + PAGE_SIZE, dir); + err = dma_mapping_error(mdev->device, addr); + if (err) + goto error; + *mtt++ = cpu_to_be64(addr); + } + } + return 0; + +error: + unregister_dma_pages(mdev, i, mkey_in, state, dir); + return err; } static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) @@ -379,50 +435,57 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) if (buf->mkey_in || !buf->npages) return -EINVAL; - ret = dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); - if (ret) - return ret; - buf->mkey_in = alloc_mkey_in(buf->npages, buf->migf->pdn); - if (!buf->mkey_in) { - ret = -ENOMEM; - goto err; - } + if (!buf->mkey_in) + return -ENOMEM; + + ret = register_dma_pages(mdev, buf->npages, buf->page_list, + buf->mkey_in, &buf->state, buf->dma_dir); + if (ret) + goto err_register_dma; - ret = create_mkey(mdev, buf->npages, buf, buf->mkey_in, &buf->mkey); + ret = create_mkey(mdev, buf->npages, buf->mkey_in, &buf->mkey); if (ret) goto err_create_mkey; return 0; err_create_mkey: + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, &buf->state, + buf->dma_dir); +err_register_dma: kvfree(buf->mkey_in); buf->mkey_in = NULL; -err: - dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); return ret; } +static void free_page_list(u32 npages, struct page **page_list) +{ + int i; + + /* Undo alloc_pages_bulk_array() */ + for (i = npages - 1; i >= 0; i--) + __free_page(page_list[i]); + + kvfree(page_list); +} + void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf) { - struct mlx5_vf_migration_file *migf = buf->migf; - struct sg_page_iter sg_iter; + struct mlx5vf_pci_core_device *mvdev = buf->migf->mvdev; + struct mlx5_core_dev *mdev = mvdev->mdev; - lockdep_assert_held(&migf->mvdev->state_mutex); - WARN_ON(migf->mvdev->mdev_detach); + lockdep_assert_held(&mvdev->state_mutex); + WARN_ON(mvdev->mdev_detach); if (buf->mkey_in) { - mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + mlx5_core_destroy_mkey(mdev, buf->mkey); + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, + &buf->state, buf->dma_dir); kvfree(buf->mkey_in); - dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, - buf->dma_dir, 0); } - /* Undo alloc_pages_bulk_array() */ - for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) - __free_page(sg_page_iter_page(&sg_iter)); - sg_free_append_table(&buf->table); - kvfree(buf->page_list); + free_page_list(buf->npages, buf->page_list); kfree(buf); } @@ -433,7 +496,6 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, struct page **page_list; unsigned long filled; unsigned int to_fill; - int ret; to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*buf->page_list)); page_list = kvzalloc(to_fill * sizeof(*buf->page_list), GFP_KERNEL_ACCOUNT); @@ -443,22 +505,13 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, buf->page_list = page_list; do { - filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill, - buf->page_list + buf->npages); + filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_alloc, + buf->page_list + buf->npages); if (!filled) return -ENOMEM; to_alloc -= filled; - ret = sg_alloc_append_table_from_pages( - &buf->table, buf->page_list + buf->npages, filled, 0, - filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, - GFP_KERNEL_ACCOUNT); - - if (ret) - return ret; buf->npages += filled; - to_fill = min_t(unsigned int, to_alloc, - PAGE_SIZE / sizeof(*buf->page_list)); } while (to_alloc > 0); return 0; @@ -1340,17 +1393,6 @@ static void mlx5vf_destroy_qp(struct mlx5_core_dev *mdev, kfree(qp); } -static void free_recv_pages(struct mlx5_vhca_recv_buf *recv_buf) -{ - int i; - - /* Undo alloc_pages_bulk_array() */ - for (i = 0; i < recv_buf->npages; i++) - __free_page(recv_buf->page_list[i]); - - kvfree(recv_buf->page_list); -} - static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, unsigned int npages) { @@ -1386,45 +1428,6 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, kvfree(recv_buf->page_list); return -ENOMEM; } -static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - for (i = npages - 1; i >= 0; i--) { - addr = be64_to_cpu(mtt[i]); - dma_unmap_single(mdev->device, addr, PAGE_SIZE, - DMA_FROM_DEVICE); - } -} - -static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - struct page **page_list, u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - - for (i = 0; i < npages; i++) { - addr = dma_map_page(mdev->device, page_list[i], 0, PAGE_SIZE, - DMA_FROM_DEVICE); - if (dma_mapping_error(mdev->device, addr)) - goto error; - - *mtt++ = cpu_to_be64(addr); - } - - return 0; - -error: - unregister_dma_pages(mdev, i, mkey_in); - return -ENOMEM; -} static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_qp *qp) @@ -1432,9 +1435,10 @@ static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_recv_buf *recv_buf = &qp->recv_buf; mlx5_core_destroy_mkey(mdev, recv_buf->mkey); - unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in, + &recv_buf->state, DMA_FROM_DEVICE); kvfree(recv_buf->mkey_in); - free_recv_pages(&qp->recv_buf); + free_page_list(recv_buf->npages, recv_buf->page_list); } static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, @@ -1456,24 +1460,25 @@ static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, } err = register_dma_pages(mdev, npages, recv_buf->page_list, - recv_buf->mkey_in); + recv_buf->mkey_in, &recv_buf->state, + DMA_FROM_DEVICE); if (err) goto err_register_dma; - err = create_mkey(mdev, npages, NULL, recv_buf->mkey_in, - &recv_buf->mkey); + err = create_mkey(mdev, npages, recv_buf->mkey_in, &recv_buf->mkey); if (err) goto err_create_mkey; return 0; err_create_mkey: - unregister_dma_pages(mdev, npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, npages, recv_buf->mkey_in, &recv_buf->state, + DMA_FROM_DEVICE); err_register_dma: kvfree(recv_buf->mkey_in); recv_buf->mkey_in = NULL; end: - free_recv_pages(recv_buf); + free_page_list(npages, recv_buf->page_list); return err; } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 5b764199db53..f9c7268272e7 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -54,20 +54,16 @@ struct mlx5_vf_migration_header { struct mlx5_vhca_data_buffer { struct page **page_list; - struct sg_append_table table; + struct dma_iova_state state; + enum dma_data_direction dma_dir; loff_t start_pos; u64 length; u32 npages; u32 mkey; u32 *mkey_in; - enum dma_data_direction dma_dir; u8 stop_copy_chunk_num; struct list_head buf_elm; struct mlx5_vf_migration_file *migf; - /* Optimize mlx5vf_get_migration_page() for sequential access */ - struct scatterlist *last_offset_sg; - unsigned int sg_last_entry; - unsigned long last_offset; }; struct mlx5vf_async_data { @@ -134,6 +130,7 @@ struct mlx5_vhca_cq { struct mlx5_vhca_recv_buf { u32 npages; struct page **page_list; + struct dma_iova_state state; u32 next_rq_offset; u32 *mkey_in; u32 mkey; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index a1dbee3be1e0..cac99e6b047d 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -34,35 +34,10 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev) core_device); } -struct page * -mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, - unsigned long offset) +struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, + unsigned long offset) { - unsigned long cur_offset = 0; - struct scatterlist *sg; - unsigned int i; - - /* All accesses are sequential */ - if (offset < buf->last_offset || !buf->last_offset_sg) { - buf->last_offset = 0; - buf->last_offset_sg = buf->table.sgt.sgl; - buf->sg_last_entry = 0; - } - - cur_offset = buf->last_offset; - - for_each_sg(buf->last_offset_sg, sg, - buf->table.sgt.orig_nents - buf->sg_last_entry, i) { - if (offset < sg->length + cur_offset) { - buf->last_offset_sg = sg; - buf->sg_last_entry += i; - buf->last_offset = cur_offset; - return nth_page(sg_page(sg), - (offset - cur_offset) / PAGE_SIZE); - } - cur_offset += sg->length; - } - return NULL; + return buf->page_list[offset / PAGE_SIZE]; } static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) -- 2.46.2