Date: Wed, 21 May 2025 13:17:11 -0600
From: Alex Williamson
To: lizhe.67@bytedance.com
Cc: david@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, peterx@redhat.com
Subject: Re: [PATCH v4] vfio/type1: optimize vfio_pin_pages_remote() for large folios
Message-ID: <20250521131711.4e0d3f2f.alex.williamson@redhat.com>
In-Reply-To: <20250521042507.77205-1-lizhe.67@bytedance.com>
References: <20250521042507.77205-1-lizhe.67@bytedance.com>

On Wed, 21 May 2025 12:25:07 +0800
lizhe.67@bytedance.com wrote:

> From: Li Zhe
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> This patch optimizes this process by batching the statistics counting
> operations.
>
> The performance test results for completing the 8G VFIO IOMMU DMA mapping,
> obtained through trace-cmd, are as follows. In this case, the 8G virtual
> address space has been mapped to physical memory using hugetlbfs with
> pagesize=2M.
>
> Before this patch:
> funcgraph_entry:      # 33813.703 us |  vfio_pin_map_dma();
>
> After this patch:
> funcgraph_entry:      # 16071.378 us |  vfio_pin_map_dma();
>
> Signed-off-by: Li Zhe
> Co-developed-by: Alex Williamson
> Signed-off-by: Alex Williamson
> ---

Given the discussion on v3, this is currently a Nak.  Follow-up in that
thread if there are further ideas how to salvage this.  Thanks,

Alex

> Changelogs:
>
> v3->v4:
> - Use min_t() to obtain the step size, rather than min().
> - Fix some issues in commit message and title.
>
> v2->v3:
> - Code simplification.
> - Fix some issues in comments.
>
> v1->v2:
> - Fix some issues in comments and formatting.
> - Consolidate vfio_find_vpfn_range() and vfio_find_vpfn().
> - Move the processing logic for hugetlbfs folio into the while(true) loop
>   and use a variable with a default value of 1 to indicate the number of
>   consecutive pages.
>
> v3 patch: https://lore.kernel.org/all/20250520070020.6181-1-lizhe.67@bytedance.com/
> v2 patch: https://lore.kernel.org/all/20250519070419.25827-1-lizhe.67@bytedance.com/
> v1 patch: https://lore.kernel.org/all/20250513035730.96387-1-lizhe.67@bytedance.com/
>
>  drivers/vfio/vfio_iommu_type1.c | 48 +++++++++++++++++++++++++--------
>  1 file changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 0ac56072af9f..bd46ed9361fe 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -319,15 +319,22 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
>  /*
>   * Helper Functions for host iova-pfn list
>   */
> -static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +
> +/*
> + * Find the first vfio_pfn that overlapping the range
> + * [iova, iova + PAGE_SIZE * npage) in rb tree.
> + */
> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> +		dma_addr_t iova, unsigned long npage)
>  {
>  	struct vfio_pfn *vpfn;
>  	struct rb_node *node = dma->pfn_list.rb_node;
> +	dma_addr_t end_iova = iova + PAGE_SIZE * npage;
>
>  	while (node) {
>  		vpfn = rb_entry(node, struct vfio_pfn, node);
>
> -		if (iova < vpfn->iova)
> +		if (end_iova <= vpfn->iova)
>  			node = node->rb_left;
>  		else if (iova > vpfn->iova)
>  			node = node->rb_right;
> @@ -337,6 +344,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
>  	return NULL;
>  }
>
> +static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +{
> +	return vfio_find_vpfn_range(dma, iova, 1);
> +}
> +
>  static void vfio_link_pfn(struct vfio_dma *dma,
>  			  struct vfio_pfn *new)
>  {
> @@ -681,32 +693,46 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>  		 * and rsvd here, and therefore continues to use the batch.
>  		 */
>  		while (true) {
> +			struct folio *folio = page_folio(batch->pages[batch->offset]);
> +			long nr_pages;
> +
>  			if (pfn != *pfn_base + pinned ||
>  			    rsvd != is_invalid_reserved_pfn(pfn))
>  				goto out;
>
> +			/*
> +			 * Note: The current nr_pages does not achieve the optimal
> +			 * performance in scenarios where folio_nr_pages() exceeds
> +			 * batch->capacity. It is anticipated that future enhancements
> +			 * will address this limitation.
> +			 */
> +			nr_pages = min_t(long, batch->size, folio_nr_pages(folio) -
> +						folio_page_idx(folio, batch->pages[batch->offset]));
> +			if (nr_pages > 1 && vfio_find_vpfn_range(dma, iova, nr_pages))
> +				nr_pages = 1;
> +
>  			/*
>  			 * Reserved pages aren't counted against the user,
>  			 * externally pinned pages are already counted against
>  			 * the user.
>  			 */
> -			if (!rsvd && !vfio_find_vpfn(dma, iova)) {
> +			if (!rsvd && (nr_pages > 1 || !vfio_find_vpfn(dma, iova))) {
>  				if (!dma->lock_cap &&
> -				    mm->locked_vm + lock_acct + 1 > limit) {
> +				    mm->locked_vm + lock_acct + nr_pages > limit) {
>  					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
>  						__func__, limit << PAGE_SHIFT);
>  					ret = -ENOMEM;
>  					goto unpin_out;
>  				}
> -				lock_acct++;
> +				lock_acct += nr_pages;
>  			}
>
> -			pinned++;
> -			npage--;
> -			vaddr += PAGE_SIZE;
> -			iova += PAGE_SIZE;
> -			batch->offset++;
> -			batch->size--;
> +			pinned += nr_pages;
> +			npage -= nr_pages;
> +			vaddr += PAGE_SIZE * nr_pages;
> +			iova += PAGE_SIZE * nr_pages;
> +			batch->offset += nr_pages;
> +			batch->size -= nr_pages;
>
>  			if (!batch->size)
>  				break;
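
For readers following the quoted hunks, the range lookup and the step-size
choice can be modeled outside the kernel. What follows is a minimal
user-space sketch, not the patch itself and not kernel code. It assumes a
sorted array in place of the rb tree and a fixed 4 KiB page size. Every
name in it (SKETCH_PAGE_SIZE, struct sketch_vpfn, sketch_find_vpfn_range(),
sketch_step()) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size, illustration only */

/* Hypothetical stand-in for struct vfio_pfn: one externally pinned page. */
struct sketch_vpfn {
	uint64_t iova;
};

/*
 * Return the index of an entry whose iova lies inside
 * [iova, iova + SKETCH_PAGE_SIZE * npage), or -1 if there is none.
 * 'sorted' must be ordered by ascending iova. The binary search uses the
 * same two comparisons as the left/right descent in the quoted hunk.
 */
static long sketch_find_vpfn_range(const struct sketch_vpfn *sorted, size_t n,
				   uint64_t iova, unsigned long npage)
{
	uint64_t end_iova = iova + SKETCH_PAGE_SIZE * npage;
	size_t lo = 0, hi = n;

	assert(npage > 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (end_iova <= sorted[mid].iova)
			hi = mid;		/* range ends before this entry */
		else if (iova > sorted[mid].iova)
			lo = mid + 1;		/* range starts after this entry */
		else
			return (long)mid;	/* iova <= entry < end_iova */
	}
	return -1;
}

/*
 * Step size for one loop iteration: the pages left in the current folio,
 * capped by the pages left in the batch, falling back to a single page
 * when any page in that span is already tracked as externally pinned.
 */
static long sketch_step(const struct sketch_vpfn *sorted, size_t n,
			uint64_t iova, long batch_size,
			long folio_pages, long folio_idx)
{
	long nr_pages = folio_pages - folio_idx;

	if (nr_pages > batch_size)
		nr_pages = batch_size;
	if (nr_pages > 1 &&
	    sketch_find_vpfn_range(sorted, n, iova, nr_pages) >= 0)
		nr_pages = 1;
	return nr_pages;
}
```

With no tracked entry inside the span, sketch_step() returns the whole
remainder of the folio (bounded by the batch), so the accounting advances
once per folio rather than once per page. A single tracked page anywhere in
the span drops the step back to one page.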