From: Bodo Stroesser <bostroesser@gmail.com>
To: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
linux-mm@kvack.org, target-devel@vger.kernel.org,
linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
Date: Tue, 22 Mar 2022 13:40:39 +0100
Message-ID: <abbe51c4-873f-e96e-d421-85906689a55a@gmail.com>
In-Reply-To: <20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com>

On 18.03.22 10:55, Xiaoguang Wang wrote:
> The core idea behind the tcmu zero copy feature is straightforward:
> map the block device io request's sgl pages into the tcmu user space
> backstore, so that we avoid the extra copy between the sgl pages and
> the tcmu internal data area (which really impacts io throughput).
> Please see https://www.spinics.net/lists/target-devel/msg21121.html
> for detailed info.
>

Can you please tell us how big the performance improvement is and
which configuration you are using for the measurements?

> Initially I used remap_pfn_range or vm_insert_pages to map sgl pages to
> user space, but both of them have limitations:
> 1) Use vm_insert_pages
> This is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are two
> restrictions:
> 1. Anonymous pages cannot be mmapped to user space.
> ==> vm_insert_pages
> ====> insert_pages
> ======> insert_page_in_batch_locked
> ========> validate_page_before_insert
> validate_page_before_insert() shows that anonymous pages cannot be
> mapped to user space, and we know that when issuing direct io to a
> block device, the io request's sgl pages mostly come from anonymous
> pages.
>     if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>             return -EINVAL;
> I'm not sure why there is such a restriction. For safety reasons?
>
> 2. WARN_ON triggered in __folio_mark_dirty
> When zap_page_range is called for the tcmu user space backstore on io
> completion, a WARN_ON is triggered in __folio_mark_dirty:
>     if (folio->mapping) { /* Race with truncate? */
>             WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>
> I'm not familiar with folios yet, but I think the reason is this: when
> a buffered read is issued to the tcmu block device, it is a page cache
> page that gets mapped to user space. The backstore writes to this page
> and the pte is dirtied, but the page is newly allocated, hence its
> PG_uptodate flag is not set.
> zap_pte_range() contains this code:
>     if (!PageAnon(page)) {
>             if (pte_dirty(ptent)) {
>                     force_flush = 1;
>                     set_page_dirty(page);
>             }
> So this WARN_ON is reasonable.
> Indeed, all I want is to map the io request's sgl pages to the tcmu
> user space backstore, so that the backstore can read or write data in
> the mapped area. I don't want to care about the page or its mapping
> status, so I chose to use remap_pfn_range.
>
> 2) Use remap_pfn_range()
> remap_pfn_range works well, but it has noticeable overhead. A 512kb io
> request has 128 pages, and usually the pfns of these 128 pages are not
> consecutive, so in the worst case I'd need to issue 128 calls to
> remap_pfn_range for a single 512kb io request, which is horrible. Also,
> if the x86 page attribute table feature is enabled, lookup_memtype
> called by track_pfn_remap() in remap_pfn_range introduces noticeable
> overhead as well.
>
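
To make the arithmetic in the remap_pfn_range() paragraph concrete, here
is a minimal user-space sketch. It is not code from the patch set: the
names pages_for_io(), count_pfn_runs() and SKETCH_PAGE_SIZE are
hypothetical, and a 4 KiB page size is assumed. It only counts how many
runs of consecutive pfns an io request consists of, which is the number
of remap_pfn_range() calls a per-run mapping loop would have to issue.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of pages needed to cover an io request of io_bytes bytes. */
static size_t pages_for_io(size_t io_bytes)
{
	return (io_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Count runs of physically consecutive page frame numbers.
 * Each run could be covered by a single remap_pfn_range() call, so the
 * result is the number of calls a per-run mapping loop would issue.
 */
static size_t count_pfn_runs(const unsigned long *pfns, size_t n)
{
	size_t runs = 0;

	assert(pfns != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* A new run starts at index 0 or wherever the pfn jumps. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			runs++;
	}
	return runs;
}
```

With fully scattered pages every pfn starts its own run, so a 512 KiB
request (pages_for_io(512 * 1024) pages) degenerates to one call per
page, while fully contiguous pages need only one call.
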
> Finally, in order to solve these problems, Xu Yu helped to implement a
> new helper, which accepts an array of pages as parameter. With it,
> anonymous pages can be mapped to user space, and the pages are treated
> as special ptes (pte_special returns true), so vm_normal_page returns
> NULL and the folio WARN_ON above won't trigger.
>
> Thanks.
>
> Xiaoguang Wang (2):
> mm: export zap_page_range()
> scsi: target: tcmu: Support zero copy
>
> Xu Yu (1):
> mm/memory.c: introduce vm_insert_page(s)_mkspecial
>
> drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
> include/linux/mm.h | 2 +
> mm/memory.c | 183 +++++++++++++++++++++++++++
> 3 files changed, 414 insertions(+), 28 deletions(-)
>