linux-mm.kvack.org archive mirror
From: Bodo Stroesser <bostroesser@gmail.com>
To: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	linux-mm@kvack.org, target-devel@vger.kernel.org,
	linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
Date: Tue, 22 Mar 2022 13:40:39 +0100
Message-ID: <abbe51c4-873f-e96e-d421-85906689a55a@gmail.com>
In-Reply-To: <20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com>

On 18.03.22 10:55, Xiaoguang Wang wrote:
> The core idea behind the tcmu zero copy feature is straightforward:
> map the sgl pages of a block device io request into the tcmu user space
> backstore, which avoids the extra copy between the sgl pages and the
> tcmu internal data area (a copy that really hurts io throughput). Please see
> https://www.spinics.net/lists/target-devel/msg21121.html for detailed
> info.
> 

Can you please tell us how big the performance improvement is and
which configuration you are using for the measurements?

> Initially I used remap_pfn_range or vm_insert_pages to map sgl pages to
> user space, but both of them have limits:
> 1)  Use vm_insert_pages
> which is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are two
> restrictions:
>    1. anonymous pages can not be mmaped to user space.
>      ==> vm_insert_pages
>      ====> insert_pages
>      ======> insert_page_in_batch_locked
>      ========> validate_page_before_insert
>      validate_page_before_insert() shows that an anonymous page can not
>      be mapped to user space, and we know that when issuing direct io to a
>      block device, the io request's sgl pages mostly come from anonymous pages.
>          if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>              return -EINVAL;
>      I'm not sure why there is such a restriction. For safety reasons?
> 
>    2. warn_on triggered in __folio_mark_dirty
>      When calling zap_page_range for the tcmu user space backstore when io
>      completes, a warn_on is triggered in __folio_mark_dirty:
>         if (folio->mapping) {   /* Race with truncate? */
>             WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
> 
>      I'm not familiar with folios yet, but I think the reason is that when
>      issuing a buffered read to the tcmu block device, its page cache page
>      is mapped to user space, the backstore writes to this page, and the pte
>      gets dirtied. But the page is newly allocated, hence its uptodate flag
>      is not set yet.
>      In zap_pte_range(), there is this code:
>         if (!PageAnon(page)) {
>             if (pte_dirty(ptent)) {
>                 force_flush = 1;
>                 set_page_dirty(page);
>             }
>     So this warn_on is reasonable.
>     Indeed, all I want is to map io request sgl pages to the tcmu user
>     space backstore so that the backstore can read or write data in the
>     mapped area. I don't want to care about the page or its mapping status,
>     so I chose to use remap_pfn_range.
> 
> 2) Use remap_pfn_range()
>    remap_pfn_range works well, but it has noticeable overhead. A
>    512kb io request has 128 pages, and usually the pfns of these 128 pages
>    are not consecutive, so in the worst case a single 512kb io request
>    needs 128 calls to remap_pfn_range, which is horrible. And in
>    remap_pfn_range, if the x86 page attribute table feature is enabled,
>    lookup_memtype called by track_pfn_remap() also adds noticeable overhead.
> 
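
To put a number on the worst case described above, here is a small
user-space sketch (not kernel code). It assumes 4 KiB pages and counts
how many runs of consecutive pfns an io request contains, since each run
could be covered by a single remap_pfn_range() call. Both helper names
are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: number of pages needed to hold
 * a request of len bytes, rounding up to whole pages.
 */
size_t pages_in_request(size_t len, size_t page_size)
{
    return (len + page_size - 1) / page_size;
}

/*
 * Hypothetical helper, not a kernel API: count the maximal runs of
 * consecutive page frame numbers in pfns[0..n-1]. One run corresponds
 * to one contiguous physical range, so the result is the number of
 * remap_pfn_range() calls a per-range mapping loop would need.
 */
size_t count_pfn_runs(const unsigned long *pfns, size_t n)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++) {
        /* A new run starts at index 0 or whenever the pfn is not prev + 1. */
        if (i == 0 || pfns[i] != pfns[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

With 4 KiB pages, pages_in_request(512 * 1024, 4096) gives the 128 pages
mentioned above, and count_pfn_runs() returns 128 for them only if no two
neighbouring sgl pages are physically adjacent.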
> Finally, in order to solve these problems, Xu Yu helped to implement a new
> helper which accepts an array of pages as a parameter. Anonymous pages can
> be mapped to user space, and pages are treated as special ptes (pte_special
> returns true), so vm_normal_page returns NULL and the above folio warn_on
> won't trigger.
> 
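
If I read the description correctly, the effect of the special pte can be
modelled roughly as below. This is only a toy user-space model of the
quoted zap_pte_range() fragment, not the kernel implementation; all toy_*
names and both structs are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct page and a pte; not the kernel types. */
struct toy_page {
    bool anon;   /* models PageAnon() */
    bool dirty;  /* set by the modelled set_page_dirty() */
};

struct toy_pte {
    struct toy_page *page;
    bool special; /* models pte_special() */
    bool dirty;   /* models pte_dirty() */
};

/* Models vm_normal_page(): a special pte yields no page to operate on. */
struct toy_page *toy_normal_page(const struct toy_pte *pte)
{
    return pte->special ? NULL : pte->page;
}

/*
 * Models the quoted zap_pte_range() fragment: a dirty pte that maps a
 * non-anonymous normal page dirties that page. Returns true if the page
 * was marked dirty, i.e. if the __folio_mark_dirty path would be reached.
 */
bool toy_zap_pte(struct toy_pte *pte)
{
    struct toy_page *page = toy_normal_page(pte);
    bool marked = false;

    if (page && !page->anon && pte->dirty) {
        page->dirty = true;
        marked = true;
    }

    /* Clear the pte, as zapping would. */
    pte->page = NULL;
    pte->dirty = false;
    return marked;
}
```

In this model a dirty normal pte on a page cache page reaches the dirtying
path, while the same pte marked special does not, which matches the claim
that the warn_on can no longer trigger.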
> Thanks.
> 
> Xiaoguang Wang (2):
>    mm: export zap_page_range()
>    scsi: target: tcmu: Support zero copy
> 
> Xu Yu (1):
>    mm/memory.c: introduce vm_insert_page(s)_mkspecial
> 
>   drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
>   include/linux/mm.h                |   2 +
>   mm/memory.c                       | 183 +++++++++++++++++++++++++++
>   3 files changed, 414 insertions(+), 28 deletions(-)
> 

