Date: Tue, 22 Mar 2022 15:05:38 +0100
Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
To: Xiaoguang Wang, linux-mm@kvack.org, target-devel@vger.kernel.org,
    linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
References: <20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com>
    <36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com>
From: Bodo Stroesser
In-Reply-To: <36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com>

On 22.03.22 14:17, Xiaoguang Wang wrote:
> hi,
>
>> On 18.03.22 10:55, Xiaoguang Wang wrote:
>>> The core idea to implement the tcmu zero copy feature is really
>>> straightforward: it just maps the block device io request's sgl pages
>>> to the tcmu user space backstore, so we can avoid the extra copy
>>> overhead between sgl pages and the tcmu internal data area (which
>>> really impacts io throughput), please see
>>> https://www.spinics.net/lists/target-devel/msg21121.html for detailed
>>> info.
>>>
>>
>> Can you please tell us, how big the performance improvement is and
>> which configuration you are using for measurements?
> Sorry, I should have attached test results here. Initially I tried to use
> the tcmu user:fbo backstore to evaluate performance improvements, but
> it only shows about 10%~15% io throughput improvement, which isn't very
> impressive. Fio config is numjobs=1, iodepth=8, bs=256k. The reason is
> that the user:fbo backstore does buffered reads, which consume most of
> the cpu.
>
> Then I tested this zero copy feature with our real workload, whose
> backstore is a multi-threaded network program accessing a distributed
> file system.
> For 4 jobs, iodepth 8, 256 KB io size, the write throughput improves from
> 3.6GB/s to 10GB/s.

Thank you for the info. Sounds promising.

What fabric are you using? iSCSI?
What HW is your target running on?

>
> Regards,
> Xiaoguang Wang
>
>>
>>> Initially I used remap_pfn_range or vm_insert_pages to map sgl pages to
>>> user space, but both of them have limits:
>>> 1) Use vm_insert_pages
>>> which is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are two
>>> restrictions:
>>>    1. anonymous pages can not be mmaped to user space.
>>>      ==> vm_insert_pages
>>>      ====> insert_pages
>>>      ======> insert_page_in_batch_locked
>>>      ========> validate_page_before_insert
>>>      validate_page_before_insert() shows that an anonymous page can not
>>>      be mapped to user space, and we know that when issuing direct io
>>>      to a block device, the io request's sgl pages mostly come from
>>>      anonymous pages.
>>>          if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>>>              return -EINVAL;
>>>      I'm not sure why there is such a restriction. For safety reasons?
>>>
>>>    2. warn_on triggered in __folio_mark_dirty
>>>      When calling zap_page_range in the tcmu user space backstore when
>>>      io completes, there is a warn_on triggered in __folio_mark_dirty:
>>>         if (folio->mapping) {   /* Race with truncate? */
>>>             WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>>>
>>>      I'm not familiar with folios yet, but I think the reason is that
>>>      when issuing a buffered read to the tcmu block device, its page
>>>      cache page is mapped to user space, the backstore writes this page
>>>      and the pte will be dirtied. But initially the page is newly
>>>      allocated, hence the uptodate flag is not set.
>>>      In zap_pte_range(), there is this code:
>>>         if (!PageAnon(page)) {
>>>             if (pte_dirty(ptent)) {
>>>                 force_flush = 1;
>>>                 set_page_dirty(page);
>>>             }
>>>     So this warn_on is reasonable.
>>>     Indeed what I want is just to map io request sgl pages to the tcmu
>>>     user space backstore, so that the backstore can read or write data
>>>     in the mapped area. I don't want to care about the page or its
>>>     mapping status, so I chose to use remap_pfn_range.
>>>
>>> 2) Use remap_pfn_range()
>>>    remap_pfn_range works well, but it has somewhat obvious overhead.
>>>    A 512 KB io request has 128 pages, and usually these 128 pages'
>>>    pfns are not consecutive, so in the worst case, for a 512 KB io
>>>    request, I'd need to issue 128 calls to remap_pfn_range, which is
>>>    horrible. And in remap_pfn_range, if the x86 page attribute table
>>>    feature is enabled, lookup_memtype called by track_pfn_remap() also
>>>    introduces obvious overhead.
>>>
>>> Finally, in order to solve these problems, Xu Yu helped to implement a
>>> new helper, which accepts an array of pages as parameter. Anonymous
>>> pages can be mapped to user space, and pages are treated as special
>>> ptes (pte_special returns true), so vm_normal_page returns NULL and the
>>> above folio warn_on won't trigger.
>>>
>>> Thanks.
>>>
>>> Xiaoguang Wang (2):
>>>    mm: export zap_page_range()
>>>    scsi: target: tcmu: Support zero copy
>>>
>>> Xu Yu (1):
>>>    mm/memory.c: introduce vm_insert_page(s)_mkspecial
>>>
>>>   drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
>>>   include/linux/mm.h                |   2 +
>>>   mm/memory.c                       | 183 +++++++++++++++++++++++++++
>>>   3 files changed, 414 insertions(+), 28 deletions(-)
>>>
>
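
To make the remap_pfn_range() overhead argument from the quoted cover
letter concrete, here is a small user-space sketch in C. It is not kernel
code and not part of the patch set. It only models the arithmetic: one
remap_pfn_range() call can cover one physically contiguous pfn range, so
the number of calls equals the number of contiguous runs in the request's
pfn list. The names pages_per_request() and count_pfn_runs() are
hypothetical, and the 4 KiB page size is an assumption (it matches the
"512 KB request has 128 pages" figure in the thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of pages in an io request of io_size bytes,
 * rounding up. page_size is an assumption supplied by the caller. */
static size_t pages_per_request(size_t io_size, size_t page_size)
{
    assert(page_size > 0);
    return (io_size + page_size - 1) / page_size;
}

/* Hypothetical helper: count the physically contiguous runs in a list of
 * page frame numbers. In this model each run costs one remap_pfn_range()
 * call, because a single call maps one contiguous pfn range. */
static size_t count_pfn_runs(const unsigned long *pfns, size_t n)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++) {
        /* A new run starts at the first pfn, or whenever the current
         * pfn does not directly follow the previous one. */
        if (i == 0 || pfns[i] != pfns[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

With a 512 KiB request and 4 KiB pages, pages_per_request() gives 128.
If no two pfns are adjacent, count_pfn_runs() also gives 128, which is
the worst case described above. A fully contiguous list gives 1.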