From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 22 Jun 2023 19:11:34 -0700
From: Jakub Kicinski
To: David Howells
Cc: Eric Dumazet, netdev@vger.kernel.org, Alexander Duyck,
 "David S. Miller", Paolo Abeni, Willem de Bruijn, David Ahern,
 Matthew Wilcox, Jens Axboe, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Menglong Dong
Subject: Re: [PATCH net-next v3 01/18] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)
Message-ID: <20230622191134.54d5cb0b@kernel.org>
In-Reply-To: <1958077.1687474471@warthog.procyon.org.uk>
References: <20230622132835.3c4e38ea@kernel.org>
 <20230622111234.23aadd87@kernel.org>
 <20230620145338.1300897-1-dhowells@redhat.com>
 <20230620145338.1300897-2-dhowells@redhat.com>
 <1952674.1687462843@warthog.procyon.org.uk>
 <1958077.1687474471@warthog.procyon.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 22 Jun 2023 23:54:31 +0100 David Howells wrote:
> > Maybe it's just me but I'd prefer to keep the clear rule that splice
> > operates on pages not slab objects.
>
> sendpage isn't only being used for splice(). Or were you referring to
> splicing pages into socket buffers more generally?

Yes, sorry, any sort of "zero-copy attachment of data onto a socket
queue".

> > SIW is the software / fake implementation of RDMA, right? You couldn't have
> > picked a less important user :(
>
> ISCSI and sunrpc could both make use of this, as could ceph and others. I
> have patches for sunrpc to make it condense into a single bio_vec[] and
> sendmsg() in the server code (ie. nfsd) but for the moment, Chuck wanted me to
> just do the xdr payload.

But to be clear (and I'm not implying that it's not a strong enough
reason) - the only benefit from letting someone pass headers in a slab
object is that the code already uses kmalloc(), right? IOW it could be
changed to use frags without much of a LoC bloat?

> > Maybe we can get Eric to comment. The ability to identify "frag type"
> > seems cool indeed, but I haven't thought about using it to attach
> > slab objects.
>
> Unfortunately, you can't attach slab objects. Their lifetime isn't controlled
> by put_page() or folio_put(). kmalloc()/kfree() doesn't refcount them -
> they're recycled immediately. Hence why I was copying them.
> (Well, you could attach, but then you need a callback mechanism).

Right, right, I thought you were saying that _in the future_ we may try
to attach the slab objects as frags (and presumably copy when someone
tries to ref them). Maybe I over-interpreted.

> What I'm trying to do is make it so that the process of calling sock_sendmsg()
> with MSG_SPLICE_PAGES looks exactly the same as without: You fill in a
> bio_vec[] pointing to your protocol header, the payload and the trailer,
> pointing as appropriate to bits of slab, static, stack data or ref'able pages,
> and call sendmsg and then the data will get copied or spliced as appropriate
> to the page type, whether the MSG_SPLICE_PAGES flag is supplied and whether
> the flag is supported.
>
> There are a couple of things I'd like to avoid: (1) having to call
> sock_sendmsg() more than once per message and (2) having sendmsg allocate more
> space and make a copy of data that you had to copy into a frag before calling
> sendmsg.

If we're not planning to attach the slab objects as frags, then surely
doing kmalloc() + kfree() in the caller, and then allocating a frag and
copying the data over in the skb / socket code is also inefficient.
Fixing the caller gives all the benefits you want, and then some.

Granted, some form of alloc_skb_frag() needs to be added so that callers
don't curse us. I'd start with something based on sk_page_frag().

Or we could pull the copying out into an intermediate helper which first
replaces all slab objects in the iovec with page frags and then calls
sock_sendmsg()? Maybe that's stupid...

Let's hear what others think. If we can't reach instant agreement --
can you strategically separate out the minimal set of changes required
to just kill MSG_SENDPAGE_NOTLAST? IMHO it's worth getting that into
6.5.
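
To make the "intermediate helper" idea a bit more concrete, here is a
rough userspace model of the shape I have in mind. It is only a sketch
and nothing in it is a kernel API: struct seg, struct frag_pool,
frag_alloc() and replace_slab_segs() are names made up for illustration,
and the fixed-size pool merely stands in for whatever page-frag
allocator would really back this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "frag type" tag: is the memory page-like or slab-like? */
enum seg_kind { SEG_PAGE, SEG_SLAB };

/* Hypothetical stand-in for one entry of the vector handed to sendmsg. */
struct seg {
	enum seg_kind kind;
	const void *data;
	size_t len;
};

/* Hypothetical stand-in for a per-socket page-frag allocator. */
struct frag_pool {
	unsigned char buf[4096];
	size_t used;
};

/* Bump allocation from the pool; no alignment handling in this model. */
static void *frag_alloc(struct frag_pool *pool, size_t len)
{
	void *p;

	if (len > sizeof(pool->buf) - pool->used)
		return NULL;
	p = pool->buf + pool->used;
	pool->used += len;
	return p;
}

/*
 * Copy every slab-backed segment into frag-backed memory and retarget
 * the entry, so that the later send step only ever sees page-like
 * segments. Returns false if the pool runs out of space; entries that
 * were already converted stay converted.
 */
static bool replace_slab_segs(struct seg *segs, size_t n,
			      struct frag_pool *pool)
{
	assert(segs && pool);

	for (size_t i = 0; i < n; i++) {
		void *copy;

		if (segs[i].kind != SEG_SLAB)
			continue;
		copy = frag_alloc(pool, segs[i].len);
		if (!copy)
			return false;
		memcpy(copy, segs[i].data, segs[i].len);
		segs[i].data = copy;
		segs[i].kind = SEG_PAGE;
	}
	return true;
}
```

A caller would build its header/payload/trailer vector as it does today,
run it through the helper once, and then make a single send call. The
copy happens once, in one place, rather than in each protocol's sendmsg
path.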