From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 21 Jul 2023 19:48:03 +0100
From: Matthew Wilcox <willy@infradead.org>
To: Arjun Roy, Eric Dumazet
Cc: linux-mm@kvack.org, Suren Baghdasaryan, linux-fsdevel@vger.kernel.org,
	Punit Agrawal, "David S. Miller"
Subject: Re: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy
Message-ID:
References: <20230711202047.3818697-1-willy@infradead.org>
	<20230711202047.3818697-10-willy@infradead.org>
In-Reply-To: <20230711202047.3818697-10-willy@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Eric? Arjun? Any comments?
On Tue, Jul 11, 2023 at 09:20:47PM +0100, Matthew Wilcox (Oracle) wrote:
> From: Arjun Roy
> 
> Per-VMA locking allows us to lock a struct vm_area_struct without
> taking the process-wide mmap lock in read mode.
> 
> Consider a process workload where the mmap lock is taken constantly in
> write mode. In this scenario, all zerocopy receives are periodically
> blocked during that period of time - though in principle, the memory
> ranges being used by TCP are not touched by the operations that need
> the mmap write lock. This results in performance degradation.
> 
> Now consider another workload where the mmap lock is never taken in
> write mode, but there are many TCP connections using receive zerocopy
> that are concurrently receiving. These connections all take the mmap
> lock in read mode, but this does induce a lot of contention and atomic
> ops for this process-wide lock. This results in additional CPU
> overhead caused by contending on the cache line for this lock.
> 
> However, with per-vma locking, both of these problems can be avoided.
> 
> As a test, I ran an RPC-style request/response workload with 4KB
> payloads and receive zerocopy enabled, with 100 simultaneous TCP
> connections. I measured perf cycles within the
> find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
> without per-vma locking enabled.
> 
> When using process-wide mmap semaphore read locking, about 1% of
> measured perf cycles were within this path. With per-VMA locking, this
> value dropped to about 0.45%.
> 
> Signed-off-by: Arjun Roy
> Reviewed-by: Eric Dumazet
> Signed-off-by: David S. Miller
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
>  net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1542de3f66f7..7118ec6cf886 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
>  	}
>  }
>  
> +static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> +		unsigned long address, bool *mmap_locked)
> +{
> +	struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +
> +	if (vma) {
> +		if (vma->vm_ops != &tcp_vm_ops) {
> +			vma_end_read(vma);
> +			return NULL;
> +		}
> +		*mmap_locked = false;
> +		return vma;
> +	}
> +
> +	mmap_read_lock(mm);
> +	vma = vma_lookup(mm, address);
> +	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +	*mmap_locked = true;
> +	return vma;
> +}
> +
>  #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
>  static int tcp_zerocopy_receive(struct sock *sk,
>  				struct tcp_zerocopy_receive *zc,
> @@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  	u32 seq = tp->copied_seq;
>  	u32 total_bytes_to_map;
>  	int inq = tcp_inq(sk);
> +	bool mmap_locked;
>  	int ret;
>  
>  	zc->copybuf_len = 0;
> @@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  		return 0;
>  	}
>  
> -	mmap_read_lock(current->mm);
> -
> -	vma = vma_lookup(current->mm, address);
> -	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -		mmap_read_unlock(current->mm);
> +	vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +	if (!vma)
>  		return -EINVAL;
> -	}
> +
>  	vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
>  	avail_len = min_t(u32, vma_len, inq);
>  	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> @@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  				zc, total_bytes_to_map);
>  	}
>  out:
> -	mmap_read_unlock(current->mm);
> +	if (mmap_locked)
> +		mmap_read_unlock(current->mm);
> +	else
> +		vma_end_read(vma);
>  	/* Try to copy straggler data. */
>  	if (!ret)
>  		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> -- 
> 2.39.2
> 