From: Suren Baghdasaryan <surenb@google.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: linux-mm@kvack.org, Arjun Roy <arjunroy@google.com>,
Eric Dumazet <edumazet@google.com>,
linux-fsdevel@vger.kernel.org,
Punit Agrawal <punit.agrawal@bytedance.com>,
"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy
Date: Thu, 13 Jul 2023 20:40:46 -0700
Message-ID: <CAJuCfpHDripryQ2T_gzwhZ0ogEgs6vdyrgOoj79ky8Uoyvg0vg@mail.gmail.com>
In-Reply-To: <20230711202047.3818697-10-willy@infradead.org>
On Tue, Jul 11, 2023 at 1:21 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> From: Arjun Roy <arjunroy@google.com>
>
> Per-VMA locking allows us to lock a struct vm_area_struct without
> taking the process-wide mmap lock in read mode.
>
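(For readers less familiar with the API: the underlying pattern, a
sketch built from lock_vma_under_rcu() and vma_end_read() as they exist
in current mm code, is roughly:

	struct vm_area_struct *vma;

	vma = lock_vma_under_rcu(mm, address);	/* no mmap_lock taken */
	if (vma) {
		/* ... use the VMA under its per-VMA read lock ... */
		vma_end_read(vma);
	}

with a fallback to mmap_read_lock() when the per-VMA lock cannot be
taken, as the patch below does.)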
> Consider a process workload where the mmap lock is taken constantly in
> write mode. In this scenario, all zerocopy receives are blocked for
> the duration of those writes - even though, in principle, the memory
> ranges being used by TCP are not touched by the operations that need
> the mmap write lock. This results in performance degradation.
>
> Now consider another workload where the mmap lock is never taken in
> write mode, but there are many TCP connections using receive zerocopy
> that are concurrently receiving. These connections all take the mmap
> lock in read mode, which still induces heavy contention and atomic
> operations on this process-wide lock, adding CPU overhead as the
> lock's cache line bounces between cores.
>
> However, with per-vma locking, both of these problems can be avoided.
>
> As a test, I ran an RPC-style request/response workload with 4KB
> payloads and receive zerocopy enabled, with 100 simultaneous TCP
> connections. I measured perf cycles within the
> find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
> without per-vma locking enabled.
>
> When using process-wide mmap semaphore read locking, about 1% of
> measured perf cycles were within this path. With per-VMA locking, this
> value dropped to about 0.45%.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Seems to match the original version with less churn.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
> 1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1542de3f66f7..7118ec6cf886 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
> }
> }
>
nit: Maybe add a comment noting that the mmap_locked value is
undefined/meaningless if the function returns NULL?
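Something along these lines above find_tcp_vma() would do (the wording
is just a suggestion):

	/*
	 * Find and lock a VMA backed by TCP.  On success, *mmap_locked
	 * tells the caller which unlock primitive to pair with:
	 * mmap_read_unlock() if true, vma_end_read() otherwise.  On
	 * failure (NULL return), *mmap_locked is not written and must
	 * not be read.
	 */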
> +static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> + unsigned long address, bool *mmap_locked)
> +{
> + struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +
> + if (vma) {
> + if (vma->vm_ops != &tcp_vm_ops) {
> + vma_end_read(vma);
> + return NULL;
> + }
> + *mmap_locked = false;
> + return vma;
> + }
> +
> + mmap_read_lock(mm);
> + vma = vma_lookup(mm, address);
> + if (!vma || vma->vm_ops != &tcp_vm_ops) {
> + mmap_read_unlock(mm);
> + return NULL;
> + }
> + *mmap_locked = true;
> + return vma;
> +}
> +
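The caller-side contract this creates (a sketch mirroring the
tcp_zerocopy_receive() changes further down) is:

	bool mmap_locked;
	struct vm_area_struct *vma;

	vma = find_tcp_vma(mm, address, &mmap_locked);
	if (!vma)
		return -EINVAL;
	/* ... access the VMA's pages ... */
	if (mmap_locked)
		mmap_read_unlock(mm);
	else
		vma_end_read(vma);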
> #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
> static int tcp_zerocopy_receive(struct sock *sk,
> struct tcp_zerocopy_receive *zc,
> @@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
> u32 seq = tp->copied_seq;
> u32 total_bytes_to_map;
> int inq = tcp_inq(sk);
> + bool mmap_locked;
> int ret;
>
> zc->copybuf_len = 0;
> @@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
> return 0;
> }
>
> - mmap_read_lock(current->mm);
> -
> - vma = vma_lookup(current->mm, address);
> - if (!vma || vma->vm_ops != &tcp_vm_ops) {
> - mmap_read_unlock(current->mm);
> + vma = find_tcp_vma(current->mm, address, &mmap_locked);
> + if (!vma)
> return -EINVAL;
> - }
> +
> vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
> avail_len = min_t(u32, vma_len, inq);
> total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> @@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
> zc, total_bytes_to_map);
> }
> out:
> - mmap_read_unlock(current->mm);
> + if (mmap_locked)
> + mmap_read_unlock(current->mm);
> + else
> + vma_end_read(vma);
> /* Try to copy straggler data. */
> if (!ret)
> copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> --
> 2.39.2
>