From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: linux-mm@kvack.org
Cc: Arjun Roy <arjunroy@google.com>,
Eric Dumazet <edumazet@google.com>,
Suren Baghdasaryan <surenb@google.com>,
linux-fsdevel@vger.kernel.org,
Punit Agrawal <punit.agrawal@bytedance.com>,
"David S . Miller" <davem@davemloft.net>,
Matthew Wilcox <willy@infradead.org>
Subject: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy
Date: Tue, 11 Jul 2023 21:20:47 +0100
Message-ID: <20230711202047.3818697-10-willy@infradead.org>
In-Reply-To: <20230711202047.3818697-1-willy@infradead.org>
From: Arjun Roy <arjunroy@google.com>
Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.
Consider a process workload where the mmap lock is frequently taken in
write mode. In this scenario, all zerocopy receives block for as long
as the write lock is held, even though the memory ranges used by TCP
are, in principle, untouched by the operations that need the mmap
write lock. This degrades performance.
Now consider another workload where the mmap lock is never taken in
write mode, but many TCP connections using receive zerocopy are
receiving concurrently. These connections all take the mmap lock in
read mode; the lock never blocks, but every acquisition performs
atomic operations on the same process-wide lock, so its cache line
bounces between CPUs and adds CPU overhead.
However, with per-vma locking, both of these problems can be avoided.
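In rough terms the pattern looks like the following condensed sketch,
which mirrors the find_tcp_vma() helper added below (the NULL check on
the fallback vma_lookup() and the vm_ops check are omitted here for
brevity); the caller must record which lock it ended up taking so that
it can drop the matching one:

	struct vm_area_struct *vma;
	bool mmap_locked;

	/* Fast path: read-lock only this VMA, no mmap_lock traffic. */
	vma = lock_vma_under_rcu(mm, address);
	if (vma) {
		mmap_locked = false;
	} else {
		/* Slow path: fall back to the process-wide lock. */
		mmap_read_lock(mm);
		vma = vma_lookup(mm, address);
		mmap_locked = true;
	}

	/* ... use the VMA ... */

	if (mmap_locked)
		mmap_read_unlock(mm);
	else
		vma_end_read(vma);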
As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
without per-vma locking enabled.
When using process-wide mmap semaphore read locking, about 1% of
measured perf cycles were within this path. With per-VMA locking, this
value dropped to about 0.45%.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
1 file changed, 32 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1542de3f66f7..7118ec6cf886 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
}
}
+static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
+ unsigned long address, bool *mmap_locked)
+{
+ struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
+
+ if (vma) {
+ if (vma->vm_ops != &tcp_vm_ops) {
+ vma_end_read(vma);
+ return NULL;
+ }
+ *mmap_locked = false;
+ return vma;
+ }
+
+ mmap_read_lock(mm);
+ vma = vma_lookup(mm, address);
+ if (!vma || vma->vm_ops != &tcp_vm_ops) {
+ mmap_read_unlock(mm);
+ return NULL;
+ }
+ *mmap_locked = true;
+ return vma;
+}
+
#define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
static int tcp_zerocopy_receive(struct sock *sk,
struct tcp_zerocopy_receive *zc,
@@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
u32 seq = tp->copied_seq;
u32 total_bytes_to_map;
int inq = tcp_inq(sk);
+ bool mmap_locked;
int ret;
zc->copybuf_len = 0;
@@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
return 0;
}
- mmap_read_lock(current->mm);
-
- vma = vma_lookup(current->mm, address);
- if (!vma || vma->vm_ops != &tcp_vm_ops) {
- mmap_read_unlock(current->mm);
+ vma = find_tcp_vma(current->mm, address, &mmap_locked);
+ if (!vma)
return -EINVAL;
- }
+
vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
avail_len = min_t(u32, vma_len, inq);
total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
@@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
zc, total_bytes_to_map);
}
out:
- mmap_read_unlock(current->mm);
+ if (mmap_locked)
+ mmap_read_unlock(current->mm);
+ else
+ vma_end_read(vma);
/* Try to copy straggler data. */
if (!ret)
copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
--
2.39.2