From: Suren Baghdasaryan <surenb@google.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: linux-mm@kvack.org, Arjun Roy <arjunroy@google.com>,
	Eric Dumazet <edumazet@google.com>,
	linux-fsdevel@vger.kernel.org,
	Punit Agrawal <punit.agrawal@bytedance.com>,
	"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy
Date: Thu, 13 Jul 2023 20:40:46 -0700	[thread overview]
Message-ID: <CAJuCfpHDripryQ2T_gzwhZ0ogEgs6vdyrgOoj79ky8Uoyvg0vg@mail.gmail.com> (raw)
In-Reply-To: <20230711202047.3818697-10-willy@infradead.org>

On Tue, Jul 11, 2023 at 1:21 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> From: Arjun Roy <arjunroy@google.com>
>
> Per-VMA locking allows us to lock a struct vm_area_struct without
> taking the process-wide mmap lock in read mode.
>
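(As an aside, for readers less familiar with the API: the pattern being
described is roughly the sketch below, where mm and address stand in for
the caller's mm_struct and the address being looked up -- this is not the
exact code from the patch:)

	struct vm_area_struct *vma;

	vma = lock_vma_under_rcu(mm, address);	/* per-VMA read lock, or NULL */
	if (vma) {
		/* ... operate on the VMA without taking mmap_lock ... */
		vma_end_read(vma);		/* drop the per-VMA read lock */
	} else {
		/* fall back to the process-wide lock */
		mmap_read_lock(mm);
		vma = vma_lookup(mm, address);
		/* ... */
		mmap_read_unlock(mm);
	}
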
> Consider a process workload where the mmap lock is taken constantly in
> write mode. In this scenario, all zerocopy receives are periodically
> blocked during that period of time - though in principle, the memory
> ranges being used by TCP are not touched by the operations that need
> the mmap write lock. This results in performance degradation.
>
> Now consider another workload where the mmap lock is never taken in
> write mode, but there are many TCP connections using receive zerocopy
> that are concurrently receiving. These connections all take the mmap
> lock in read mode, but this does induce a lot of contention and atomic
> ops for this process-wide lock. This results in additional CPU
> overhead caused by contending on the cache line for this lock.
>
> However, with per-vma locking, both of these problems can be avoided.
>
> As a test, I ran an RPC-style request/response workload with 4KB
> payloads and receive zerocopy enabled, with 100 simultaneous TCP
> connections. I measured perf cycles within the
> find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
> without per-vma locking enabled.
>
> When using process-wide mmap semaphore read locking, about 1% of
> measured perf cycles were within this path. With per-VMA locking, this
> value dropped to about 0.45%.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Seems to match the original version with less churn.

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1542de3f66f7..7118ec6cf886 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
>         }
>  }
>

nit: Maybe add a comment noting that the mmap_locked value is
undefined/meaningless if the function returns NULL?
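Something along these lines, perhaps (just a sketch of the wording, not a
request for this exact comment):

	/*
	 * Returns the VMA backing @address with either the per-VMA read lock
	 * held (*mmap_locked == false) or mmap_lock held for read
	 * (*mmap_locked == true).  Returns NULL on failure, in which case
	 * *mmap_locked is not written and must not be relied upon.
	 */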

> +static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> +               unsigned long address, bool *mmap_locked)
> +{
> +       struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +
> +       if (vma) {
> +               if (vma->vm_ops != &tcp_vm_ops) {
> +                       vma_end_read(vma);
> +                       return NULL;
> +               }
> +               *mmap_locked = false;
> +               return vma;
> +       }
> +
> +       mmap_read_lock(mm);
> +       vma = vma_lookup(mm, address);
> +       if (!vma || vma->vm_ops != &tcp_vm_ops) {
> +               mmap_read_unlock(mm);
> +               return NULL;
> +       }
> +       *mmap_locked = true;
> +       return vma;
> +}
> +
>  #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
>  static int tcp_zerocopy_receive(struct sock *sk,
>                                 struct tcp_zerocopy_receive *zc,
> @@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>         u32 seq = tp->copied_seq;
>         u32 total_bytes_to_map;
>         int inq = tcp_inq(sk);
> +       bool mmap_locked;
>         int ret;
>
>         zc->copybuf_len = 0;
> @@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>                 return 0;
>         }
>
> -       mmap_read_lock(current->mm);
> -
> -       vma = vma_lookup(current->mm, address);
> -       if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -               mmap_read_unlock(current->mm);
> +       vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +       if (!vma)
>                 return -EINVAL;
> -       }
> +
>         vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
>         avail_len = min_t(u32, vma_len, inq);
>         total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> @@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>                                                    zc, total_bytes_to_map);
>         }
>  out:
> -       mmap_read_unlock(current->mm);
> +       if (mmap_locked)
> +               mmap_read_unlock(current->mm);
> +       else
> +               vma_end_read(vma);
>         /* Try to copy straggler data. */
>         if (!ret)
>                 copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> --
> 2.39.2
>

Thread overview: 29+ messages
2023-07-11 20:20 [PATCH v2 0/9] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
2023-07-11 20:20 ` [PATCH v2 1/9] Revert "tcp: Use per-vma locking for receive zerocopy" Matthew Wilcox (Oracle)
2023-07-14  3:02   ` Suren Baghdasaryan
2023-07-14  3:34     ` Matthew Wilcox
2023-07-24 14:49       ` Jann Horn
2023-07-24 15:06         ` Matthew Wilcox
2023-07-24 21:42           ` Jakub Kicinski
2023-07-11 20:20 ` [PATCH v2 2/9] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
2023-07-14  3:03   ` Suren Baghdasaryan
2023-07-11 20:20 ` [PATCH v2 3/9] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault() Matthew Wilcox (Oracle)
2023-07-14  3:04   ` Suren Baghdasaryan
2023-07-11 20:20 ` [PATCH v2 4/9] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault() Matthew Wilcox (Oracle)
2023-07-14  3:17   ` Suren Baghdasaryan
2023-07-24 15:46   ` Jann Horn
2023-07-24 16:37     ` Matthew Wilcox
2023-07-11 20:20 ` [PATCH v2 5/9] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() Matthew Wilcox (Oracle)
2023-07-14  3:26   ` Suren Baghdasaryan
2023-07-24 15:46   ` Jann Horn
2023-07-24 17:45     ` Matthew Wilcox
2023-07-11 20:20 ` [PATCH v2 6/9] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_fault() Matthew Wilcox (Oracle)
2023-07-14  3:27   ` Suren Baghdasaryan
2023-07-11 20:20 ` [PATCH v2 7/9] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
2023-07-14  3:32   ` Suren Baghdasaryan
2023-07-24 17:38     ` Matthew Wilcox
2023-07-11 20:20 ` [PATCH v2 8/9] mm: Remove CONFIG_PER_VMA_LOCK ifdefs Matthew Wilcox (Oracle)
2023-07-14  3:34   ` Suren Baghdasaryan
2023-07-11 20:20 ` [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy Matthew Wilcox (Oracle)
2023-07-14  3:40   ` Suren Baghdasaryan [this message]
2023-07-21 18:48   ` Matthew Wilcox
