All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Xu <peterx@redhat.com>, Linux-MM <linux-mm@kvack.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v8 2/2] mm: remove zap_page_range and change callers to use zap_vma_range
Date: Thu, 10 Nov 2022 14:17:39 -0800	[thread overview]
Message-ID: <Y214g5vrDqOi6Tmw@monkey> (raw)
In-Reply-To: <C16276B9-F40A-4767-AB3A-8566FA61A8DF@gmail.com>

On 11/10/22 14:02, Nadav Amit wrote:
> On Nov 10, 2022, at 1:27 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> > Hi, Nadav,
> > 
> > On Thu, Nov 10, 2022 at 01:09:43PM -0800, Nadav Amit wrote:
> >> But, are the callers really able to guarantee that the ranges are all in a
> >> single VMA? I am not familiar with the users, but how for instance
> >> tcp_zerocopy_receive() can guarantee that no one did some mprotect() of some
> >> sorts that caused the original VMA to be split?
> > 
> > Let me try to answer this one for Mike..  We have two callers in tcp
> > zerocopy code for this function:
> > 
> > tcp_zerocopy_vm_insert_batch_error[2095] zap_page_range(vma, *address, maybe_zap_len);
> > tcp_zerocopy_receive[2237]     zap_page_range(vma, address, total_bytes_to_map);
> > 
> > Both of them take the mmap lock for read, so firstly mprotect is not
> > possible.
> > 
> > The 1st call has:
> > 
> > 	mmap_read_lock(current->mm);
> > 
> > 	vma = vma_lookup(current->mm, address);
> > 	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> > 		mmap_read_unlock(current->mm);
> > 		return -EINVAL;
> > 	}
> > 	vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
> > 	avail_len = min_t(u32, vma_len, inq);
> > 	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> > 	if (total_bytes_to_map) {
> > 		if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
> > 			zap_page_range(vma, address, total_bytes_to_map);
> > 
> > Here total_bytes_to_map comes from avail_len <--- vma_len, which is a min()
> > of the rest vma range.  So total_bytes_to_map will never go beyond the vma.
> > 
> > The 2nd call uses maybe_zap_len as len, we need to look two layers of the
> > callers, but ultimately it's something smaller than total_bytes_to_map we
> > discussed.  Hopefully it proves 100% safety on tcp zerocopy.
> 
> Thanks Peter for the detailed explanation.
> 
> I had another look at the code and indeed it should not break. I am not sure
> whether users who zero-copy receive and mprotect() part of the memory would
> not be surprised, but I guess that’s a different story, which I should
> further study at some point.

I did audit all calling sites and am fairly certain passed ranges are within
a single vma.  Because of this, Peter suggested removing zap_page_range.  If
there is concern, we can just fix up the mmu notifiers in zap_page_range and
leave it.  This is what is done in the patch which is currently in
mm-hotfixes-unstable.

-- 
Mike Kravetz


  reply	other threads:[~2022-11-10 22:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08  1:19 [PATCH v8 0/2] hugetlb MADV_DONTNEED fix and zap_page_range cleanup Mike Kravetz
2022-11-08  1:19 ` [PATCH v8 1/2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing Mike Kravetz
2022-11-10 20:56   ` Peter Xu
2022-11-10 21:55     ` Mike Kravetz
2022-11-10 20:59   ` Nadav Amit
2022-11-10 21:48     ` Mike Kravetz
2022-11-10 22:07       ` Peter Xu
2022-11-10 22:22       ` Nadav Amit
2022-11-10 22:31         ` Mike Kravetz
2022-11-10 21:07   ` Peter Xu
2022-11-08  1:19 ` [PATCH v8 2/2] mm: remove zap_page_range and change callers to use zap_vma_range Mike Kravetz
2022-11-10 21:09   ` Nadav Amit
2022-11-10 21:27     ` Peter Xu
2022-11-10 22:02       ` Nadav Amit
2022-11-10 22:17         ` Mike Kravetz [this message]
2022-11-10 19:46 ` [PATCH v8 0/2] hugetlb MADV_DONTNEED fix and zap_page_range cleanup Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y214g5vrDqOi6Tmw@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.