linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] tcp: add sanity checks to rx zerocopy
@ 2024-01-25 10:33 Eric Dumazet
  2024-01-25 16:07 ` Matthew Wilcox
  2024-01-29 12:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-01-25 10:33 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Neal Cardwell, netdev, eric.dumazet, Eric Dumazet, ZhangPeng,
	Arjun Roy, Matthew Wilcox, linux-mm, Andrew Morton, linux-fsdevel

TCP rx zerocopy intent is to map pages initially allocated
from NIC drivers, not pages owned by a fs.

This patch adds to can_map_frag() these additional checks:

- Page must not be a compound one.
- page->mapping must be NULL.

This fixes the panic reported by ZhangPeng.

syzbot was able to loopback packets built with sendfile(),
mapping pages owned by an ext4 file to TCP rx zerocopy.

r3 = socket$inet_tcp(0x2, 0x1, 0x0)
mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
r4 = socket$inet_tcp(0x2, 0x1, 0x0)
bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
    0x181e42, 0x0)
fallocate(r5, 0x0, 0x0, 0x85b8)
sendfile(r4, r5, 0x0, 0x8ba0)
getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
    &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
    0x181e42, 0x0)

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Link: https://lore.kernel.org/netdev/5106a58e-04da-372a-b836-9d3d0bd2507b@huawei.com/T/
Reported-and-bisected-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
---
 net/ipv4/tcp.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a1c6de385ccef91fe3c3e072ac5d2a20f0394a2b..7e2481b9eae1b791e1ec65f39efa41837a9fcbd3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1786,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
 
 static bool can_map_frag(const skb_frag_t *frag)
 {
-	return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
+	struct page *page;
+
+	if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
+		return false;
+
+	page = skb_frag_page(frag);
+
+	if (PageCompound(page) || page->mapping)
+		return false;
+
+	return true;
 }
 
 static int find_next_mappable_frag(const skb_frag_t *frag,
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: add sanity checks to rx zerocopy
  2024-01-25 10:33 [PATCH net] tcp: add sanity checks to rx zerocopy Eric Dumazet
@ 2024-01-25 16:07 ` Matthew Wilcox
  2024-01-25 16:09   ` Matthew Wilcox
  2024-01-25 22:23   ` Arjun Roy
  2024-01-29 12:10 ` patchwork-bot+netdevbpf
  1 sibling, 2 replies; 6+ messages in thread
From: Matthew Wilcox @ 2024-01-25 16:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	netdev, eric.dumazet, ZhangPeng, Arjun Roy, linux-mm,
	Andrew Morton, linux-fsdevel

On Thu, Jan 25, 2024 at 10:33:17AM +0000, Eric Dumazet wrote:
> +++ b/net/ipv4/tcp.c
> @@ -1786,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
>  
>  static bool can_map_frag(const skb_frag_t *frag)
>  {
> -	return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> +	struct page *page;
> +
> +	if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
> +		return false;
> +
> +	page = skb_frag_page(frag);
> +
> +	if (PageCompound(page) || page->mapping)
> +		return false;

I'm not entirely sure why you're testing PageCompound here.  If a driver
allocates a compound page, we'd still want to be able to insert it,
right?

I have a feeling that we want to fix this in the VM layer.  There are
some weird places calling vm_insert_page() and we should probably make
them all fail.

Something like this, perhaps?

diff --git a/mm/memory.c b/mm/memory.c
index 1a60faad2e49..ae0abab56d38 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1871,6 +1871,10 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
 
 	if (!pte_none(ptep_get(pte)))
 		return -EBUSY;
+	if (folio->mapping &&
+	    ((addr - vma->vm_start) / PAGE_SIZE + vma->vm_pgoff) !=
+	    (folio->index + folio_page_idx(folio, page)))
+		return -EINVAL;
 	/* Ok, finally just insert the thing.. */
 	folio_get(folio);
 	inc_mm_counter(vma->vm_mm, mm_counter_file(folio));

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: add sanity checks to rx zerocopy
  2024-01-25 16:07 ` Matthew Wilcox
@ 2024-01-25 16:09   ` Matthew Wilcox
  2024-01-25 16:54     ` Eric Dumazet
  2024-01-25 22:23   ` Arjun Roy
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2024-01-25 16:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	netdev, eric.dumazet, ZhangPeng, Arjun Roy, linux-mm,
	Andrew Morton, linux-fsdevel


Fixing email address for linux-mm.

On Thu, Jan 25, 2024 at 04:07:50PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 10:33:17AM +0000, Eric Dumazet wrote:
> > +++ b/net/ipv4/tcp.c
> > @@ -1786,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
> >  
> >  static bool can_map_frag(const skb_frag_t *frag)
> >  {
> > -	return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> > +	struct page *page;
> > +
> > +	if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
> > +		return false;
> > +
> > +	page = skb_frag_page(frag);
> > +
> > +	if (PageCompound(page) || page->mapping)
> > +		return false;
> 
> I'm not entirely sure why you're testing PageCompound here.  If a driver
> allocates a compound page, we'd still want to be able to insert it,
> right?
> 
> I have a feeling that we want to fix this in the VM layer.  There are
> some weird places calling vm_insert_page() and we should probably make
> them all fail.
> 
> Something like this, perhaps?
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1a60faad2e49..ae0abab56d38 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1871,6 +1871,10 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>  
>  	if (!pte_none(ptep_get(pte)))
>  		return -EBUSY;
> +	if (folio->mapping &&
> +	    ((addr - vma->vm_start) / PAGE_SIZE + vma->vm_pgoff) !=
> +	    (folio->index + folio_page_idx(folio, page)))
> +		return -EINVAL;
>  	/* Ok, finally just insert the thing.. */
>  	folio_get(folio);
>  	inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: add sanity checks to rx zerocopy
  2024-01-25 16:09   ` Matthew Wilcox
@ 2024-01-25 16:54     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-01-25 16:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	netdev, eric.dumazet, ZhangPeng, Arjun Roy, linux-mm,
	Andrew Morton, linux-fsdevel

On Thu, Jan 25, 2024 at 5:09 PM Matthew Wilcox <willy@infradead.org> wrote:
>
>
> Fixing email address for linux-mm.
>
> On Thu, Jan 25, 2024 at 04:07:50PM +0000, Matthew Wilcox wrote:
> > On Thu, Jan 25, 2024 at 10:33:17AM +0000, Eric Dumazet wrote:
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1786,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
> > >
> > >  static bool can_map_frag(const skb_frag_t *frag)
> > >  {
> > > -   return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> > > +   struct page *page;
> > > +
> > > +   if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
> > > +           return false;
> > > +
> > > +   page = skb_frag_page(frag);
> > > +
> > > +   if (PageCompound(page) || page->mapping)
> > > +           return false;
> >
> > I'm not entirely sure why you're testing PageCompound here.  If a driver
> > allocates a compound page, we'd still want to be able to insert it,
> > right?

I tried to get something that would be free of merge conflicts, up to linux-4.18
I was not sure if I had to use compound_head(page) in order to test
for the mapping ?

page = compound_head(page);
if (page->mapping)
     return false;

I guess that we would have to adjust the page pointer based on
skb_frag_off(frag),
right now we bail if skb_frag_off(frag) is not zero.

I would leave this change for future kernels if there is interest.

> >
> > I have a feeling that we want to fix this in the VM layer.  There are
> > some weird places calling vm_insert_page() and we should probably make
> > them all fail.
> >
> > Something like this, perhaps?


Perhaps, but backports to stable versions (without folio) would be a
bit of a work ?

> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1a60faad2e49..ae0abab56d38 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1871,6 +1871,10 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
> >
> >       if (!pte_none(ptep_get(pte)))
> >               return -EBUSY;
> > +     if (folio->mapping &&
> > +         ((addr - vma->vm_start) / PAGE_SIZE + vma->vm_pgoff) !=
> > +         (folio->index + folio_page_idx(folio, page)))
> > +             return -EINVAL;
> >       /* Ok, finally just insert the thing.. */
> >       folio_get(folio);
> >       inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: add sanity checks to rx zerocopy
  2024-01-25 16:07 ` Matthew Wilcox
  2024-01-25 16:09   ` Matthew Wilcox
@ 2024-01-25 22:23   ` Arjun Roy
  1 sibling, 0 replies; 6+ messages in thread
From: Arjun Roy @ 2024-01-25 22:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Neal Cardwell, netdev, eric.dumazet, ZhangPeng, linux-mm,
	Andrew Morton, linux-fsdevel

On Thu, Jan 25, 2024 at 8:07 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Jan 25, 2024 at 10:33:17AM +0000, Eric Dumazet wrote:
> > +++ b/net/ipv4/tcp.c
> > @@ -1786,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
> >
> >  static bool can_map_frag(const skb_frag_t *frag)
> >  {
> > -     return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> > +     struct page *page;
> > +
> > +     if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
> > +             return false;
> > +
> > +     page = skb_frag_page(frag);
> > +
> > +     if (PageCompound(page) || page->mapping)
> > +             return false;
>
> I'm not entirely sure why you're testing PageCompound here.  If a driver
> allocates a compound page, we'd still want to be able to insert it,
> right?
>

Resend b/c I forgot I was in HTML mode email, oops.

Is there a common use case for a NIC driver to be doing this? I was
under the impression NIC drivers would get pages individually since it
would be harder to find physically contiguous groupings in general.

Anyways, a possible reason to not allow compound pages - if we have
some large memory range and different receiving users are interested
in different parts of the range in userspace - since the whole range
is pinned till everyone is done with it, it can lead to worse cases of
memory lingering when some user only wanted 10 bytes out of some N *
PAGE_SIZE range for a multi-minute long period.

-Arjun


> I have a feeling that we want to fix this in the VM layer.  There are
> some weird places calling vm_insert_page() and we should probably make
> them all fail.
>
> Something like this, perhaps?
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1a60faad2e49..ae0abab56d38 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1871,6 +1871,10 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
>
>         if (!pte_none(ptep_get(pte)))
>                 return -EBUSY;
> +       if (folio->mapping &&
> +           ((addr - vma->vm_start) / PAGE_SIZE + vma->vm_pgoff) !=
> +           (folio->index + folio_page_idx(folio, page)))
> +               return -EINVAL;
>         /* Ok, finally just insert the thing.. */
>         folio_get(folio);
>         inc_mm_counter(vma->vm_mm, mm_counter_file(folio));

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] tcp: add sanity checks to rx zerocopy
  2024-01-25 10:33 [PATCH net] tcp: add sanity checks to rx zerocopy Eric Dumazet
  2024-01-25 16:07 ` Matthew Wilcox
@ 2024-01-29 12:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-29 12:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, ncardwell, netdev, eric.dumazet,
	zhangpeng362, arjunroy, willy, linux-mm, akpm, linux-fsdevel

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 25 Jan 2024 10:33:17 +0000 you wrote:
> TCP rx zerocopy intent is to map pages initially allocated
> from NIC drivers, not pages owned by a fs.
> 
> This patch adds to can_map_frag() these additional checks:
> 
> - Page must not be a compound one.
> - page->mapping must be NULL.
> 
> [...]

Here is the summary with links:
  - [net] tcp: add sanity checks to rx zerocopy
    https://git.kernel.org/netdev/net/c/577e4432f3ac

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-29 12:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-25 10:33 [PATCH net] tcp: add sanity checks to rx zerocopy Eric Dumazet
2024-01-25 16:07 ` Matthew Wilcox
2024-01-25 16:09   ` Matthew Wilcox
2024-01-25 16:54     ` Eric Dumazet
2024-01-25 22:23   ` Arjun Roy
2024-01-29 12:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).