* [PATCH] Btrfs: fix memory leak in reading btree blocks
From: Liu Bo @ 2016-08-03 19:33 UTC
  To: linux-btrfs; +Cc: David Sterba

A btree block can be read either via readahead or via an intentional
read, and we can end up with a memory leak when the following happens:
1) readahead starts to read block A but does not wait for read
   completion,
2) btree_readpage_end_io_hook finds that block A is corrupted,
   and it needs to clear the uptodate bit on all of block A's pages,
3) meanwhile an intentional read kicks in and checks the uptodate
   bit of block A's pages to decide which pages need to be read,
4) some pages have the uptodate bit set during 3)'s check, so 3)
   doesn't count them in eb->io_pages, but 2) later clears the bit
   and we have to readpage on those pages after all. We end up with
   the wrong eb->io_pages, which results in a memory leak of this
   block.

Fix the problem by first locking all of the pages and only then
checking their uptodate bits.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bd29b9b..a77050e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5215,11 +5215,20 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
 			lock_page(page);
 		}
 		locked_pages++;
+	}
+	/*
+	 * We need to firstly lock all pages to make sure that
+	 * the uptodate bit of our pages won't be affected by
+	 * clear_extent_buffer_uptodate().
+	 */
+	for (i = start_i; i < num_pages; i++) {
+		page = eb->pages[i];
 		if (!PageUptodate(page)) {
 			num_reads++;
 			all_uptodate = 0;
 		}
 	}
+
 	if (all_uptodate) {
 		if (start_i == 0)
 			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-- 
2.5.5
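
For readers who want to see the ordering outside the kernel, here is a
minimal userspace sketch of the "lock every page first, then look at
uptodate" pattern. It is an analogy, not the btrfs code: demo_page,
demo_eb, demo_lock_and_count() and demo_unlock_all() are hypothetical
names, a pthread mutex stands in for the page lock, and the sketch
assumes the uptodate flag is only ever cleared by someone holding that
lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_PAGES 4

/* Hypothetical stand-in for struct page: a lock plus an uptodate flag. */
struct demo_page {
	pthread_mutex_t lock;
	bool uptodate;
};

/* Hypothetical stand-in for an extent buffer spanning several pages. */
struct demo_eb {
	struct demo_page pages[DEMO_NUM_PAGES];
	int io_pages;
};

static void demo_eb_init(struct demo_eb *eb)
{
	assert(eb != NULL);
	for (size_t i = 0; i < DEMO_NUM_PAGES; i++) {
		pthread_mutex_init(&eb->pages[i].lock, NULL);
		eb->pages[i].uptodate = false;
	}
	eb->io_pages = 0;
}

/*
 * Pass 1 takes every page lock. Pass 2 counts the pages that still
 * need a read. Because all locks are held before the first flag is
 * inspected, a lock holder cannot flip a flag between the two passes.
 * Returns with all pages still locked.
 */
static int demo_lock_and_count(struct demo_eb *eb)
{
	int num_reads = 0;

	assert(eb != NULL);
	for (size_t i = 0; i < DEMO_NUM_PAGES; i++)
		pthread_mutex_lock(&eb->pages[i].lock);

	for (size_t i = 0; i < DEMO_NUM_PAGES; i++) {
		if (!eb->pages[i].uptodate)
			num_reads++;
	}

	eb->io_pages = num_reads;
	return num_reads;
}

/* Drop the locks taken by demo_lock_and_count(). */
static void demo_unlock_all(struct demo_eb *eb)
{
	assert(eb != NULL);
	for (size_t i = 0; i < DEMO_NUM_PAGES; i++)
		pthread_mutex_unlock(&eb->pages[i].lock);
}
```

With pages 0 and 2 marked uptodate, demo_lock_and_count() reports two
pending reads, and that count stays valid until demo_unlock_all() runs.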



* Re: [PATCH] Btrfs: fix memory leak in reading btree blocks
From: Liu Bo @ 2016-08-24 23:15 UTC
  To: linux-btrfs; +Cc: David Sterba

Hi David,

On Wed, Aug 03, 2016 at 12:33:01PM -0700, Liu Bo wrote:
> A btree block can be read either via readahead or via an intentional
> read, and we can end up with a memory leak when the following happens:
> 1) readahead starts to read block A but does not wait for read
>    completion,
> 2) btree_readpage_end_io_hook finds that block A is corrupted,
>    and it needs to clear the uptodate bit on all of block A's pages,
> 3) meanwhile an intentional read kicks in and checks the uptodate
>    bit of block A's pages to decide which pages need to be read,
> 4) some pages have the uptodate bit set during 3)'s check, so 3)
>    doesn't count them in eb->io_pages, but 2) later clears the bit
>    and we have to readpage on those pages after all. We end up with
>    the wrong eb->io_pages, which results in a memory leak of this
>    block.
> 
> Fix the problem by first locking all of the pages and only then
> checking their uptodate bits.

   t1(readahead)                              t2(readahead endio)                                       t3(the following read)
read_extent_buffer_pages                    end_bio_extent_readpage                                  
  for pg in eb:                                for page 0,1,2 in eb:
      if pg is NOT uptodate:                       btree_readpage_end_io_hook(pg)
          num_reads++                              if uptodate:                                                
  eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
  for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages   
       if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
           __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is NOT uptodate:
                                                       clear_extent_buffer_uptodate(eb)                         num_reads++
                                                           for pg in eb:                                eb->io_pages = num_reads
                                                               ClearPageUptodate(page)  _______________
                                                                                                        for pg in eb:
                                                                                                            if pg is NOT uptodate:
                                                                                                                __extent_read_full_page(pg)

So t3's eb->io_pages is not consistent with the number of pages it's
reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will
get a negative number so that we're not able to free the eb.
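
To make the arithmetic concrete, here is a small userspace model of that
accounting. It is only a sketch: demo_dec_and_test() and
demo_complete_reads() are hypothetical helpers, C11 atomics stand in for
the kernel's atomic_t, and the idea that the buffer only becomes
freeable once the counter is back at zero is an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Same shape as the kernel's atomic_dec_and_test(): decrement, and
 * return true only when the new value is exactly zero.
 */
static bool demo_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1 == 0;
}

/*
 * The submitter records `counted` pending reads in *io_pages, but
 * `submitted` page reads actually complete. Returns the final value
 * of the counter after every completion has run.
 */
static int demo_complete_reads(atomic_int *io_pages, int counted,
			       int submitted)
{
	assert(submitted >= 0);
	atomic_store(io_pages, counted);

	for (int i = 0; i < submitted; i++)
		(void)demo_dec_and_test(io_pages);

	return atomic_load(io_pages);
}
```

When the count matches (4 counted, 4 submitted) the counter ends at 0.
When t3 counts only 1 page but ends up reading 4, the counter ends at
-3 and never settles back at zero.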

Thanks,

-liubo


* Re: [PATCH] Btrfs: fix memory leak in reading btree blocks
From: David Sterba @ 2016-09-01 15:07 UTC
  To: Liu Bo; +Cc: linux-btrfs, David Sterba

On Wed, Aug 24, 2016 at 04:15:50PM -0700, Liu Bo wrote:
> Hi David,
> 
> On Wed, Aug 03, 2016 at 12:33:01PM -0700, Liu Bo wrote:
> > A btree block can be read either via readahead or via an intentional
> > read, and we can end up with a memory leak when the following happens:
> > 1) readahead starts to read block A but does not wait for read
> >    completion,
> > 2) btree_readpage_end_io_hook finds that block A is corrupted,
> >    and it needs to clear the uptodate bit on all of block A's pages,
> > 3) meanwhile an intentional read kicks in and checks the uptodate
> >    bit of block A's pages to decide which pages need to be read,
> > 4) some pages have the uptodate bit set during 3)'s check, so 3)
> >    doesn't count them in eb->io_pages, but 2) later clears the bit
> >    and we have to readpage on those pages after all. We end up with
> >    the wrong eb->io_pages, which results in a memory leak of this
> >    block.
> > 
> > Fix the problem by first locking all of the pages and only then
> > checking their uptodate bits.
> 
>    t1(readahead)                              t2(readahead endio)                                       t3(the following read)
> read_extent_buffer_pages                    end_bio_extent_readpage                                  
>   for pg in eb:                                for page 0,1,2 in eb:
>       if pg is NOT uptodate:                       btree_readpage_end_io_hook(pg)
>           num_reads++                              if uptodate:                                                
>   eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
>   for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages   
>        if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
>            __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is NOT uptodate:
>                                                        clear_extent_buffer_uptodate(eb)                         num_reads++
>                                                            for pg in eb:                                eb->io_pages = num_reads
>                                                                ClearPageUptodate(page)  _______________
>                                                                                                         for pg in eb:
>                                                                                                             if pg is NOT uptodate:
>                                                                                                                 __extent_read_full_page(pg)
> 
> So t3's eb->io_pages is not consistent with the number of pages it's
> reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will
> get a negative number so that we're not able to free the eb.

Thanks for the details, that helped me. I'll add the schematic to the
commit log.

Reviewed-by: David Sterba <dsterba@suse.com>
