linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: takafumi-sslab <takafumi.kubota1012@sslab.ics.keio.ac.jp>
To: bo.li.liu@oracle.com
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: add another missing end_page_writeback on submit_extent_page failure
Date: Mon, 6 Feb 2017 14:50:18 +0900	[thread overview]
Message-ID: <fe79b315-196c-b981-5151-9c8cc1901632@sslab.ics.keio.ac.jp> (raw)
In-Reply-To: <20170206033527.GA23573@localhost.localdomain>



On 2017/02/06 12:35, Liu Bo wrote:
> a) __extent_writepage has handled the case when nr == 0.

Yes, I agree this.

> b) when nr == 1, the page is marked with writeback bit and added into a
>     bio, thus we have bio_end to deal with page bits.

However, I don't think this is always correct,
because, as I said before as the Error A, when (bio_ret == NULL or 
*bio_ret == NULL) and btrfs_bio_alloc() fails, the page is not added 
into a bio.
Thus, bio_end does not deal with the page's bit, and the writeback bit 
remains with nr == 1.

To confirm this, I inject a fault to mimic the fail of btrfs_bio_alloc().
I can reproduce the hang at the kernel ver4.10-rc7 that commit id is 
d5adbfcd5f7bcc6fa58a41c5c5ada0e5c826ce2c
and  Ftrace log messages are like below

a.out-2867  : __extent_writepage: call __extent_writepage_io: 
page:0xffffea000006d200(writeback:0), nr:0, logged at just previous to 
line 3523 of extent_io.c
a.out-2867  : __extent_writepage_io: call submit_extent_page: 
page:0xffffea000006d200(writeback:1), nr:0, bio_ret:0xffffc90003bd7d40, 
*bio_ret: (null), logged at just previous to line 3443 of extent_io.c
a.out-2867  : submit_extent_page.isra.53: Mimic Error A: 
btrfs_bio_alloc() fails and submit_extent_page() return -ENOMEM, logged 
at just previous to line 2807 of extent_io.c
a.out-2867  : __extent_writepage_io: after call submit_extent_page: 
page:0xffffea000006d200(writeback:1), nr:0, bio_ret:0xffffc90003bd7d40, 
*bio_ret: (null), logged at just next to line 3447 of extent_io.c
a.out-2867  : __extent_writepage_io: nr == 1: 
page:0xffffea000006d200(writeback:1), nr:1, bio_ret:0xffffc90003de7d40, 
logged at just next to line 3456 of extent_io.c //nr == 1 and the 
writeback bit remains
a.out-2867  : __extent_writepage: after call __extent_writepage_io: 
page:0xffffea000006d200(writeback:1), nr:1, logged at just next to line 
3524 of extent_io.c // __extent_writepage does not handle, because nr == 1
...
sync-2868   : __filemap_fdatawait_range: wait_on_page_writeback: 
page:0xffffea000006d200, logged at just previous to line 404 of 
mm/filemap.c // Then, sync hangs

Sincerely,

-takafumi
>
> So I don't think the patch is necessary for now.
>
> But as I said, the fact (nr == 0 or 1) would be changed if the
> subpagesize blocksize is supported.
>
> Thanks,
>
> -liubo
>
>> Sincerely,
>>
>> -takafumi
>>> Thanks,
>>>
>>> -liubo
>>>> Sincerely,
>>>>
>>>> On 2017/01/31 5:09, Liu Bo wrote:
>>>>> On Fri, Jan 13, 2017 at 03:12:31PM +0900, takafumi-sslab wrote:
>>>>>> Thanks for your replying.
>>>>>>
>>>>>> I understand this bug is more complicated than I expected.
>>>>>> I classify error cases under submit_extent_page() below
>>>>>>
>>>>>> A: ENOMEM error at btrfs_bio_alloc() in submit_extent_page()
>>>>>> I first assumed this case and sent the mail.
>>>>>> When bio_ret is NULL, submit_extent_page() calls btrfs_bio_alloc().
>>>>>> Then, btrfs_bio_alloc() may fail and submit_extent_page() returns -ENOMEM.
>>>>>> In this case, bio_endio() is not called and the page's writeback bit
>>>>>> remains.
>>>>>> So, there is a need to call end_page_writeback() in the error handling.
>>>>>>
>>>>>> B: errors under submit_one_bio() of submit_extent_page()
>>>>>> Errors that occur under submit_one_bio() handles at bio_endio(), and
>>>>>> bio_endio() would call end_page_writeback().
>>>>>>
>>>>>> Therefore, as you mentioned in the last mail, simply adding
>>>>>> end_page_writeback() like my last email and commit 55e3bd2e0c2e1 can
>>>>>> conflict in the case of B.
>>>>>> To avoid such conflict, one easy solution is adding PageWriteback() check
>>>>>> too.
>>>>>>
>>>>>> How do you think of this solution?
>>>>> (sorry for the late reply.)
>>>>>
>>>>> I think its caller, "__extent_writepage", has covered the above case
>>>>> by setting page writeback again.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -liubo
>>>>>> Sincerely,
>>>>>>
>>>>>> On 2016/12/22 15:20, Liu Bo wrote:
>>>>>>> On Fri, Dec 16, 2016 at 03:41:50PM +0900, Takafumi Kubota wrote:
>>>>>>>> This is actually inspired by Filipe's patch(55e3bd2e0c2e1).
>>>>>>>>
>>>>>>>> When submit_extent_page() in __extent_writepage_io() fails,
>>>>>>>> Btrfs misses clearing a writeback bit of the failed page.
>>>>>>>> This causes the false under-writeback page.
>>>>>>>> Then, another sync task hangs in filemap_fdatawait_range(),
>>>>>>>> because it waits the false under-writeback page.
>>>>>>>>
>>>>>>>> CPU0                            CPU1
>>>>>>>>
>>>>>>>> __extent_writepage_io()
>>>>>>>>       ret = submit_extent_page() // fail
>>>>>>>>
>>>>>>>>       if (ret)
>>>>>>>>         SetPageError(page)
>>>>>>>>         // miss clearing the writeback bit
>>>>>>>>
>>>>>>>>                                     sync()
>>>>>>>>                                       ...
>>>>>>>>                                       filemap_fdatawait_range()
>>>>>>>>                                         wait_on_page_writeback(page);
>>>>>>>>                                         // wait the false under-writeback page
>>>>>>>>
>>>>>>>> Signed-off-by: Takafumi Kubota <takafumi.kubota1012@sslab.ics.keio.ac.jp>
>>>>>>>> ---
>>>>>>>>      fs/btrfs/extent_io.c | 4 +++-
>>>>>>>>      1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>>>>>> index 1e67723..ef9793b 100644
>>>>>>>> --- a/fs/btrfs/extent_io.c
>>>>>>>> +++ b/fs/btrfs/extent_io.c
>>>>>>>> @@ -3443,8 +3443,10 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
>>>>>>>>      					 bdev->bio, max_nr,
>>>>>>>>      					 end_bio_extent_writepage,
>>>>>>>>      					 0, 0, 0, false);
>>>>>>>> -		if (ret)
>>>>>>>> +		if (ret) {
>>>>>>>>      			SetPageError(page);
>>>>>>>> +			end_page_writeback(page);
>>>>>>>> +		}
>>>>>>> OK...this could be complex as we don't know which part in
>>>>>>> submit_extent_page gets the error, if the page has been added into bio
>>>>>>> and bio_end would call end_page_writepage(page) as well, so whichever
>>>>>>> comes later, the BUG() in end_page_writeback() would complain.
>>>>>>>
>>>>>>> Looks like commit 55e3bd2e0c2e1 also has the same problem although I
>>>>>>> gave it my reviewed-by.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -liubo
>>>>>>>
>>>>>>>>      		cur = cur + iosize;
>>>>>>>>      		pg_offset += iosize;
>>>>>>>> -- 
>>>>>>>> 1.9.3
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>> -- 
>>>>>> Keio University
>>>>>> System Software Laboratory
>>>>>> Takafumi Kubota
>>>>>> takafumi.kubota1012@sslab.ics.keio.jp
>>>>>>
>>>> -- 
>>>> Keio University
>>>> System Software Laboratory
>>>> Takafumi Kubota
>>>> takafumi.kubota1012@sslab.ics.keio.jp
>>>>
>> -- 
>> Keio University
>> System Software Laboratory
>> Takafumi Kubota
>> takafumi.kubota1012@sslab.ics.keio.jp
>>

-- 
Keio University
System Software Laboratory
Takafumi Kubota
takafumi.kubota1012@sslab.ics.keio.jp


  reply	other threads:[~2017-02-06  5:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16  6:41 [PATCH] Btrfs: add another missing end_page_writeback on submit_extent_page failure Takafumi Kubota
2016-12-22  6:20 ` Liu Bo
2017-01-13  6:12   ` takafumi-sslab
2017-01-30 20:09     ` Liu Bo
2017-02-01  3:27       ` takafumi-sslab
2017-02-01 16:19         ` Liu Bo
2017-02-04 12:42           ` takafumi-sslab
2017-02-06  3:35             ` Liu Bo
2017-02-06  5:50               ` takafumi-sslab [this message]
2017-02-06 16:26                 ` Liu Bo
2017-02-06 16:34                   ` Liu Bo
2017-02-07 11:09                     ` takafumi-sslab
2017-02-07 20:14                       ` Liu Bo
2017-02-08 18:30                         ` David Sterba
2017-02-06  9:36               ` takafumi-sslab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe79b315-196c-b981-5151-9c8cc1901632@sslab.ics.keio.ac.jp \
    --to=takafumi.kubota1012@sslab.ics.keio.ac.jp \
    --cc=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).