Re: [RFC PATCH 1/2] ext4: Fix possible deadlock with local interrupts disabled and page-draining IPI

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nikolay Borisov <kernel@kyup.com>
To: Jan Kara <jack@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>,
	'linux-kernel' <linux-kernel@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-fsdevel@vger.kernel.org,
	SiteGround Operations <operations@siteground.com>,
	vbabka@suse.cz, gilad@benyossef.com, mgorman@suse.de,
	linux-mm@kvack.org, Marian Marinov <mm@1h.com>
Subject: Re: [RFC PATCH 1/2] ext4: Fix possible deadlock with local interrupts disabled and page-draining IPI
Date: Mon, 12 Oct 2015 17:51:07 +0300	[thread overview]
Message-ID: <561BC8DB.6070600@kyup.com> (raw)
In-Reply-To: <20151012134020.GA21302@quack.suse.cz>

Hello and thanks for the reply,

On 10/12/2015 04:40 PM, Jan Kara wrote:
> On Fri 09-10-15 11:03:30, Nikolay Borisov wrote:
>> On 10/09/2015 10:37 AM, Hillf Danton wrote:
>>>>>> @@ -109,8 +109,8 @@ static void ext4_finish_bio(struct bio *bio)
>>>>>>  			if (bio->bi_error)
>>>>>>  				buffer_io_error(bh);
>>>>>>  		} while ((bh = bh->b_this_page) != head);
>>>>>> -		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
>>>>>>  		local_irq_restore(flags);
>>>>>
>>>>> What if it takes 100ms to unlock after IRQ restored?
>>>>
>>>> I'm not sure I understand in what direction you are going? Care to
>>>> elaborate?
>>>>
>>> Your change introduces extra time cost the lock waiter has to pay in
>>> the case that irq happens before the lock is released.
>>
>> [CC filesystem and mm people. For reference the thread starts here:
>>  http://thread.gmane.org/gmane.linux.kernel/2056996 ]
>>
>> Right, I see what you mean and it's a good point but when doing the
>> patches I was striving for correctness and starting a discussion, hence
>> the RFC. In any case I'd personally choose correctness over performance
>> always ;).
>>
>> As I'm not an fs/ext4 expert and have added the relevant parties (please
>> use reply-all from now on so that the thread is not being cut in the
>> middle) who will be able to say whether it impact is going to be that
>> big. I guess in this particular code path worrying about this is prudent
>> as writeback sounds like a heavily used path.
>>
>> Maybe the problem should be approached from a different angle e.g.
>> drain_all_pages and its reliance on the fact that the IPI will always be
>> delivered in some finite amount of time? But what if a cpu with disabled
>> interrupts is waiting on the task issuing the IPI?
> 
> So I have looked through your patch and also original report (thread starts
> here: https://lkml.org/lkml/2015/10/8/341) and IMHO one question hasn't
> been properly answered yet: Who is holding BH_Uptodate_Lock we are spinning
> on? You have suggested in https://lkml.org/lkml/2015/10/8/464 that it was
> __block_write_full_page_endio() call but that cannot really be the case.
> BH_Uptodate_Lock is used only in IO completion handlers -
> end_buffer_async_read, end_buffer_async_write, ext4_finish_bio. So there
> really should be some end_io function running on some other CPU which holds
> BH_Uptodate_Lock for that buffer.

I did check all the call traces of the current processes on the machine
at the time of the hard lockup and none of the 3 functions you mentioned
were in any of the call chains. But while I was looking the code of
end_buffer_async_write and in the comments I saw it was mentioned that
those completion handler were called from __block_write_full_page_endio
so that's what pointed my attention to that function. But you are right
that it doesn't take the BH lock.

Furthermore the fact that the BH_Async_Write flag is set points me in
the direction that end_buffer_async_write should have been executing but
as I said issuing "bt" for all the tasks didn't show this function.

I'm beginning to wonder if it's possible that a single bit memory error
has crept up, but this still seems like a long shot...

Btw I think in any case the spin_lock patch is wrong as this code can be
called from within softirq context and we do want to be interrupt safe
at that point.

> 
> BTW: I suppose the filesystem uses 4k blocksize, doesn't it?

Unfortunately I cannot tell you with 100% certainty, since on this
server there are multiple block devices with blocksize either 1k or 4k.
So it is one of these. If you know a way to extract this information
from a vmcore file I'd be happy to do it.

> 
> 								Honza
> 
>>>>>> +		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
>>>>>>  		if (!under_io) {
>>>>>>  #ifdef CONFIG_EXT4_FS_ENCRYPTION
>>>>>>  			if (ctx)
>>>>>> --
>>>>>> 2.5.0
>>>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)

From: Nikolay Borisov <kernel@kyup.com>
To: Jan Kara <jack@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>,
	'linux-kernel' <linux-kernel@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-fsdevel@vger.kernel.org,
	SiteGround Operations <operations@siteground.com>,
	vbabka@suse.cz, gilad@benyossef.com, mgorman@suse.de,
	linux-mm@kvack.org, Marian Marinov <mm@1h.com>
Subject: Re: [RFC PATCH 1/2] ext4: Fix possible deadlock with local interrupts disabled and page-draining IPI
Date: Mon, 12 Oct 2015 17:51:07 +0300	[thread overview]
Message-ID: <561BC8DB.6070600@kyup.com> (raw)
In-Reply-To: <20151012134020.GA21302@quack.suse.cz>

Hello and thanks for the reply,

On 10/12/2015 04:40 PM, Jan Kara wrote:
> On Fri 09-10-15 11:03:30, Nikolay Borisov wrote:
>> On 10/09/2015 10:37 AM, Hillf Danton wrote:
>>>>>> @@ -109,8 +109,8 @@ static void ext4_finish_bio(struct bio *bio)
>>>>>>  			if (bio->bi_error)
>>>>>>  				buffer_io_error(bh);
>>>>>>  		} while ((bh = bh->b_this_page) != head);
>>>>>> -		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
>>>>>>  		local_irq_restore(flags);
>>>>>
>>>>> What if it takes 100ms to unlock after IRQ restored?
>>>>
>>>> I'm not sure I understand in what direction you are going? Care to
>>>> elaborate?
>>>>
>>> Your change introduces extra time cost the lock waiter has to pay in
>>> the case that irq happens before the lock is released.
>>
>> [CC filesystem and mm people. For reference the thread starts here:
>>  http://thread.gmane.org/gmane.linux.kernel/2056996 ]
>>
>> Right, I see what you mean and it's a good point but when doing the
>> patches I was striving for correctness and starting a discussion, hence
>> the RFC. In any case I'd personally choose correctness over performance
>> always ;).
>>
>> As I'm not an fs/ext4 expert and have added the relevant parties (please
>> use reply-all from now on so that the thread is not being cut in the
>> middle) who will be able to say whether it impact is going to be that
>> big. I guess in this particular code path worrying about this is prudent
>> as writeback sounds like a heavily used path.
>>
>> Maybe the problem should be approached from a different angle e.g.
>> drain_all_pages and its reliance on the fact that the IPI will always be
>> delivered in some finite amount of time? But what if a cpu with disabled
>> interrupts is waiting on the task issuing the IPI?
> 
> So I have looked through your patch and also original report (thread starts
> here: https://lkml.org/lkml/2015/10/8/341) and IMHO one question hasn't
> been properly answered yet: Who is holding BH_Uptodate_Lock we are spinning
> on? You have suggested in https://lkml.org/lkml/2015/10/8/464 that it was
> __block_write_full_page_endio() call but that cannot really be the case.
> BH_Uptodate_Lock is used only in IO completion handlers -
> end_buffer_async_read, end_buffer_async_write, ext4_finish_bio. So there
> really should be some end_io function running on some other CPU which holds
> BH_Uptodate_Lock for that buffer.

I did check all the call traces of the current processes on the machine
at the time of the hard lockup and none of the 3 functions you mentioned
were in any of the call chains. But while I was looking the code of
end_buffer_async_write and in the comments I saw it was mentioned that
those completion handler were called from __block_write_full_page_endio
so that's what pointed my attention to that function. But you are right
that it doesn't take the BH lock.

Furthermore the fact that the BH_Async_Write flag is set points me in
the direction that end_buffer_async_write should have been executing but
as I said issuing "bt" for all the tasks didn't show this function.

I'm beginning to wonder if it's possible that a single bit memory error
has crept up, but this still seems like a long shot...

Btw I think in any case the spin_lock patch is wrong as this code can be
called from within softirq context and we do want to be interrupt safe
at that point.

> 
> BTW: I suppose the filesystem uses 4k blocksize, doesn't it?

Unfortunately I cannot tell you with 100% certainty, since on this
server there are multiple block devices with blocksize either 1k or 4k.
So it is one of these. If you know a way to extract this information
from a vmcore file I'd be happy to do it.

> 
> 								Honza
> 
>>>>>> +		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
>>>>>>  		if (!under_io) {
>>>>>>  #ifdef CONFIG_EXT4_FS_ENCRYPTION
>>>>>>  			if (ctx)
>>>>>> --
>>>>>> 2.5.0
>>>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-10-12 14:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-09  7:19 [RFC PATCH 1/2] ext4: Fix possible deadlock with local interrupts disabled and page-draining IPI Hillf Danton
2015-10-09  7:26 ` Nikolay Borisov
2015-10-09  7:37   ` Hillf Danton
2015-10-09  8:03     ` Nikolay Borisov
2015-10-09  8:03       ` Nikolay Borisov
2015-10-12 13:40       ` Jan Kara
2015-10-12 13:40         ` Jan Kara
2015-10-12 14:51         ` Nikolay Borisov [this message]
2015-10-12 14:51           ` Nikolay Borisov
2015-10-13  8:15           ` Jan Kara
2015-10-13  8:15             ` Jan Kara
2015-10-13 10:37             ` Nikolay Borisov
2015-10-13 10:37               ` Nikolay Borisov
2015-10-13 13:14               ` Jan Kara
2015-10-13 13:14                 ` Jan Kara
2015-10-14  9:02                 ` Nikolay Borisov
2015-10-14  9:02                   ` Nikolay Borisov
2015-10-16  8:08                 ` Nikolay Borisov
2015-10-16 12:51                   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2015-10-08 15:31 Nikolay Borisov
     [not found] ` <CAOtvUMcrhq3epOPCEciMGq53S6rTyURAKEWhQ=NwrkF95aJ+_Q@mail.gmail.com>
2015-10-09  8:50   ` Nikolay Borisov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=561BC8DB.6070600@kyup.com \
    --to=kernel@kyup.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=gilad@benyossef.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mm@1h.com \
    --cc=operations@siteground.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.