Re: Making per-cpu lists draining dependant on a flag

From: Nikolay Borisov <kernel@kyup.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, mgorman@suse.de,
	Andrew Morton <akpm@linux-foundation.org>,
	Marian Marinov <mm@1h.com>,
	SiteGround Operations <operations@siteground.com>,
	Jan Kara <jack@suse.cz>
Subject: Re: Making per-cpu lists draining dependant on a flag
Date: Wed, 14 Oct 2015 12:06:21 +0300
Message-ID: <561E1B0D.9050809@kyup.com>
In-Reply-To: <20151014083710.GF28333@dhcp22.suse.cz>



On 10/14/2015 11:37 AM, Michal Hocko wrote:
> On Fri 09-10-15 14:00:31, Nikolay Borisov wrote:
>> Hello mm people,
>>
>>
>> I want to ask you the following question, which stemmed from analysing
>> and chasing this particular deadlock:
>> http://permalink.gmane.org/gmane.linux.kernel/2056730
> 
> This link doesn't seem to work properly for me. Could you post a
> http://lkml.kernel.org/r/$msg_id link please?
> 
>> To summarise it:
>>
>> For simplicity I will use the following nomenclature:
>> t1 - kworker/u96:0
>> t2 - kworker/u98:39
>> t3 - kworker/u98:7
>>
>> t1 issues drain_all_pages, which generates IPIs. At the same time,
>> however,
> 
> OK, as per
> http://lkml.kernel.org/r/1444318308-27560-1-git-send-email-kernel%40kyup.com
> drain_all_pages is called from __alloc_pages_nodemask, which is called
> from the slab allocator. There is no stack trace leading to the
> allocation, but then you are saying
> 
>> t2 has already started doing an async write of pages
>> as part of its normal operation, but is blocked waiting for t1 to
>> complete its IPI (generated from drain_all_pages), since they both
>> work on the same dm-thin volume.
> 
> which I read as the allocating context holding the same dm_bufio_lock, right?
> 
>> Meanwhile, t3 is executing
>> ext4_finish_bio, which disables interrupts, yet depends on t2
>> completing its writes.
> 
> That would be a bug on its own, because ext4_finish_bio seems to be
> called from softirq context, so it cannot wait for a regular scheduling
> context. Whoever is holding that lock (BH_Uptodate_Lock) has to be in
> (soft)IRQ context.
> 
> <found the original thread on linux-mm finally - the threading got
> broken on the way>
> http://lkml.kernel.org/r/20151013131453.GA1332%40quack.suse.cz
> 
> So Jack (CCed) thinks this is a non-atomic update of flags, and that
> indeed sounds plausible.
> 
>> But since it has disabled interrupts, it won't
>> respond to t1's IPI, and at this point a hard lockup occurs. This
>> happens because drain_all_pages calls on_each_cpu_mask with the last
>> argument equal to "true", meaning "wait until the IPI handler has
>> finished", which of course will never happen in the described situation.
>>
>> Based on that, I was wondering whether avoiding such a situation might
>> merit making the drain_all_pages invocation from
>> __alloc_pages_direct_reclaim dependent on a particular GFP flag being
>> passed, e.g. GFP_NOPCPDRAIN or something along those lines?
> 
> I do not think so. Even if the dependency were real, it would be a clear
> deadlock even without drain_all_pages, AFAICS.
> 
>> Alternatively, would it be possible to make the IPI asynchronous, e.g.
>> by calling on_each_cpu_mask with the last argument equal to false?
> 
> Strictly speaking, the allocation path doesn't really depend on the sync
> behavior. We are just trying to release pages on the pcp lists and retry
> the allocation. Even if the allocating context were faster than the other
> CPUs and failed the request, we would try again without triggering the
> OOM killer, because reclaim has apparently made some progress.
> 
> Other callers might be more sensitive. Anyway, this is called only if the
> allocator issues a sleeping allocation request, so I think that waiting
> here is perfectly acceptable.

Thanks for taking the time to look over the issue. Indeed, I guess I
was misled as to who the real culprit is, though the call traces
seemed to make the issue apparent. But kernel land seems to be a lot
more subtle :)

In any case, I will test with Jack's patch and hopefully report back that
everything is okay.

Nikolay


Thread overview: 5+ messages
2015-10-09 11:00 Making per-cpu lists draining dependant on a flag Nikolay Borisov
2015-10-13 14:43 ` Michal Hocko
2015-10-13 14:55   ` Nikolay Borisov
2015-10-14  8:37 ` Michal Hocko
2015-10-14  9:06   ` Nikolay Borisov [this message]