public inbox for linux-xfs@vger.kernel.org
From: Waiman Long <longman@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Qian Cai <cai@lca.pw>, Eric Sandeen <sandeen@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/2] sched: Add PF_MEMALLOC_NOLOCKDEP flag
Date: Wed, 17 Jun 2020 21:32:58 -0400
Message-ID: <7371c5dc-7ba7-6d86-75ca-43bedfa6b24f@redhat.com>
In-Reply-To: <20200618000110.GF2005@dread.disaster.area>

On 6/17/20 8:01 PM, Dave Chinner wrote:
> On Wed, Jun 17, 2020 at 01:53:09PM -0400, Waiman Long wrote:
>> There are cases where calling kmalloc() can lead to a false positive
>> lockdep splat. One notable example, which can happen while freezing
>> an xfs filesystem, is as follows:
>>
>>   Possible unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(sb_internal);
>>                                 lock(fs_reclaim);
>>                                 lock(sb_internal);
>>    lock(fs_reclaim);
>>
>>   *** DEADLOCK ***
>>
>> This is a false positive as all the dirty pages are flushed out before
>> the filesystem can be frozen. However, there is no easy way to modify
>> lockdep to handle this situation properly.
>>
>> One possible workaround is to disable lockdep by setting __GFP_NOLOCKDEP
>> in the appropriate kmalloc() calls.  However, it will be cumbersome to
>> locate all the right kmalloc() calls to insert __GFP_NOLOCKDEP, and it
>> is easy to miss some, especially when the code is updated in the future.
>>
>> Another alternative is to have a per-process global state that indicates
>> the equivalent of __GFP_NOLOCKDEP without the need to set the gfp_t flag
>> individually. To allow the latter case, a new PF_MEMALLOC_NOLOCKDEP
>> per-process flag is now added. After adding this new bit, there are
>> still 2 free bits left.
>>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   include/linux/sched.h    |  7 +++++++
>>   include/linux/sched/mm.h | 15 ++++++++++-----
>>   2 files changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index b62e6aaf28f0..44247cbc9073 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1508,6 +1508,7 @@ extern struct pid *cad_pid;
>>   #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
>>   #define PF_LOCAL_THROTTLE	0x00100000	/* Throttle writes only against the bdi I write to,
>>   						 * I am cleaning dirty pages from some other bdi. */
>> +#define __PF_MEMALLOC_NOLOCKDEP	0x00100000	/* All allocation requests will inherit __GFP_NOLOCKDEP */
> Why is this considered a safe thing to do? Any context that sets
> __PF_MEMALLOC_NOLOCKDEP will now behave differently in memory
> reclaim as it will think that PF_LOCAL_THROTTLE is set when lockdep
> is enabled.

Oh, my mistake. It should be 0x01000000, which is not currently being
used. Thanks for catching that; I will repost a new version. I have no
intention of reusing any existing bit. As said in the commit log, there
are actually 2 more free bits left.
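To make the collision concrete, here is a small userspace C sketch (not kernel code) that uses only the PF_* values visible in the quoted sched.h hunk. A task that sets only the v2 value would also test true for PF_LOCAL_THROTTLE, whereas 0x01000000 overlaps none of them. The names PF_NOLOCKDEP_V2, PF_NOLOCKDEP_FIXED, PF_NEIGHBOURS and looks_locally_throttled(), as well as the compile-time guard, are hypothetical illustrations rather than part of the patch, and the neighbour list is deliberately limited to the flags the hunk shows.

```c
#include <assert.h>
#include <stdbool.h>

/* PF_* values copied from the quoted sched.h hunk (not the full list). */
#define PF_MEMALLOC_NOFS   0x00040000u
#define PF_MEMALLOC_NOIO   0x00080000u
#define PF_LOCAL_THROTTLE  0x00100000u
#define PF_KTHREAD         0x00200000u
#define PF_RANDOMIZE       0x00400000u
#define PF_SWAPWRITE       0x00800000u
#define PF_FREEZER_SKIP    0x40000000u
#define PF_SUSPEND_TASK    0x80000000u

/* Hypothetical names: the v2 value (collides) and the corrected value. */
#define PF_NOLOCKDEP_V2    0x00100000u
#define PF_NOLOCKDEP_FIXED 0x01000000u

/* Hypothetical guard: compilation fails if the new bit overlaps a neighbour. */
#define PF_NEIGHBOURS (PF_MEMALLOC_NOFS | PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE | \
                       PF_KTHREAD | PF_RANDOMIZE | PF_SWAPWRITE | \
                       PF_FREEZER_SKIP | PF_SUSPEND_TASK)
static_assert((PF_NOLOCKDEP_FIXED & PF_NEIGHBOURS) == 0, "PF bit collision");

/*
 * Models what reclaim would observe for a task that only meant to set the
 * NOLOCKDEP flag: does its flags word also look like PF_LOCAL_THROTTLE?
 */
static bool looks_locally_throttled(unsigned int nolockdep_bit)
{
    unsigned int task_flags = nolockdep_bit; /* only the new flag is set */
    return (task_flags & PF_LOCAL_THROTTLE) != 0;
}
```

Swapping PF_NOLOCKDEP_FIXED for PF_NOLOCKDEP_V2 in the static_assert makes the build fail, which is the kind of mistake such a guard is meant to catch early.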


>
>>   #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>>   #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
>>   #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
>> @@ -1519,6 +1520,12 @@ extern struct pid *cad_pid;
>>   #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
>>   #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
>>   
>> +#ifdef CONFIG_LOCKDEP
>> +#define PF_MEMALLOC_NOLOCKDEP	__PF_MEMALLOC_NOLOCKDEP
>> +#else
>> +#define PF_MEMALLOC_NOLOCKDEP	0
>> +#endif
>> +
>>   /*
>>    * Only the _current_ task can read/write to tsk->flags, but other
>>    * tasks can access tsk->flags in readonly mode for example
>> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
>> index 480a4d1b7dd8..4a076a148568 100644
>> --- a/include/linux/sched/mm.h
>> +++ b/include/linux/sched/mm.h
>> @@ -177,22 +177,27 @@ static inline bool in_vfork(struct task_struct *tsk)
>>    * Applies per-task gfp context to the given allocation flags.
>>    * PF_MEMALLOC_NOIO implies GFP_NOIO
>>    * PF_MEMALLOC_NOFS implies GFP_NOFS
>> + * PF_MEMALLOC_NOLOCKDEP implies __GFP_NOLOCKDEP
>>    * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
>>    */
>>   static inline gfp_t current_gfp_context(gfp_t flags)
>>   {
>> -	if (unlikely(current->flags &
>> -		     (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOCMA))) {
>> +	unsigned int pflags = current->flags;
>> +
>> +	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS |
>> +			       PF_MEMALLOC_NOCMA | PF_MEMALLOC_NOLOCKDEP))) {
> That needs a PF_MEMALLOC_MASK.

Will add that in the next version.
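One way the suggested PF_MEMALLOC_MASK could be used is sketched below as a userspace model of current_gfp_context(). Assumptions: pflags stands in for current->flags; lockdep is enabled, so PF_MEMALLOC_NOLOCKDEP is nonzero; the __GFP_* values are placeholders rather than the kernel's real bits; and the NOLOCKDEP branch is inferred from the "implies __GFP_NOLOCKDEP" comment, because the quoted hunk stops before the function body. The mask name follows the review comment, but its definition and the helper name gfp_context() are hypothetical.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Placeholder GFP bits for illustration only; the kernel's values differ. */
#define __GFP_IO        0x1u
#define __GFP_FS        0x2u
#define __GFP_MOVABLE   0x4u
#define __GFP_NOLOCKDEP 0x8u

/* PF_* values from the quoted hunk, with NOLOCKDEP at the corrected bit. */
#define PF_MEMALLOC_NOFS      0x00040000u
#define PF_MEMALLOC_NOIO      0x00080000u
#define PF_MEMALLOC_NOLOCKDEP 0x01000000u
#define PF_MEMALLOC_NOCMA     0x10000000u

/* Hypothetical mask of every per-task flag that modifies the gfp mask. */
#define PF_MEMALLOC_MASK (PF_MEMALLOC_NOFS | PF_MEMALLOC_NOIO | \
                          PF_MEMALLOC_NOCMA | PF_MEMALLOC_NOLOCKDEP)
static_assert(PF_MEMALLOC_MASK & PF_MEMALLOC_NOLOCKDEP,
              "mask must cover NOLOCKDEP");

/* Userspace model of current_gfp_context(); pflags replaces current->flags. */
static gfp_t gfp_context(unsigned int pflags, gfp_t flags)
{
    if (pflags & PF_MEMALLOC_MASK) {
        /* NOIO is the stricter scope, so it takes precedence over NOFS. */
        if (pflags & PF_MEMALLOC_NOIO)
            flags &= ~(__GFP_IO | __GFP_FS);
        else if (pflags & PF_MEMALLOC_NOFS)
            flags &= ~__GFP_FS;
        /* Keep the allocation out of the CMA region. */
        if (pflags & PF_MEMALLOC_NOCMA)
            flags &= ~__GFP_MOVABLE;
        /* Assumed behaviour: the task scope adds __GFP_NOLOCKDEP. */
        if (pflags & PF_MEMALLOC_NOLOCKDEP)
            flags |= __GFP_NOLOCKDEP;
    }
    return flags;
}
```

With a single mask, the fast-path check stays one AND against current->flags, and a future PF_MEMALLOC_* flag only needs to be added to the mask definition rather than to the condition inside current_gfp_context().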

Thanks,
Longman



Thread overview: 11+ messages
2020-06-17 17:53 [PATCH v2 0/2] sched, xfs: Add PF_MEMALLOC_NOLOCKDEP to fix lockdep problem in xfs Waiman Long
2020-06-17 17:53 ` [PATCH v2 1/2] sched: Add PF_MEMALLOC_NOLOCKDEP flag Waiman Long
2020-06-18  0:01   ` Dave Chinner
2020-06-18  1:32     ` Waiman Long [this message]
2020-06-22 19:16     ` Peter Zijlstra
2020-06-17 17:53 ` [PATCH v2 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim Waiman Long
2020-06-18  0:45   ` Dave Chinner
2020-06-18  1:35     ` Waiman Long
2020-06-18  1:36     ` Darrick J. Wong
2020-06-19 13:21   ` Christoph Hellwig
2020-06-19 15:08     ` Waiman Long
