From: Dave Chinner <david@fromorbit.com>
To: Waiman Long <longman@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
Qian Cai <cai@lca.pw>, Eric Sandeen <sandeen@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/2] sched: Add PF_MEMALLOC_NOLOCKDEP flag
Date: Thu, 18 Jun 2020 10:01:10 +1000
Message-ID: <20200618000110.GF2005@dread.disaster.area>
In-Reply-To: <20200617175310.20912-2-longman@redhat.com>
On Wed, Jun 17, 2020 at 01:53:09PM -0400, Waiman Long wrote:
> There are cases where calling kmalloc() can lead to false positive
> lockdep splat. One notable example that can happen in the freezing of
> the xfs filesystem is as follows:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(sb_internal);
>                                lock(fs_reclaim);
>                                lock(sb_internal);
>   lock(fs_reclaim);
>
> *** DEADLOCK ***
>
> This is a false positive as all the dirty pages are flushed out before
> the filesystem can be frozen. However, there is no easy way to modify
> lockdep to handle this situation properly.
>
> One possible workaround is to disable lockdep by setting __GFP_NOLOCKDEP
> in the appropriate kmalloc() calls. However, it will be cumbersome to
> locate all the right kmalloc() calls to insert __GFP_NOLOCKDEP and it
> is easy to miss some especially when the code is updated in the future.
>
> Another alternative is to have a per-process global state that indicates
> the equivalent of __GFP_NOLOCKDEP without the need to set the gfp_t flag
> individually. To allow the latter case, a new PF_MEMALLOC_NOLOCKDEP
> per-process flag is now added. After adding this new bit, there are
> still 2 free bits left.
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> include/linux/sched.h | 7 +++++++
> include/linux/sched/mm.h | 15 ++++++++++-----
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..44247cbc9073 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1508,6 +1508,7 @@ extern struct pid *cad_pid;
> #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */
> #define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to,
> * I am cleaning dirty pages from some other bdi. */
> +#define __PF_MEMALLOC_NOLOCKDEP 0x00100000 /* All allocation requests will inherit __GFP_NOLOCKDEP */

Why is this considered a safe thing to do? __PF_MEMALLOC_NOLOCKDEP
has the same value (0x00100000) as PF_LOCAL_THROTTLE, so any context
that sets it will now behave differently in memory reclaim: when
lockdep is enabled, reclaim will think that PF_LOCAL_THROTTLE is set.

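To make the collision concrete, here is a minimal userspace sketch, not kernel code. The two flag values are copied from the quoted hunk; task_is_local_throttled() is a hypothetical stand-in for any reclaim-side test of PF_LOCAL_THROTTLE.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the quoted hunk: both macros name the same bit. */
#define PF_LOCAL_THROTTLE        0x00100000u
#define __PF_MEMALLOC_NOLOCKDEP  0x00100000u

/* Compile-time evidence that the two flags alias each other. */
static_assert(PF_LOCAL_THROTTLE == __PF_MEMALLOC_NOLOCKDEP,
              "the two flags alias the same bit");

/*
 * Hypothetical helper modelling a reclaim-side check. It cannot tell
 * which of the two flags the caller actually meant to set.
 */
static inline bool task_is_local_throttled(unsigned int pflags)
{
	return (pflags & PF_LOCAL_THROTTLE) != 0;
}
```

A task that sets only __PF_MEMALLOC_NOLOCKDEP would pass this check, which is the behaviour change described above.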
> #define PF_KTHREAD 0x00200000 /* I am a kernel thread */
> #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
> #define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
> @@ -1519,6 +1520,12 @@ extern struct pid *cad_pid;
> #define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */
> #define PF_SUSPEND_TASK 0x80000000 /* This thread called freeze_processes() and should not be frozen */
>
> +#ifdef CONFIG_LOCKDEP
> +#define PF_MEMALLOC_NOLOCKDEP __PF_MEMALLOC_NOLOCKDEP
> +#else
> +#define PF_MEMALLOC_NOLOCKDEP 0
> +#endif
> +
> /*
> * Only the _current_ task can read/write to tsk->flags, but other
> * tasks can access tsk->flags in readonly mode for example
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 480a4d1b7dd8..4a076a148568 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -177,22 +177,27 @@ static inline bool in_vfork(struct task_struct *tsk)
> * Applies per-task gfp context to the given allocation flags.
> * PF_MEMALLOC_NOIO implies GFP_NOIO
> * PF_MEMALLOC_NOFS implies GFP_NOFS
> + * PF_MEMALLOC_NOLOCKDEP implies __GFP_NOLOCKDEP
> * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
> */
> static inline gfp_t current_gfp_context(gfp_t flags)
> {
> - if (unlikely(current->flags &
> - (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOCMA))) {
> + unsigned int pflags = current->flags;
> +
> + if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS |
> + PF_MEMALLOC_NOCMA | PF_MEMALLOC_NOLOCKDEP))) {

That needs a PF_MEMALLOC_MASK.

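One possible shape for such a mask, as a userspace sketch only. PF_MEMALLOC_NOIO comes from the quoted hunk and PF_MEMALLOC_NOLOCKDEP uses the value the patch proposes under CONFIG_LOCKDEP; the NOFS and NOCMA values and the helper has_memalloc_context() are assumptions made for illustration.

```c
#include <stdbool.h>

#define PF_MEMALLOC_NOFS       0x00040000u /* assumed value, not in the hunk */
#define PF_MEMALLOC_NOIO       0x00080000u /* from the quoted hunk */
#define PF_MEMALLOC_NOLOCKDEP  0x00100000u /* value proposed by the patch */
#define PF_MEMALLOC_NOCMA      0x10000000u /* assumed value, not in the hunk */

/* Every flag that changes the gfp context, collected in one place. */
#define PF_MEMALLOC_MASK \
	(PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | \
	 PF_MEMALLOC_NOCMA | PF_MEMALLOC_NOLOCKDEP)

/* Hypothetical helper: does this task carry any memalloc context? */
static inline bool has_memalloc_context(unsigned int pflags)
{
	return (pflags & PF_MEMALLOC_MASK) != 0;
}
```

With a mask like this, the fast-path test in current_gfp_context() would not need to be edited each time a flag is added.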
And, really, if we are playing "re-use existing bits" games because
we've run out of process flags, all these memalloc flags should be
moved to a new field in the task, say current->memalloc_flags. You
could also move PF_SWAPWRITE, PF_LOCAL_THROTTLE, and PF_KSWAPD into
that field as well as they are all memory allocation context process
flags...
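
A rough userspace model of that idea, under stated assumptions: struct task_model, the MEMALLOC_* bit names and the save/restore helpers are all hypothetical and only show how a separate field would stop memalloc context bits competing with the PF_* space.

```c
/* Hypothetical bit layout for a dedicated memalloc context field. */
enum {
	MEMALLOC_NOFS       = 1u << 0,
	MEMALLOC_NOIO       = 1u << 1,
	MEMALLOC_NOCMA      = 1u << 2,
	MEMALLOC_NOLOCKDEP  = 1u << 3,
};

/* Hypothetical model of a task with the two flag words split apart. */
struct task_model {
	unsigned int flags;          /* remaining PF_* process flags */
	unsigned int memalloc_flags; /* memory allocation context only */
};

/* Set the requested bits and return which of them were already set. */
static inline unsigned int memalloc_flags_save(struct task_model *t,
					       unsigned int set)
{
	unsigned int old = t->memalloc_flags & set;

	t->memalloc_flags |= set;
	return old;
}

/* Put the requested bits back to the state captured by the save call. */
static inline void memalloc_flags_restore(struct task_model *t,
					  unsigned int set, unsigned int old)
{
	t->memalloc_flags = (t->memalloc_flags & ~set) | old;
}
```

The save/restore pairing lets nested scopes set the same bit without clearing it too early for the outer scope.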

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com