From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: gerald.schaefer@de.ibm.com,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Milton Miller <miltonm@bga.com>,
linux-kernel@vger.kernel.org, linux-mm@vger.kernel.org,
linux-ext4@vger.kernel.org, "Ted Ts'o" <tytso@mit.edu>,
Arun Bhanu <ab@arunbhanu.com>, Mel Gorman <mel@csn.ul.ie>,
Andrew Morton <akpm@linux-foundation.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
Date: Thu, 16 Dec 2010 21:47:22 -0800 [thread overview]
Message-ID: <20101217054722.GG2253@linux.vnet.ibm.com> (raw)
In-Reply-To: <AANLkTinfTCpdKo-A8gF+xbss24PNQXvQV6Odh51mXxiH@mail.gmail.com>
On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > [<0000000000226c80>] migrate_page+0x38/0x68
> > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > [<000000000022785a>] migrate_pages+0x496/0x568
> > [<000000000021e24e>] compact_zone+0x432/0x7d8
> > [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > [<00000000002a65be>] proc_sys_write+0x26/0x34
> > [<00000000002336e0>] vfs_write+0xac/0x18c
> > [<00000000002338bc>] SyS_write+0x58/0xa8
> > [<0000000000113976>] sysc_noemu+0x16/0x1c
> > [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
>
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
>
> >
> > ---
> > fs/hugetlbfs/inode.c | 2 ++
> > mm/migrate.c | 4 ++++
> > 2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > {
> > int rc;
> >
> > + rcu_read_lock();
> > rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> > if (rc)
> > return rc;
> > migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> > BUG_ON(PageWriteback(page)); /* Writeback must be complete */
> >
> > + rcu_read_lock();
> > rc = migrate_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> >
> > if (rc)
> > return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> > head = page_buffers(page);
> >
> > + rcu_read_lock();
> > rc = migrate_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> >
> > if (rc)
> > return rc;
> >
> >
> >
>
>
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> }
>
> /**
> + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check
> + * @pslot: pointer to slot, returned by radix_tree_lookup_slot
> + * Returns: item that was stored in that slot with any direct pointer flag
> + * removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> + return rcu_dereference_protected(*pslot, 1);
I suggest replacing the "1" with lockdep expressions for the locks
that you say might be held:
return rcu_dereference_check(*pslot,
lockdep_is_held(&mapping->tree_lock));
This assumes that when you said "and" you meant both lock_page() and
mapping->tree_lock. Also you need to pass in the mapping, which
should not be a problem given likely inlining.
If you meant that either mapping->tree_lock or page_lock() might be
held, I suppose that the page_lock() state could be passed in, but
perhaps better to take a general lockdep expression.
So, either or both? ;-)
Thanx, Paul
> +}
> +/**
> * radix_tree_deref_retry - check radix_tree_deref_slot
> * @arg: pointer returned by radix_tree_deref_slot
> * Returns: 0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
>
> expected_count = 2 + page_has_private(page);
> if (page_count(page) != expected_count ||
> - (struct page *)radix_tree_deref_slot(pslot) != page) {
> + (struct page *)radix_tree_deref_slot_nocheck(pslot)
> + != page) {
> spin_unlock_irq(&mapping->tree_lock);
> return -EAGAIN;
> }
> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
>
> expected_count = 2 + page_has_private(page);
> if (page_count(page) != expected_count ||
> - (struct page *)radix_tree_deref_slot(pslot) != page) {
> + (struct page *)radix_tree_deref_slot_nocheck(pslot)
> + != page) {
> spin_unlock_irq(&mapping->tree_lock);
> return -EAGAIN;
> }
>
>
>
>
> --
> Kind regards,
> Minchan Kim
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: gerald.schaefer@de.ibm.com,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Milton Miller <miltonm@bga.com>,
linux-kernel@vger.kernel.org, linux-mm@vger.kernel.org,
linux-ext4@vger.kernel.org, "Ted Ts'o" <tytso@mit.edu>,
Arun Bhanu <ab@arunbhanu.com>, Mel Gorman <mel@csn.ul.ie>,
Andrew Morton <akpm@linux-foundation.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
Date: Thu, 16 Dec 2010 21:47:22 -0800 [thread overview]
Message-ID: <20101217054722.GG2253@linux.vnet.ibm.com> (raw)
In-Reply-To: <AANLkTinfTCpdKo-A8gF+xbss24PNQXvQV6Odh51mXxiH@mail.gmail.com>
On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > [<0000000000226c80>] migrate_page+0x38/0x68
> > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > [<000000000022785a>] migrate_pages+0x496/0x568
> > [<000000000021e24e>] compact_zone+0x432/0x7d8
> > [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > [<00000000002a65be>] proc_sys_write+0x26/0x34
> > [<00000000002336e0>] vfs_write+0xac/0x18c
> > [<00000000002338bc>] SyS_write+0x58/0xa8
> > [<0000000000113976>] sysc_noemu+0x16/0x1c
> > [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
>
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
>
> >
> > ---
> > fs/hugetlbfs/inode.c | 2 ++
> > mm/migrate.c | 4 ++++
> > 2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > {
> > int rc;
> >
> > + rcu_read_lock();
> > rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> > if (rc)
> > return rc;
> > migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> > BUG_ON(PageWriteback(page)); /* Writeback must be complete */
> >
> > + rcu_read_lock();
> > rc = migrate_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> >
> > if (rc)
> > return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> > head = page_buffers(page);
> >
> > + rcu_read_lock();
> > rc = migrate_page_move_mapping(mapping, newpage, page);
> > + rcu_read_unlock();
> >
> > if (rc)
> > return rc;
> >
> >
> >
>
>
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> }
>
> /**
> + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check
> + * @pslot: pointer to slot, returned by radix_tree_lookup_slot
> + * Returns: item that was stored in that slot with any direct pointer flag
> + * removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> + return rcu_dereference_protected(*pslot, 1);
I suggest replacing the "1" with lockdep expressions for the locks
that you say might be held:
return rcu_dereference_check(*pslot,
lockdep_is_held(&mapping->tree_lock));
This assumes that when you said "and" you meant both lock_page() and
mapping->tree_lock. Also you need to pass in the mapping, which
should not be a problem given likely inlining.
If you meant that either mapping->tree_lock or page_lock() might be
held, I suppose that the page_lock() state could be passed in, but
perhaps better to take a general lockdep expression.
So, either or both? ;-)
Thanx, Paul
> +}
> +/**
> * radix_tree_deref_retry - check radix_tree_deref_slot
> * @arg: pointer returned by radix_tree_deref_slot
> * Returns: 0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
>
> expected_count = 2 + page_has_private(page);
> if (page_count(page) != expected_count ||
> - (struct page *)radix_tree_deref_slot(pslot) != page) {
> + (struct page *)radix_tree_deref_slot_nocheck(pslot)
> + != page) {
> spin_unlock_irq(&mapping->tree_lock);
> return -EAGAIN;
> }
> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
>
> expected_count = 2 + page_has_private(page);
> if (page_count(page) != expected_count ||
> - (struct page *)radix_tree_deref_slot(pslot) != page) {
> + (struct page *)radix_tree_deref_slot_nocheck(pslot)
> + != page) {
> spin_unlock_irq(&mapping->tree_lock);
> return -EAGAIN;
> }
>
>
>
>
> --
> Kind regards,
> Minchan Kim
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2010-12-17 5:47 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-21 11:26 [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage Arun Bhanu
2010-11-21 13:30 ` Ted Ts'o
2010-11-21 13:30 ` Ted Ts'o
2010-11-21 13:30 ` Ted Ts'o
2010-11-21 15:39 ` Minchan Kim
2010-11-21 15:39 ` Minchan Kim
2010-11-21 15:39 ` Minchan Kim
2010-11-21 17:37 ` Ted Ts'o
2010-11-21 17:37 ` Ted Ts'o
2010-11-22 0:38 ` Minchan Kim
2010-11-22 0:38 ` Minchan Kim
2010-11-22 0:38 ` Minchan Kim
2010-11-22 3:31 ` Milton Miller
2010-11-22 6:16 ` Paul E. McKenney
2010-12-07 19:01 ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer
2010-12-08 1:19 ` KAMEZAWA Hiroyuki
2010-12-16 13:50 ` Gerald Schaefer
2010-12-17 0:04 ` Minchan Kim
2010-12-17 0:04 ` Minchan Kim
2010-12-17 5:47 ` Paul E. McKenney [this message]
2010-12-17 5:47 ` Paul E. McKenney
2010-12-17 5:59 ` Eric Dumazet
2010-12-17 5:59 ` Eric Dumazet
2010-12-17 15:08 ` Minchan Kim
2010-12-17 16:03 ` Paul E. McKenney
2010-12-17 16:03 ` Paul E. McKenney
2010-12-17 8:39 ` Mel Gorman
2010-12-17 8:39 ` Mel Gorman
2010-12-17 9:28 ` Mel Gorman
2010-12-17 15:22 ` Minchan Kim
2010-12-17 15:13 ` Minchan Kim
2010-12-17 15:13 ` Minchan Kim
2010-12-17 16:01 ` Paul E. McKenney
2010-12-17 16:01 ` Paul E. McKenney
2010-11-23 7:16 ` [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage KOSAKI Motohiro
2010-11-23 7:16 ` KOSAKI Motohiro
2010-11-23 7:16 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101217054722.GG2253@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=ab@arunbhanu.com \
--cc=akpm@linux-foundation.org \
--cc=gerald.schaefer@de.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=miltonm@bga.com \
--cc=minchan.kim@gmail.com \
--cc=schwidefsky@de.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.