From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Tony Luck <tony.luck@intel.com>
Subject: Re: Dirty/Access bits vs. page content
Date: Tue, 22 Apr 2014 09:54:59 +0200 [thread overview]
Message-ID: <20140422075459.GD11182@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CA+55aFzFxBDJ2rWo9DggdNsq-qBCr11OVXnm64jx04KMSVCBAw@mail.gmail.com>
On Mon, Apr 21, 2014 at 05:44:45PM -0700, Linus Torvalds wrote:
> From d26515fe19d5850aa69881ee6ae193e068f22ba1 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 21 Apr 2014 17:35:35 -0700
> Subject: [PATCH 2/2] mm: make the generic TLB flush batching correctly dirty
> the page at the end
>
> When unmapping dirty shared mappings, the page should be dirtied after
> doing the TLB flush. This does that by hiding the dirty bit in the low
> bit of the "struct page" pointer in the TLB gather batching array, and
> teaching free_pages_and_swap_cache() to mark the pages dirty at the end.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Peter Anvin <hpa@zytor.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> mm/memory.c | 5 +----
> mm/swap.c | 8 +++++++-
> mm/swap_state.c | 14 ++++++++++++--
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 62fdcd1995f4..174542ab2b90 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -283,11 +283,8 @@ int __tlb_remove_page(struct mmu_gather *tlb, struct page *page, bool dirty)
>
> VM_BUG_ON(!tlb->need_flush);
>
> - /* FIXME! This needs to be batched too */
> - if (dirty)
> - set_page_dirty(page);
> batch = tlb->active;
> - batch->pages[batch->nr++] = page;
> + batch->pages[batch->nr++] = (void *) (dirty + (unsigned long)page);
Space between cast and expression.
> if (batch->nr == batch->max) {
> if (!tlb_next_batch(tlb))
> return 0;
> diff --git a/mm/swap.c b/mm/swap.c
> index 9ce43ba4498b..1a58c58c7f41 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -821,8 +821,14 @@ void release_pages(struct page **pages, int nr, int cold)
> struct lruvec *lruvec;
> unsigned long uninitialized_var(flags);
>
> + /*
> + * NOTE! The low bit of the struct page pointer in
> + * the "pages[]" array is used as a dirty bit, so
> + * we ignore it
> + */
> for (i = 0; i < nr; i++) {
> - struct page *page = pages[i];
> + unsigned long pageval = (unsigned long)pages[i];
> + struct page *page = (void *)(~1ul & pageval);
No space between cast and expression.
Should we create some pointer bitops helpers? We do this casting all
over the place, maybe its time to make it pretty?
static inline void *ptr_or(void *ptr, unsigned long val)
{
WARN_ON(val & ~0x03); /* too bad __alignof__ is 'broken' */
return (void *)((unsigned long)ptr | val);
}
static inline void *ptr_mask(void *ptr)
{
return (void *)((unsigned long)ptr & ~0x03);
}
static inline unsigned long ptr_and(void *ptr, unsigned long val)
{
WARN_ON(val & ~0x03);
return (unsigned long)ptr & val;
}
> if (unlikely(PageCompound(page))) {
> if (zone) {
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e76ace30d436..bb0b2d675a82 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -258,6 +258,11 @@ void free_page_and_swap_cache(struct page *page)
> /*
> * Passed an array of pages, drop them all from swapcache and then release
> * them. They are removed from the LRU and freed if this is their last use.
> + *
> + * NOTE! The low bit of the "struct page" pointers passed in is a dirty
> + * indicator, saying that the page needs to be marked dirty before freeing.
> + *
> + * release_pages() itself ignores that bit.
> */
> void free_pages_and_swap_cache(struct page **pages, int nr)
> {
> @@ -268,8 +273,13 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
> int todo = min(nr, PAGEVEC_SIZE);
> int i;
>
> - for (i = 0; i < todo; i++)
> - free_swap_cache(pagep[i]);
> + for (i = 0; i < todo; i++) {
> + unsigned long pageval = (unsigned long) pagep[i];
> + struct page *page = (void *)(~1ul & pageval);
> + if (pageval & 1)
> + set_page_dirty(page);
> + free_swap_cache(page);
> + }
> release_pages(pagep, todo, 0);
> pagep += todo;
> nr -= todo;
So PAGE_FLAGS_CHECK_AT_FREE doesn't include PG_dirty, so while we now
properly mark the page dirty, we could continue and simply free the
thing?
I suppose the pagecache has a ref on and there's no window where we
could drop that before doing this free (didn't check).
But my main point was; should we check for the dirty bit when freeing
the page?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Tony Luck <tony.luck@intel.com>
Subject: Re: Dirty/Access bits vs. page content
Date: Tue, 22 Apr 2014 09:54:59 +0200 [thread overview]
Message-ID: <20140422075459.GD11182@twins.programming.kicks-ass.net> (raw)
Message-ID: <20140422075459.XY2yNogTB0vAjIlKF5yz6Fx3FD4_W70uOo9YU9iXFR8@z> (raw)
In-Reply-To: <CA+55aFzFxBDJ2rWo9DggdNsq-qBCr11OVXnm64jx04KMSVCBAw@mail.gmail.com>
On Mon, Apr 21, 2014 at 05:44:45PM -0700, Linus Torvalds wrote:
> From d26515fe19d5850aa69881ee6ae193e068f22ba1 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 21 Apr 2014 17:35:35 -0700
> Subject: [PATCH 2/2] mm: make the generic TLB flush batching correctly dirty
> the page at the end
>
> When unmapping dirty shared mappings, the page should be dirtied after
> doing the TLB flush. This does that by hiding the dirty bit in the low
> bit of the "struct page" pointer in the TLB gather batching array, and
> teaching free_pages_and_swap_cache() to mark the pages dirty at the end.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Peter Anvin <hpa@zytor.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> mm/memory.c | 5 +----
> mm/swap.c | 8 +++++++-
> mm/swap_state.c | 14 ++++++++++++--
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 62fdcd1995f4..174542ab2b90 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -283,11 +283,8 @@ int __tlb_remove_page(struct mmu_gather *tlb, struct page *page, bool dirty)
>
> VM_BUG_ON(!tlb->need_flush);
>
> - /* FIXME! This needs to be batched too */
> - if (dirty)
> - set_page_dirty(page);
> batch = tlb->active;
> - batch->pages[batch->nr++] = page;
> + batch->pages[batch->nr++] = (void *) (dirty + (unsigned long)page);
Space between cast and expression.
> if (batch->nr == batch->max) {
> if (!tlb_next_batch(tlb))
> return 0;
> diff --git a/mm/swap.c b/mm/swap.c
> index 9ce43ba4498b..1a58c58c7f41 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -821,8 +821,14 @@ void release_pages(struct page **pages, int nr, int cold)
> struct lruvec *lruvec;
> unsigned long uninitialized_var(flags);
>
> + /*
> + * NOTE! The low bit of the struct page pointer in
> + * the "pages[]" array is used as a dirty bit, so
> + * we ignore it
> + */
> for (i = 0; i < nr; i++) {
> - struct page *page = pages[i];
> + unsigned long pageval = (unsigned long)pages[i];
> + struct page *page = (void *)(~1ul & pageval);
No space between cast and expression.
Should we create some pointer bitops helpers? We do this casting all
over the place, maybe its time to make it pretty?
static inline void *ptr_or(void *ptr, unsigned long val)
{
WARN_ON(val & ~0x03); /* too bad __alignof__ is 'broken' */
return (void *)((unsigned long)ptr | val);
}
static inline void *ptr_mask(void *ptr)
{
return (void *)((unsigned long)ptr & ~0x03);
}
static inline unsigned long ptr_and(void *ptr, unsigned long val)
{
WARN_ON(val & ~0x03);
return (unsigned long)ptr & val;
}
> if (unlikely(PageCompound(page))) {
> if (zone) {
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e76ace30d436..bb0b2d675a82 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -258,6 +258,11 @@ void free_page_and_swap_cache(struct page *page)
> /*
> * Passed an array of pages, drop them all from swapcache and then release
> * them. They are removed from the LRU and freed if this is their last use.
> + *
> + * NOTE! The low bit of the "struct page" pointers passed in is a dirty
> + * indicator, saying that the page needs to be marked dirty before freeing.
> + *
> + * release_pages() itself ignores that bit.
> */
> void free_pages_and_swap_cache(struct page **pages, int nr)
> {
> @@ -268,8 +273,13 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
> int todo = min(nr, PAGEVEC_SIZE);
> int i;
>
> - for (i = 0; i < todo; i++)
> - free_swap_cache(pagep[i]);
> + for (i = 0; i < todo; i++) {
> + unsigned long pageval = (unsigned long) pagep[i];
> + struct page *page = (void *)(~1ul & pageval);
> + if (pageval & 1)
> + set_page_dirty(page);
> + free_swap_cache(page);
> + }
> release_pages(pagep, todo, 0);
> pagep += todo;
> nr -= todo;
So PAGE_FLAGS_CHECK_AT_FREE doesn't include PG_dirty, so while we now
properly mark the page dirty, we could continue and simply free the
thing?
I suppose the pagecache has a ref on and there's no window where we
could drop that before doing this free (didn't check).
But my main point was; should we check for the dirty bit when freeing
the page?
next prev parent reply other threads:[~2014-04-22 7:54 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1398032742.19682.11.camel@pasglop>
[not found] ` <CA+55aFz1sK+PF96LYYZY7OB7PBpxZu-uNLWLvPiRz-tJsBqX3w@mail.gmail.com>
[not found] ` <1398054064.19682.32.camel@pasglop>
[not found] ` <1398057630.19682.38.camel@pasglop>
[not found] ` <CA+55aFwWHBtihC3w9E4+j4pz+6w7iTnYhTf4N3ie15BM9thxLQ@mail.gmail.com>
[not found] ` <53558507.9050703@zytor.com>
2014-04-21 22:29 ` Dirty/Access bits vs. page content Linus Torvalds
2014-04-21 22:44 ` Dave Hansen
2014-04-22 0:31 ` Linus Torvalds
2014-04-22 0:44 ` Linus Torvalds
2014-04-22 5:15 ` Tony Luck
2014-04-22 5:15 ` Tony Luck
2014-04-22 14:55 ` Linus Torvalds
2014-04-22 14:55 ` Linus Torvalds
2014-04-22 7:34 ` Peter Zijlstra
2014-04-22 7:34 ` Peter Zijlstra
2014-04-22 7:54 ` Peter Zijlstra [this message]
2014-04-22 7:54 ` Peter Zijlstra
2014-04-22 21:36 ` Linus Torvalds
2014-04-22 21:36 ` Linus Torvalds
2014-04-22 21:46 ` Dave Hansen
2014-04-22 21:46 ` Dave Hansen
2014-04-22 22:08 ` Linus Torvalds
2014-04-22 22:08 ` Linus Torvalds
2014-04-22 22:41 ` Dave Hansen
2014-04-22 22:41 ` Dave Hansen
2014-04-23 2:44 ` Linus Torvalds
2014-04-23 2:44 ` Linus Torvalds
2014-04-23 3:08 ` Hugh Dickins
2014-04-23 3:08 ` Hugh Dickins
2014-04-23 3:08 ` Hugh Dickins
2014-04-23 4:23 ` Linus Torvalds
2014-04-23 4:23 ` Linus Torvalds
2014-04-23 6:14 ` Benjamin Herrenschmidt
2014-04-23 6:14 ` Benjamin Herrenschmidt
2014-04-23 18:41 ` Jan Kara
2014-04-23 18:41 ` Jan Kara
2014-04-23 19:33 ` Linus Torvalds
2014-04-24 6:51 ` Peter Zijlstra
2014-04-24 6:51 ` Peter Zijlstra
2014-04-24 18:40 ` Hugh Dickins
2014-04-24 18:40 ` Hugh Dickins
2014-04-24 19:45 ` Linus Torvalds
2014-04-24 19:45 ` Linus Torvalds
2014-04-24 20:02 ` Hugh Dickins
2014-04-24 20:02 ` Hugh Dickins
2014-04-24 23:46 ` Linus Torvalds
2014-04-25 1:37 ` Benjamin Herrenschmidt
2014-04-25 1:37 ` Benjamin Herrenschmidt
2014-04-25 2:41 ` Benjamin Herrenschmidt
2014-04-25 2:46 ` Linus Torvalds
2014-04-25 2:46 ` Linus Torvalds
2014-04-25 2:50 ` H. Peter Anvin
2014-04-25 2:50 ` H. Peter Anvin
2014-04-25 2:50 ` H. Peter Anvin
2014-04-25 3:03 ` Linus Torvalds
2014-04-25 3:03 ` Linus Torvalds
2014-04-25 12:01 ` Hugh Dickins
2014-04-25 12:01 ` Hugh Dickins
2014-04-25 13:51 ` Peter Zijlstra
2014-04-25 13:51 ` Peter Zijlstra
2014-04-25 19:41 ` Hugh Dickins
2014-04-25 19:41 ` Hugh Dickins
2014-04-26 18:07 ` Peter Zijlstra
2014-04-26 18:07 ` Peter Zijlstra
2014-04-27 7:20 ` Peter Zijlstra
2014-04-27 7:20 ` Peter Zijlstra
2014-04-27 12:20 ` Hugh Dickins
2014-04-27 12:20 ` Hugh Dickins
2014-04-27 19:33 ` Peter Zijlstra
2014-04-27 19:33 ` Peter Zijlstra
2014-04-27 19:47 ` Linus Torvalds
2014-04-27 19:47 ` Linus Torvalds
2014-04-27 20:09 ` Hugh Dickins
2014-04-27 20:09 ` Hugh Dickins
2014-04-28 9:25 ` Peter Zijlstra
2014-04-28 9:25 ` Peter Zijlstra
2014-04-28 10:14 ` Peter Zijlstra
2014-04-28 10:14 ` Peter Zijlstra
2014-04-27 16:21 ` Linus Torvalds
2014-04-27 16:21 ` Linus Torvalds
2014-04-27 23:13 ` Benjamin Herrenschmidt
2014-04-25 16:54 ` Dave Hansen
2014-04-25 16:54 ` Dave Hansen
2014-04-25 16:54 ` Dave Hansen
2014-04-25 18:41 ` Hugh Dickins
2014-04-25 18:41 ` Hugh Dickins
2014-04-25 22:00 ` Dave Hansen
2014-04-25 22:00 ` Dave Hansen
2014-04-25 22:00 ` Dave Hansen
2014-04-26 3:11 ` Hugh Dickins
2014-04-26 3:11 ` Hugh Dickins
2014-04-26 3:48 ` Linus Torvalds
2014-04-26 3:48 ` Linus Torvalds
2014-04-25 17:56 ` Linus Torvalds
2014-04-25 17:56 ` Linus Torvalds
2014-04-25 19:13 ` Hugh Dickins
2014-04-25 19:13 ` Hugh Dickins
2014-04-25 16:30 ` Dave Hansen
2014-04-25 16:30 ` Dave Hansen
2014-04-25 16:30 ` Dave Hansen
2014-04-23 20:11 ` Hugh Dickins
2014-04-23 20:11 ` Hugh Dickins
2014-04-24 8:49 ` Jan Kara
2014-04-24 8:49 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140422075459.GD11182@twins.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=benh@kernel.crashing.org \
--cc=dave.hansen@intel.com \
--cc=hpa@zytor.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@arm.linux.org.uk \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.