All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-S390 <linux-s390@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page under the page lock
Date: Mon, 16 Apr 2012 17:50:40 +0200	[thread overview]
Message-ID: <20120416175040.0e33b37f@de.ibm.com> (raw)
In-Reply-To: <20120416141423.GD2359@suse.de>

On Mon, 16 Apr 2012 15:14:23 +0100
Mel Gorman <mgorman@suse.de> wrote:

> This patch is horribly ugly and there has to be a better way of doing
> it. I'm looking for suggestions on what s390 can do here that is not
> painful or broken. 
> 
> However, s390 needs a better way of guarding against
> PageSwapCache pages being removed from the radix tree while set_page_dirty()
> is being called. The patch would be marginally better if in the PageSwapCache
> case we simply tried to lock once and in the contended case just fail to
> propogate the storage key. I lack familiarity with the s390 architecture
> to be certain if this is safe or not. Suggestions on a better fix?

One though that crossed my mind is that maybe a better approach would be
to move the page_test_and_clear_dirty check out of page_remove_rmap.
What we need to look out for are code sequences of the form:

	if (pte_dirty(pte))
		set_page_dirty(page);
	...
	page_remove_rmap(page);

There are four of those as far as I can see: in try_to_unmap_one,
try_to_unmap_cluster, zap_pte, and zap_pte_range.

A valid implementation for s390 would be to test and clear the changed
bit in the storage key for every of those pte_dirty() calls.

	if (pte_dirty(pte) || page_test_and_clear_dirty(page))
		set_page_dirty(page);
	...
	page_remove_rmap(page); /* w/o page_test_clear_dirty */

Trouble is that the ISKE and SSKE instructions are very expensive, that
is why we currently have the operation in page_remove_rmap after the
map counter dropped to zero (which is wrong as we now have learned the
hard way). The additional check for (!PageAnon || PageSwapCache) is
just another variation of avoiding ISKE/SSKE.

Thinking about a function like this:

static inline int page_test_dirty_eco(struct page *page)
{
	if (page_mapcount(page) > 1)
		return 0;
	if (PageAnon(page) && !PageSwapCache(page))
		return 0;
	return page_test_and_clear_dirty(page);
}

and use it alongside the pte_dirty() check. The worry I have is the
map counter. What guarantees us that the map counter is not decremented
concurrently? Which is probably a problem with the current patch as
well, checking atomic_add_negative(-1, &page->_mapcount) against zero
works, doing (page_mapcount(page) == 1) followed by the decrement 
can race. And we better not forget a dirty bit ..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-S390 <linux-s390@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page under the page lock
Date: Mon, 16 Apr 2012 17:50:40 +0200	[thread overview]
Message-ID: <20120416175040.0e33b37f@de.ibm.com> (raw)
In-Reply-To: <20120416141423.GD2359@suse.de>

On Mon, 16 Apr 2012 15:14:23 +0100
Mel Gorman <mgorman@suse.de> wrote:

> This patch is horribly ugly and there has to be a better way of doing
> it. I'm looking for suggestions on what s390 can do here that is not
> painful or broken. 
> 
> However, s390 needs a better way of guarding against
> PageSwapCache pages being removed from the radix tree while set_page_dirty()
> is being called. The patch would be marginally better if in the PageSwapCache
> case we simply tried to lock once and in the contended case just fail to
> propogate the storage key. I lack familiarity with the s390 architecture
> to be certain if this is safe or not. Suggestions on a better fix?

One though that crossed my mind is that maybe a better approach would be
to move the page_test_and_clear_dirty check out of page_remove_rmap.
What we need to look out for are code sequences of the form:

	if (pte_dirty(pte))
		set_page_dirty(page);
	...
	page_remove_rmap(page);

There are four of those as far as I can see: in try_to_unmap_one,
try_to_unmap_cluster, zap_pte, and zap_pte_range.

A valid implementation for s390 would be to test and clear the changed
bit in the storage key for every of those pte_dirty() calls.

	if (pte_dirty(pte) || page_test_and_clear_dirty(page))
		set_page_dirty(page);
	...
	page_remove_rmap(page); /* w/o page_test_clear_dirty */

Trouble is that the ISKE and SSKE instructions are very expensive, that
is why we currently have the operation in page_remove_rmap after the
map counter dropped to zero (which is wrong as we now have learned the
hard way). The additional check for (!PageAnon || PageSwapCache) is
just another variation of avoiding ISKE/SSKE.

Thinking about a function like this:

static inline int page_test_dirty_eco(struct page *page)
{
	if (page_mapcount(page) > 1)
		return 0;
	if (PageAnon(page) && !PageSwapCache(page))
		return 0;
	return page_test_and_clear_dirty(page);
}

and use it alongside the pte_dirty() check. The worry I have is the
map counter. What guarantees us that the map counter is not decremented
concurrently? Which is probably a problem with the current patch as
well, checking atomic_add_negative(-1, &page->_mapcount) against zero
works, doing (page_mapcount(page) == 1) followed by the decrement 
can race. And we better not forget a dirty bit ..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


  parent reply	other threads:[~2012-04-16 15:50 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-16 14:14 [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page under the page lock Mel Gorman
2012-04-16 14:14 ` Mel Gorman
2012-04-16 14:53 ` Rik van Riel
2012-04-16 14:53   ` Rik van Riel
2012-04-16 15:02   ` Mel Gorman
2012-04-16 15:02     ` Mel Gorman
2012-04-16 15:50 ` Martin Schwidefsky [this message]
2012-04-16 15:50   ` Martin Schwidefsky
2012-04-17 12:29   ` Mel Gorman
2012-04-17 12:29     ` Mel Gorman
2012-04-17 13:02     ` Martin Schwidefsky
2012-04-17 13:02       ` Martin Schwidefsky
2012-04-18  4:00       ` Hugh Dickins
2012-04-18  4:00         ` Hugh Dickins
2012-04-16 21:14 ` Hugh Dickins
2012-04-16 21:14   ` Hugh Dickins
2012-04-17 12:22   ` Mel Gorman
2012-04-17 12:22     ` Mel Gorman
2012-04-18  3:52     ` Hugh Dickins
2012-04-18  3:52       ` Hugh Dickins
2012-04-18 15:28       ` Mel Gorman
2012-04-18 15:28         ` Mel Gorman
2012-04-18 17:09         ` Hugh Dickins
2012-04-18 17:09           ` Hugh Dickins
2012-04-23 12:41           ` Mel Gorman
2012-04-23 12:41             ` Mel Gorman
2012-04-23 18:09             ` Hugh Dickins
2012-04-23 18:09               ` Hugh Dickins
2012-04-23 18:14               ` [PATCH] mm: fix s390 BUG by __set_page_dirty_no_writeback on swap Hugh Dickins
2012-04-23 18:14                 ` Hugh Dickins
2012-04-18 18:29         ` [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page under the page lock Martin Schwidefsky
2012-04-18 18:29           ` Martin Schwidefsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120416175040.0e33b37f@de.ibm.com \
    --to=schwidefsky@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.