From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Nick Piggin <npiggin@suse.de>,
Linux Memory Management <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hugh Dickins <hugh@veritas.com>, Andrew Morton <akpm@osdl.org>,
Andrea Arcangeli <andrea@suse.de>,
David Miller <davem@davemloft.net>
Subject: Re: [patch 0/4] mm: de-skew page refcount
Date: Thu, 19 Jan 2006 18:06:56 +0100 [thread overview]
Message-ID: <20060119170656.GA9904@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0601190756390.3240@g5.osdl.org>
On Thu, Jan 19, 2006 at 08:36:14AM -0800, Linus Torvalds wrote:
>
> So I _think_ that at least the case in "isolate_lru_page()", you'd
> actually be better off doing the "test-and-clear" instead of separate
> "test" and "clear-bit" ops, no? In that one, it would seem that 99+% of
> the time, the bit is set (because we tested it just before getting the
> lock).
>
> No?
>
Well in isolate_lru_page, the test operation is actually optional
(ie. it is the conditional for a BUG). And I have plans for making
some of those configurable....
But at least on the G5, test_and_clear can be noticable (although
IIRC it was in the noise for _this_ articular case) because of the
memory barriers required.
>
> Now, that whole "we might touch the page count" thing does actually worry
> me a bit. The locking rules are subtle (but they -seem- safe: before we
> actually really put the page on the free-list in the freeing path, we'll
> have locked the LRU list if it was on one).
>
Yes, I think Andrew did his homework. I thought it through quite a bit
before sending the patches and again after your feedback. Subtle though,
no doubt.
> But if you were to change _that_ one to a
>
> atomic_add_unless(&page->counter, 1, -1);
>
> I think that would be a real cleanup. And at that point I won't even
> complain that "atomic_inc_test()" is faster - that "get_page_testone()"
> thing is just fundamentally a bit scary, so I'd applaud it regardless.
>
Hmm... this is what the de-skew patch _did_ (although it was wrapped
in a function called get_page_unless_zero), in fact the main aim was
to prevent this twiddling and the de-skewing was just a nice side effect
(I guess the patch title is misleading).
So I'm confused...
> (The difference: the "counter skewing" may be unexpected, but it's just a
> simple trick. In contrast, the "touch the count after the page may be
> already in the freeing stage" is a scary subtle thing. Even if I can't
> see any actual bug in it, it just worries me in a way that offsetting a
> counter by one does not..)
>
Don't worry, you'll be seeing that patch again -- it is required for
lockless pagecache.
Nick
WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Nick Piggin <npiggin@suse.de>,
Linux Memory Management <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hugh Dickins <hugh@veritas.com>, Andrew Morton <akpm@osdl.org>,
Andrea Arcangeli <andrea@suse.de>,
David Miller <davem@davemloft.net>
Subject: Re: [patch 0/4] mm: de-skew page refcount
Date: Thu, 19 Jan 2006 18:06:56 +0100 [thread overview]
Message-ID: <20060119170656.GA9904@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0601190756390.3240@g5.osdl.org>
On Thu, Jan 19, 2006 at 08:36:14AM -0800, Linus Torvalds wrote:
>
> So I _think_ that at least the case in "isolate_lru_page()", you'd
> actually be better off doing the "test-and-clear" instead of separate
> "test" and "clear-bit" ops, no? In that one, it would seem that 99+% of
> the time, the bit is set (because we tested it just before getting the
> lock).
>
> No?
>
Well in isolate_lru_page, the test operation is actually optional
(ie. it is the conditional for a BUG). And I have plans for making
some of those configurable....
But at least on the G5, test_and_clear can be noticable (although
IIRC it was in the noise for _this_ articular case) because of the
memory barriers required.
>
> Now, that whole "we might touch the page count" thing does actually worry
> me a bit. The locking rules are subtle (but they -seem- safe: before we
> actually really put the page on the free-list in the freeing path, we'll
> have locked the LRU list if it was on one).
>
Yes, I think Andrew did his homework. I thought it through quite a bit
before sending the patches and again after your feedback. Subtle though,
no doubt.
> But if you were to change _that_ one to a
>
> atomic_add_unless(&page->counter, 1, -1);
>
> I think that would be a real cleanup. And at that point I won't even
> complain that "atomic_inc_test()" is faster - that "get_page_testone()"
> thing is just fundamentally a bit scary, so I'd applaud it regardless.
>
Hmm... this is what the de-skew patch _did_ (although it was wrapped
in a function called get_page_unless_zero), in fact the main aim was
to prevent this twiddling and the de-skewing was just a nice side effect
(I guess the patch title is misleading).
So I'm confused...
> (The difference: the "counter skewing" may be unexpected, but it's just a
> simple trick. In contrast, the "touch the count after the page may be
> already in the freeing stage" is a scary subtle thing. Even if I can't
> see any actual bug in it, it just worries me in a way that offsetting a
> counter by one does not..)
>
Don't worry, you'll be seeing that patch again -- it is required for
lockless pagecache.
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-01-19 17:07 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
2006-01-18 10:40 ` Nick Piggin
2006-01-18 10:40 ` [patch 1/4] mm: page refcount use atomic primitives Nick Piggin
2006-01-18 10:40 ` Nick Piggin
2006-01-18 10:40 ` [patch 2/4] mm: PageLRU no testset Nick Piggin
2006-01-18 10:40 ` Nick Piggin
2006-01-19 17:48 ` Nikita Danilov
2006-01-19 18:10 ` Nick Piggin
2006-01-18 10:40 ` [patch 3/3] mm: PageActive " Nick Piggin
2006-01-18 10:40 ` Nick Piggin
2006-01-18 14:13 ` Marcelo Tosatti
2006-01-18 14:13 ` Marcelo Tosatti
2006-01-19 14:50 ` Nick Piggin
2006-01-19 14:50 ` Nick Piggin
2006-01-19 16:52 ` Marcelo Tosatti
2006-01-19 16:52 ` Marcelo Tosatti
2006-01-19 20:02 ` Nick Piggin
2006-01-19 20:02 ` Nick Piggin
2006-01-19 21:41 ` Marcelo Tosatti
2006-01-19 21:41 ` Marcelo Tosatti
2006-01-18 10:41 ` [patch 4/4] mm: less atomic ops Nick Piggin
2006-01-18 10:41 ` Nick Piggin
2006-01-18 16:38 ` [patch 0/4] mm: de-skew page refcount Linus Torvalds
2006-01-18 16:38 ` Linus Torvalds
2006-01-18 17:05 ` Nick Piggin
2006-01-18 17:05 ` Nick Piggin
2006-01-18 19:27 ` Linus Torvalds
2006-01-18 19:27 ` Linus Torvalds
2006-01-19 14:00 ` Nick Piggin
2006-01-19 14:00 ` Nick Piggin
2006-01-19 16:36 ` Linus Torvalds
2006-01-19 16:36 ` Linus Torvalds
2006-01-19 17:06 ` Nick Piggin [this message]
2006-01-19 17:06 ` Nick Piggin
2006-01-19 17:27 ` Linus Torvalds
2006-01-19 17:27 ` Linus Torvalds
2006-01-19 17:38 ` Nick Piggin
2006-01-19 17:38 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060119170656.GA9904@wotan.suse.de \
--to=npiggin@suse.de \
--cc=akpm@osdl.org \
--cc=andrea@suse.de \
--cc=davem@davemloft.net \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.