From: Andrew Morton <akpm@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/12] radix_tree: exceptional entries and indices
Date: Tue, 12 Jul 2011 16:24:31 -0700
Message-ID: <20110712162431.75bfe77b.akpm@linux-foundation.org>
In-Reply-To: <alpine.LSU.2.00.1107121536100.2112@sister.anvils>
<tries to remember what this is all about>
On Tue, 12 Jul 2011 15:56:14 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:
> On Sat, 18 Jun 2011, Andrew Morton wrote:
> > On Fri, 17 Jun 2011 17:13:38 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> > > On Fri, 17 Jun 2011, Andrew Morton wrote:
> > > > On Tue, 14 Jun 2011 03:42:27 -0700 (PDT)
> > > > Hugh Dickins <hughd@google.com> wrote:
> > > >
> > > > > The low bit of a radix_tree entry is already used to denote an indirect
> > > > > pointer, for internal use, and the unlikely radix_tree_deref_retry() case.
> > > > > Define the next bit as denoting an exceptional entry, and supply inline
> > > > > functions radix_tree_exception() to return non-0 in either unlikely case,
> > > > > and radix_tree_exceptional_entry() to return non-0 in the second case.
> > > >
> > > > Yes, the RADIX_TREE_INDIRECT_PTR hack is internal-use-only, and doesn't
> > > > operate on (and hence doesn't corrupt) client-provided items.
> > > >
> > > > This patch uses bit 1 and uses it against client items, so for
> > > > > practical purposes it can only be used when the client is storing
> > > > addresses. And it needs new APIs to access that flag.
> > > >
> > > > All a bit ugly. Why not just add another tag for this? Or reuse an
> > > > existing tag if the current tags aren't all used for these types of
> > > > pages?
> > >
> > > I couldn't see how to use tags without losing the "lockless" lookups:
> >
> > So lockless pagecache broke the radix-tree tag-versus-item coherency as
> > well as the address_space nrpages-vs-radix-tree coherency.
>
> I don't think that remark is fair to lockless pagecache at all. If we
> want the scalability advantage of lockless lookup, yes, we don't have
> strict coherency with tagging at that time. But those places that need
> to worry about that coherency, can lock to do so.

Nobody thought about these issues, afaik. Things have broken and the
code has become significantly more complex/fragile.

Does the locking in mapping_tagged() make any sense?

> > Isn't it fun learning these things.
> >
> > > because the tag is a separate bit from the entry itself, unless you're
> > > under tree_lock, there would be races when changing from page pointer
> > > to swap entry or back, when slot was updated but tag not or vice versa.
> >
> > So... take tree_lock?
>
> I wouldn't call that an improvement...

I wouldn't call the proposed changes to radix-tree.c an improvement,
either. It's an expedient, once-off, single-caller hack.

If the cost of adding locking is negligible then that is a superior fix.

> > What effect does that have?
>
> ... but admit I have not measured: I rather assume that if we now change
> tmpfs from lockless to locked lookup, someone else will soon come up with
> the regression numbers.
>
> > It'd better be
> > "really bad", because this patchset does nothing at all to improve core
> > MM maintainability :(
>
> I was aiming to improve shmem.c maintainability; and you have good grounds
> to accuse me of hurting shmem.c maintainability when I highmem-ized the
> swap vector nine years ago.
>
> I was not aiming to improve core MM maintainability, nor to harm it.
> I am extending the use to which the radix-tree can be put, but is that
> so bad?

I find it hard to believe that this wart added to the side of the
radix-tree code will find any other users. And the wart spreads
contagion into core filemap pagecache lookup.

It's pretty nasty stuff. Please, what is a better way of doing all this?