From: Andrew Morton <akpm@linux-foundation.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	LKML <linux-kernel@vger.kernel.org>,
	Nick Piggin <npiggin@suse.de>,
	Stewart Smith <stewart@flamingspork.com>,
	linux-mm@kvack.org, linux-arch@vger.kernel.org
Subject: Re: [patch 1/2] mm: fincore()
Date: Fri, 15 Feb 2013 15:42:35 -0800
Message-ID: <20130215154235.0fb36f53.akpm@linux-foundation.org>
In-Reply-To: <20130215231304.GB23930@cmpxchg.org>

On Fri, 15 Feb 2013 18:13:04 -0500
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Fri, Feb 15, 2013 at 01:27:38PM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2013 01:34:50 -0500
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > > + * The status is returned in a vector of bytes. The least significant
> > > + * bit of each byte is 1 if the referenced page is in memory, otherwise
> > > + * it is zero.
> >
> > Also, this is going to be dreadfully inefficient for some obvious cases.
> >
> > We could address that by returning the info in some more efficient
> > representation. That will be run-length encoded in some fashion.
> >
> > The obvious way would be to populate an array of
> >
> > struct page_status {
> > 	u32 present:1;
> > 	u32 count:31;
> > };
> >
> > or whatever.
>
> I'm having a hard time seeing how this could be extended to more
> status bits without stifling the optimization too much.

See other email: add a syscall arg which specifies the boolean status
which we're searching for.

> If we just
> add more status bits to one page_status, the likelihood of long runs
> where all bits are in agreement decreases. But as the optimization
> becomes less and less effective, we are stuck with an interface that
> is more PITA than just using mmap and mincore again.
>
> The user has to supply a worst-case-sized vector with one struct
> page_status per page in the range, but the per-page item will be
> bigger than with the byte vector because of the additional run length
> variable.

Yes, we'd need to tell the kernel how much storage is available for the
structures.

> However, one struct page_status per run leaves you with a worst case
> of one syscall per page in the range.

Yes.

> I dunno. The byte vector might not be optimal but its worst cases
> seem more attractive, is just as extensible, and dead simple to use.

But I think "which pages from this 4TB file are in core" will not be an
uncommon usage, and writing a gig of memory to find three pages is just
awful.
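
To make the sizes concrete, here is a back-of-the-envelope sketch. It assumes 4 KiB pages and a 4-byte struct page_status, neither of which is stated in this thread, and the function names are made up for illustration. Under those assumptions a 4TB file needs a 1 GiB byte vector, while three isolated resident pages need at most seven runs, or 28 bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a one-byte-per-page status vector covering
 * file_bytes of file, rounding up to whole pages.
 * The page size is an assumption supplied by the caller.
 */
uint64_t byte_vector_bytes(uint64_t file_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0);
	return (file_bytes + page_bytes - 1) / page_bytes;
}

/*
 * Worst-case bytes of run-length output when `resident` pages are in
 * core and none of them are adjacent: every resident page is its own
 * run, with one absent run before, between and after them.
 * bytes_per_run is the assumed size of one run record.
 */
uint64_t rle_worst_case_bytes(uint64_t resident, uint64_t bytes_per_run)
{
	return (2 * resident + 1) * bytes_per_run;
}
```

Calling byte_vector_bytes(4ULL << 40, 4096) gives 1ULL << 30, and rle_worst_case_bytes(3, 4) gives 28.
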
I wonder what the most common usage would be (one should know this
before merging the syscall :)). I guess "is this relatively-small
range of the file in core" and/or "which pages from this
relatively-small range of the file will I need to read", etc.

The syscall should handle the common usages very well. But it
shouldn't handle uncommon usages very badly!
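
For illustration only, the run-length idea discussed above could look roughly like the following userspace sketch. It is not the proposed syscall and not anyone's patch: encode_runs(), its mask, cap and next parameters, and MAX_RUN are all hypothetical names, and uint32_t stands in for the kernel's u32. The mask plays the role of the "which boolean status are we searching for" argument, cap is the amount of storage the caller says is available, and next lets the caller resume where the previous call stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout taken from the struct quoted earlier in this message. */
struct page_status {
	uint32_t present:1;
	uint32_t count:31;
};

#define MAX_RUN ((1u << 31) - 1)	/* largest value that fits in count:31 */

/*
 * Hypothetical helper: run-length encode per-page status bytes,
 * starting at page *next, into at most cap runs.  A page counts as
 * "present" if any bit selected by mask is set in its status byte.
 * Returns the number of runs written and advances *next to the first
 * page not yet encoded, so the caller can call again to continue.
 */
size_t encode_runs(const unsigned char *vec, size_t npages, size_t *next,
		   unsigned char mask, struct page_status *out, size_t cap)
{
	size_t nruns = 0;
	size_t i = *next;

	while (i < npages && nruns < cap) {
		unsigned int present = (vec[i] & mask) != 0;
		uint32_t count = 0;

		/* Extend the run while the status matches and count fits. */
		while (i < npages && count < MAX_RUN &&
		       ((vec[i] & mask) != 0) == present) {
			count++;
			i++;
		}
		out[nruns].present = present;
		out[nruns].count = count;
		nruns++;
	}
	*next = i;
	return nruns;
}
```

With long runs a single call describes a huge range in a few records. With strictly alternating pages every run has length one, so a small cap degenerates into many calls, which is the worst case raised in the quoted text.
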