* Re: fadvise interferes with readahead [not found] <CAGTBQpaDR4+V5b1AwAVyuVLu5rkU=Wc1WeUdLu5ag=WOk5oJzQ@mail.gmail.com> @ 2012-11-20 8:04 ` Fengguang Wu 2012-11-20 13:20 ` Jaegeuk Hanse ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Fengguang Wu @ 2012-11-20 8:04 UTC (permalink / raw) To: Claudio Freire; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List Hi Claudio, Thanks for the detailed problem description! On Fri, Nov 09, 2012 at 04:30:32PM -0300, Claudio Freire wrote: > Hi. First of all, I'm not subscribed to this list, so I'd suggest all > replies copy me personally. > > I have been trying to implement some I/O pipelining in Postgres (ie: > read the next data page asynchronously while working on the current > page), and stumbled upon some puzzling behavior involving the > interaction between fadvise and readahead. > > I'm running kernel 3.0.0 (debian testing), on a single-disk system > which, though unsuitable for database workloads, is slow enough to let > me experiment with these read-ahead issues. > > Typical random I/O performance is on the order of between 150 r/s to > 200 r/s (ballpark 7200rpm I'd say), with thoughput around 1.5MB/s. > Sequential I/O can go up to 60MB/s, though it tends to be around 50. > > Now onto the problem. In order to parallelize I/O with computation, > I've made postgres fadvise(willneed) the pages it will read next. How > far ahead is configurable, and I've tested with a number of > configurations. > > The prefetching logic is aware of the OS and pg-specific cache, so it > will only fadvise a block once. fadvise calls will stay 1 (or a > configurable N) real I/O ahead of read calls, and there's no fadvising > of pages that won't be read eventually, in the same order. I checked > with strace. > > However, performance when fadvising drops considerably for a specific > yet common access pattern: > > When a nested loop with two index scans happens, access is random > locally, but eventually whole ranges of a file get read (in this > random order). Think block "1 6 8 100 34 299 3 7 68 24" followed by "2 > 4 5 101 298 301". Though random, there are ranges there that can be > merged in one read-request. > > The kernel seems to do the merge by applying some form of readahead, > not sure if it's context, ondemand or adaptive readahead on the 3.0.0 > kernel. Anyway, it seems to do readahead, as iostat says: > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > avgrq-sz avgqu-sz await r_await w_await svctm %util > sda 0.00 4.40 224.20 2.00 4.16 0.03 > 37.86 1.91 8.43 8.00 56.80 4.40 99.44 > > (notice the avgrq-sz of 37.8) > > With fadvise calls, the thing looks a lot different: > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > avgrq-sz avgqu-sz await r_await w_await svctm %util > sda 0.00 18.00 226.80 1.00 1.80 0.07 > 16.81 4.00 17.52 17.23 82.40 4.39 99.92 FYI, there is a readahead tracing/stats patchset that can provide far more accurate numbers about what's going on with readahead, which will help eliminate lots of the guess works here. https://lwn.net/Articles/472798/ > Notice the avgrq-sz of 16.8. Assuming it's 512-byte sectors, that's > spot-on with a postgres page (8k). So, fadvise seems to carry out the > requests verbatim, while read manages to merge at least two of them. > > The random nature of reads makes me think the scheduler is failing to > merge the requests in both cases (rrqm/s = 0), because it only looks > at successive requests (I'm only guessing here though). 
I guess it's not a merging problem, but that the kernel readahead code manages to submit larger IO requests in the first place. > Looking into the kernel code, it seems the problem could be related to > how fadvise works in conjunction with readahead. fadvise seems to call > the function in readahead.c that schedules the asynchornous I/O[0]. It > doesn't seem subject to readahead logic itself[1], which in on itself > doesn't seem bad. But it does, I assume (not knowing the code that > well), prevent readahead logic[2] to eventually see the pattern. It > effectively disables readahead altogether. You are right. If user space does fadvise() and the fadvised pages cover all read() pages, the kernel readahead code will not run at all. So the title is actually a bit misleading. The kernel readahead won't interfere with user space prefetching at all. ;) > This, I theorize, may be because after the fadvise call starts an > async I/O on the page, further reads won't hit readahead code because > of the page cache[3] (!PageUptodate I imagine). Whether this is > desirable or not is not really obvious. In this particular case, doing > fadvise calls in what would seem an optimum way, results in terribly > worse performance. So I'd suggest it's not really that advisable. Yes. The kernel readahead code by design will outperform simple fadvise in the case of clustered random reads. Imagine the access pattern 1, 3, 2, 6, 4, 9. fadvise will literally trigger 6 IOs, while kernel readahead will likely trigger only 3 IOs: 1, 3, and 2-9. That's because on the page miss for 2, it will detect the existence of history page 1 and do readahead properly. For hard disks, it's mainly the number of IOs that matters. So even if kernel readahead loses some opportunities to do async IO and possibly loads some extra pages that will never be used, it still manages to perform much better. > The fix would lay in fadvise, I think. It should update readahead > tracking structures. Alternatively, one could try to do it in > do_generic_file_read, updating readahead on !PageUptodate or even on > page cache hits. I really don't have the expertise or time to go > modifying, building and testing the supposedly quite simple patch that > would fix this. It's mostly about the testing, in fact. So if someone > can comment or try by themselves, I guess it would really benefit > those relying on fadvise to fix this behavior. One possible solution is to try context readahead at fadvise time, checking for the existence of history pages and doing readahead accordingly. However, it would introduce *real interference* between kernel readahead and user prefetching. The original scheme is that once user space starts its own informed prefetching, kernel readahead automatically stays out of the way. Thanks, Fengguang > Additionally, I would welcome any suggestions for ways to mitigate > this problem on current kernels, as the patch I'm working I'd like to > deploy with older kernels. Even if the latest kernel had this behavior > fixed, I'd still welcome some workarounds. > > More details on the benchmarks I've run can be found in the postgresql > dev ML archive[4].
> > [0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/fadvise.c#l95 > [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l211 > [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l398 > [3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/filemap.c#l1081 > [4] http://archives.postgresql.org/pgsql-hackers/2012-10/msg01139.php > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
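For concreteness, the userspace prefetching pattern described in the report above (posix_fadvise(POSIX_FADV_WILLNEED) issued one block ahead of the reads) boils down to something like the sketch below. This is a minimal illustration only, not PostgreSQL's actual code: fd, next_block() and the scan loop are hypothetical stand-ins, and the 8 kB block size comes from the discussion itself.

#define _XOPEN_SOURCE 600
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define BLKSZ 8192  /* one PostgreSQL page */

/*
 * Minimal sketch: hint block n+1 while processing block n, i.e. a
 * prefetch depth of one.  next_block() stands in for whatever decides
 * which block the index scan needs next; it returns -1 at the end.
 */
static void scan(int fd, long (*next_block)(void *), void *state)
{
    char buf[BLKSZ];
    long cur = next_block(state);

    while (cur >= 0) {
        long next = next_block(state);

        if (next >= 0)  /* tell the kernel what we will read soon */
            posix_fadvise(fd, (off_t)next * BLKSZ, BLKSZ,
                          POSIX_FADV_WILLNEED);

        pread(fd, buf, BLKSZ, (off_t)cur * BLKSZ);
        /* ... process the 8 kB page here ... */
        cur = next;
    }
}

With hints of exactly one block, each WILLNEED call describes an 8 kB range, which matches the ~16-sector average request size reported by iostat when fadvise is in use (16 sectors * 512 bytes = 8192 bytes, exactly one page), versus the ~38-sector average seen without it.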
* Re: fadvise interferes with readahead 2012-11-20 8:04 ` fadvise interferes with readahead Fengguang Wu @ 2012-11-20 13:20 ` Jaegeuk Hanse 2012-11-20 14:28 ` Fengguang Wu 2012-11-20 13:34 ` Claudio Freire 2012-11-20 14:11 ` Jaegeuk Hanse 2 siblings, 1 reply; 12+ messages in thread From: Jaegeuk Hanse @ 2012-11-20 13:20 UTC (permalink / raw) To: Fengguang Wu Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On 11/20/2012 04:04 PM, Fengguang Wu wrote: > Hi Claudio, > > Thanks for the detailed problem description! > > On Fri, Nov 09, 2012 at 04:30:32PM -0300, Claudio Freire wrote: >> Hi. First of all, I'm not subscribed to this list, so I'd suggest all >> replies copy me personally. >> >> I have been trying to implement some I/O pipelining in Postgres (ie: >> read the next data page asynchronously while working on the current >> page), and stumbled upon some puzzling behavior involving the >> interaction between fadvise and readahead. >> >> I'm running kernel 3.0.0 (debian testing), on a single-disk system >> which, though unsuitable for database workloads, is slow enough to let >> me experiment with these read-ahead issues. >> >> Typical random I/O performance is on the order of between 150 r/s to >> 200 r/s (ballpark 7200rpm I'd say), with thoughput around 1.5MB/s. >> Sequential I/O can go up to 60MB/s, though it tends to be around 50. >> >> Now onto the problem. In order to parallelize I/O with computation, >> I've made postgres fadvise(willneed) the pages it will read next. How >> far ahead is configurable, and I've tested with a number of >> configurations. >> >> The prefetching logic is aware of the OS and pg-specific cache, so it >> will only fadvise a block once. fadvise calls will stay 1 (or a >> configurable N) real I/O ahead of read calls, and there's no fadvising >> of pages that won't be read eventually, in the same order. I checked >> with strace. >> >> However, performance when fadvising drops considerably for a specific >> yet common access pattern: >> >> When a nested loop with two index scans happens, access is random >> locally, but eventually whole ranges of a file get read (in this >> random order). Think block "1 6 8 100 34 299 3 7 68 24" followed by "2 >> 4 5 101 298 301". Though random, there are ranges there that can be >> merged in one read-request. >> >> The kernel seems to do the merge by applying some form of readahead, >> not sure if it's context, ondemand or adaptive readahead on the 3.0.0 >> kernel. Anyway, it seems to do readahead, as iostat says: >> >> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >> avgrq-sz avgqu-sz await r_await w_await svctm %util >> sda 0.00 4.40 224.20 2.00 4.16 0.03 >> 37.86 1.91 8.43 8.00 56.80 4.40 99.44 >> >> (notice the avgrq-sz of 37.8) >> >> With fadvise calls, the thing looks a lot different: >> >> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >> avgrq-sz avgqu-sz await r_await w_await svctm %util >> sda 0.00 18.00 226.80 1.00 1.80 0.07 >> 16.81 4.00 17.52 17.23 82.40 4.39 99.92 > FYI, there is a readahead tracing/stats patchset that can provide far > more accurate numbers about what's going on with readahead, which will > help eliminate lots of the guess works here. > > https://lwn.net/Articles/472798/ > >> Notice the avgrq-sz of 16.8. Assuming it's 512-byte sectors, that's >> spot-on with a postgres page (8k). So, fadvise seems to carry out the >> requests verbatim, while read manages to merge at least two of them. 
>> >> The random nature of reads makes me think the scheduler is failing to >> merge the requests in both cases (rrqm/s = 0), because it only looks >> at successive requests (I'm only guessing here though). > I guess it's not a merging problem, but that the kernel readahead code > manages to submit larger IO requests in the first place. > >> Looking into the kernel code, it seems the problem could be related to >> how fadvise works in conjunction with readahead. fadvise seems to call >> the function in readahead.c that schedules the asynchornous I/O[0]. It >> doesn't seem subject to readahead logic itself[1], which in on itself >> doesn't seem bad. But it does, I assume (not knowing the code that >> well), prevent readahead logic[2] to eventually see the pattern. It >> effectively disables readahead altogether. > You are right. If user space does fadvise() and the fadvised pages > cover all read() pages, the kernel readahead code will not run at all. > > So the title is actually a bit misleading. The kernel readahead won't > interfere with user space prefetching at all. ;) > >> This, I theorize, may be because after the fadvise call starts an >> async I/O on the page, further reads won't hit readahead code because >> of the page cache[3] (!PageUptodate I imagine). Whether this is >> desirable or not is not really obvious. In this particular case, doing >> fadvise calls in what would seem an optimum way, results in terribly >> worse performance. So I'd suggest it's not really that advisable. > Yes. The kernel readahead code by design will outperform simple > fadvise in the case of clustered random reads. Imagine the access > pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While You mean it will trigger 6 IOs in the POSIX_FADV_RANDOM case or POSIX_FADV_WILLNEED case? > kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > the page miss for 2, it will detect the existence of history page 1 > and do readahead properly. For hard disks, it's mainly the number of If the first IO read 1, it will call page_cache_sync_read() since cache miss, if (offset - (ra->prev_pos) >> PAGE_CACHE_SHIFT) <= 1UL) goto initial_readahead; If the initial_readahead will be called? Because offset is equal to 1 and ra->prev_pos is equal to 0. If my assume is true, 2 also will be readahead. > IOs that matters. So even if kernel readahead loses some opportunities > to do async IO and possibly loads some extra pages that will never be > used, it still manges to perform much better. > >> The fix would lay in fadvise, I think. It should update readahead >> tracking structures. Alternatively, one could try to do it in >> do_generic_file_read, updating readahead on !PageUptodate or even on >> page cache hits. I really don't have the expertise or time to go >> modifying, building and testing the supposedly quite simple patch that >> would fix this. It's mostly about the testing, in fact. So if someone >> can comment or try by themselves, I guess it would really benefit >> those relying on fadvise to fix this behavior. > One possible solution is to try the context readahead at fadvise time > to check the existence of history pages and do readahead accordingly. > > However it will introduce *real interferences* between kernel > readahead and user prefetching. The original scheme is, once user > space starts its own informed prefetching, kernel readahead will > automatically stand out of the way. 
> > Thanks, > Fengguang > >> Additionally, I would welcome any suggestions for ways to mitigate >> this problem on current kernels, as the patch I'm working I'd like to >> deploy with older kernels. Even if the latest kernel had this behavior >> fixed, I'd still welcome some workarounds. >> >> More details on the benchmarks I've run can be found in the postgresql >> dev ML archive[4]. >> >> [0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/fadvise.c#l95 >> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l211 >> [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l398 >> [3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/filemap.c#l1081 >> [4] http://archives.postgresql.org/pgsql-hackers/2012-10/msg01139.php >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
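To make the "fadvise will trigger 6 IOs literally" point concrete: the POSIX_FADV_WILLNEED branch of fadvise (the code behind footnote [0] above) essentially maps the hinted byte range to page indexes and forces reads of exactly those pages. A heavily simplified paraphrase of the 3.x-era logic, for orientation only and not verbatim kernel source:

    case POSIX_FADV_WILLNEED:
        /* map the hinted byte range to page cache indexes */
        start_index = offset >> PAGE_CACHE_SHIFT;
        end_index = endbyte >> PAGE_CACHE_SHIFT;
        nrpages = end_index - start_index + 1;

        /*
         * Read those pages directly.  This goes straight to
         * __do_page_cache_readahead() and never touches file->f_ra,
         * so the sequential/context heuristics in ondemand_readahead()
         * see none of these accesses.
         */
        force_page_cache_readahead(mapping, file, start_index, nrpages);
        break;

In this discussion the "6 IOs" therefore refers to one-page WILLNEED hints: each hint becomes its own page-sized request unless the block layer happens to merge adjacent ones.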
* Re: fadvise interferes with readahead 2012-11-20 13:20 ` Jaegeuk Hanse @ 2012-11-20 14:28 ` Fengguang Wu 0 siblings, 0 replies; 12+ messages in thread From: Fengguang Wu @ 2012-11-20 14:28 UTC (permalink / raw) To: Jaegeuk Hanse Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List > >Yes. The kernel readahead code by design will outperform simple > >fadvise in the case of clustered random reads. Imagine the access > >pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > > You mean it will trigger 6 IOs in the POSIX_FADV_RANDOM case or > POSIX_FADV_WILLNEED case? Yes (the POSIX_FADV_WILLNEED case). Note, however, that this example assumes fadvise(POSIX_FADV_WILLNEED) calls that are one page in size, issued with a prefetch depth of one. With a larger prefetch depth or good timing, there is a chance that IO requests (e.g. 3 and 2) will be merged at the block layer. > >kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > >the page miss for 2, it will detect the existence of history page 1 > >and do readahead properly. For hard disks, it's mainly the number of > > If the first IO read 1, it will call page_cache_sync_read() since > cache miss, > if (offset - (ra->prev_pos) >> PAGE_CACHE_SHIFT) <= 1UL) > goto initial_readahead; > If the initial_readahead will be called? Because offset is equal to > 1 and ra->prev_pos is equal to 0. If my assume is true, 2 also will > be readahead. ra->prev_pos is initialized to -1 in file_ra_state_init(), so that check only triggers readahead if the very first read is on page 0. Sorry, I gave a confusing example. We may as well use 1001, 1003, 1002, 1006, 1004, 1009 as the example numbers. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
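For readers following along in mm/readahead.c, the checks being discussed live in ondemand_readahead(). A heavily condensed paraphrase of the decision order in the 3.x-era code (for orientation only, not verbatim or compilable):

    if (!offset)                             /* start of file */
        goto initial_readahead;

    if (offset == ra->start + ra->size - ra->async_size ||
        offset == ra->start + ra->size)      /* expected sequential hit: */
        goto ramp_up_and_readit;             /* push the window forward */

    if (hit_readahead_marker)                /* PG_readahead page, but the */
        goto rebuild_window_from_page_cache; /* ra state belongs elsewhere */

    if (req_size > max)                      /* oversize read */
        goto initial_readahead;

    if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)
        goto initial_readahead;              /* (nearly) sequential cache miss */

    if (try_context_readahead(mapping, ra, offset, req_size, max))
        goto readit;                         /* history pages found nearby */

    /* otherwise: small random read, submitted as-is, no window kept */

Since file_ra_state_init() sets ra->prev_pos to -1, a very first read at page 1 does not pass the "sequential cache miss" test, so page 2 is not automatically pulled in; that is the point of the correction above, and moving the example numbers up to 1001, 1003, 1002, ... keeps the discussion clear of the start-of-file special cases.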
* Re: fadvise interferes with readahead 2012-11-20 8:04 ` fadvise interferes with readahead Fengguang Wu 2012-11-20 13:20 ` Jaegeuk Hanse @ 2012-11-20 13:34 ` Claudio Freire 2012-11-20 14:58 ` Fengguang Wu 2012-11-20 14:11 ` Jaegeuk Hanse 2 siblings, 1 reply; 12+ messages in thread From: Claudio Freire @ 2012-11-20 13:34 UTC (permalink / raw) To: Fengguang Wu; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List On Tue, Nov 20, 2012 at 5:04 AM, Fengguang Wu <fengguang.wu@intel.com> wrote: > Yes. The kernel readahead code by design will outperform simple > fadvise in the case of clustered random reads. Imagine the access > pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > the page miss for 2, it will detect the existence of history page 1 > and do readahead properly. For hard disks, it's mainly the number of > IOs that matters. So even if kernel readahead loses some opportunities > to do async IO and possibly loads some extra pages that will never be > used, it still manges to perform much better. > >> The fix would lay in fadvise, I think. It should update readahead >> tracking structures. Alternatively, one could try to do it in >> do_generic_file_read, updating readahead on !PageUptodate or even on >> page cache hits. I really don't have the expertise or time to go >> modifying, building and testing the supposedly quite simple patch that >> would fix this. It's mostly about the testing, in fact. So if someone >> can comment or try by themselves, I guess it would really benefit >> those relying on fadvise to fix this behavior. > > One possible solution is to try the context readahead at fadvise time > to check the existence of history pages and do readahead accordingly. > > However it will introduce *real interferences* between kernel > readahead and user prefetching. The original scheme is, once user > space starts its own informed prefetching, kernel readahead will > automatically stand out of the way. I understand why that would seem like a reasonable design, but in this particular case it doesn't seem to be. I'd argue that making fadvise behave like direct I/O doesn't really work well as a design decision in most cases, precisely because fadvise is supposed to be a hint that lets the kernel make better decisions, not a request that the kernel stop making decisions. Any interference so introduced wouldn't be any worse than the interference readahead already introduces on plain reads. I agree that if fadvise were to trigger readahead, it could be bad for applications that don't read what they say they will. But if cache hits were to simply update readahead state, it would only mean that read calls behave the same regardless of fadvise calls. I think that's worth pursuing. I ought to try to prepare a patch for this to illustrate my point. Not sure I'll be able to, though. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
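One way to picture the proposal being argued for here (let reads that hit a page fadvise has already put under IO update the readahead state) is the following sketch against the do_generic_file_read() lookup path. This is a hypothetical, untested illustration of the idea, not an actual patch; the exact placement, conditions and locking would need real work:

    page = find_get_page(mapping, index);
    if (!page) {
        /* genuine cache miss: the existing sync readahead path */
        page_cache_sync_readahead(mapping, ra, filp,
                                  index, last_index - index);
        page = find_get_page(mapping, index);
    } else if (!PageUptodate(page)) {
        /*
         * Hypothetical: the page exists but its IO (e.g. started by an
         * earlier fadvise(WILLNEED)) has not completed yet.  Feed this
         * access to the readahead heuristics anyway, so sequential or
         * clustered patterns are still detected and future requests can
         * be sized and merged accordingly.  Whether the async or sync
         * entry point is the right one here is itself an open question.
         */
        page_cache_async_readahead(mapping, ra, filp, page,
                                   index, last_index - index);
    }

This corresponds to the "only !PageUptodate hits" variant discussed later in the thread; fully up-to-date cache hits would keep bypassing readahead entirely.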
* Re: fadvise interferes with readahead 2012-11-20 13:34 ` Claudio Freire @ 2012-11-20 14:58 ` Fengguang Wu 2012-11-20 15:05 ` Claudio Freire 2012-11-21 7:51 ` Jaegeuk Hanse 0 siblings, 2 replies; 12+ messages in thread From: Fengguang Wu @ 2012-11-20 14:58 UTC (permalink / raw) To: Claudio Freire; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List On Tue, Nov 20, 2012 at 10:34:11AM -0300, Claudio Freire wrote: > On Tue, Nov 20, 2012 at 5:04 AM, Fengguang Wu <fengguang.wu@intel.com> wrote: > > Yes. The kernel readahead code by design will outperform simple > > fadvise in the case of clustered random reads. Imagine the access > > pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > > kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > > the page miss for 2, it will detect the existence of history page 1 > > and do readahead properly. For hard disks, it's mainly the number of > > IOs that matters. So even if kernel readahead loses some opportunities > > to do async IO and possibly loads some extra pages that will never be > > used, it still manges to perform much better. > > > >> The fix would lay in fadvise, I think. It should update readahead > >> tracking structures. Alternatively, one could try to do it in > >> do_generic_file_read, updating readahead on !PageUptodate or even on > >> page cache hits. I really don't have the expertise or time to go > >> modifying, building and testing the supposedly quite simple patch that > >> would fix this. It's mostly about the testing, in fact. So if someone > >> can comment or try by themselves, I guess it would really benefit > >> those relying on fadvise to fix this behavior. > > > > One possible solution is to try the context readahead at fadvise time > > to check the existence of history pages and do readahead accordingly. > > > > However it will introduce *real interferences* between kernel > > readahead and user prefetching. The original scheme is, once user > > space starts its own informed prefetching, kernel readahead will > > automatically stand out of the way. > > I understand that would seem like a reasonable design, but in this > particular case it doesn't seem to be. I propose that in most cases it > doesn't really work well as a design decision, to make fadvise work as > direct I/O. Precisely because fadvise is supposed to be a hint to let > the kernel make better decisions, and not a request to make the kernel > stop making decisions. > > Any interference so introduced wouldn't be any worse than the > interference introduced by readahead over reads. I agree, if fadvise > were to trigger readahead, it could be bad for applications that don't > read what they say the will. Right. > But if cache hits were to simply update > readahead state, it would only mean that read calls behave the same > regardless of fadvise calls. I think that's worth pursuing. Here you are describing an alternative solution that will somehow trap into the readahead code even when, for example, the application is accessing once and again an already cached file? I'm afraid this will add non-trivial overheads and is less attractive than the "readahead on fadvise" solution. > I ought to try to prepare a patch for this to illustrate my point. Not > sure I'll be able to though. I'd be glad to materialize the readahead on fadvise proposal, if there are no obvious negative examples/cases. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
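For comparison, the "readahead on fadvise" idea proposed above could look roughly like the following. Again, this is only a hypothetical sketch, not a patch: count_history_pages() is static to mm/readahead.c in this era, grow_window() is an invented helper, and the real interaction with file->f_ra would need thought.

    case POSIX_FADV_WILLNEED:
        start = offset >> PAGE_CACHE_SHIFT;
        nrpages = (endbyte >> PAGE_CACHE_SHIFT) - start + 1;

        /*
         * Hypothetical: before forcing IO for exactly the hinted pages,
         * look backwards for already-cached history pages.  A non-empty
         * run means this hint belongs to a clustered or sequential
         * stream, so read a larger merged window instead of the literal
         * (often one-page) request.
         */
        history = count_history_pages(mapping, &file->f_ra, start,
                                      max_sane_readahead(nrpages));
        if (history)
            nrpages = grow_window(nrpages, history); /* invented helper */

        force_page_cache_readahead(mapping, file, start, nrpages);
        break;

The drawback pointed out above applies either way: once fadvise starts consulting readahead state, kernel heuristics and the application's own prefetching are no longer independent.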
* Re: fadvise interferes with readahead 2012-11-20 14:58 ` Fengguang Wu @ 2012-11-20 15:05 ` Claudio Freire 2012-11-21 7:51 ` Jaegeuk Hanse 1 sibling, 0 replies; 12+ messages in thread From: Claudio Freire @ 2012-11-20 15:05 UTC (permalink / raw) To: Fengguang Wu; +Cc: linux-kernel, Linux Memory Management List On Tue, Nov 20, 2012 at 11:58 AM, Fengguang Wu <fengguang.wu@intel.com> wrote: > >> But if cache hits were to simply update >> readahead state, it would only mean that read calls behave the same >> regardless of fadvise calls. I think that's worth pursuing. > > Here you are describing an alternative solution that will somehow trap > into the readahead code even when, for example, the application is > accessing once and again an already cached file? I'm afraid this will > add non-trivial overheads and is less attractive than the "readahead > on fadvise" solution. Not for all cache hits, only those in state !PageUptodate, which are I/O in progress, the case that hurts. >> I ought to try to prepare a patch for this to illustrate my point. Not >> sure I'll be able to though. > > I'd be glad to materialize the readahead on fadvise proposal, if there > are no obvious negative examples/cases. I don't expect a significant performance hit if only !PageUptodate hits invoke readahead code. But I'm no kernel expert either. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: fadvise interferes with readahead 2012-11-20 14:58 ` Fengguang Wu 2012-11-20 15:05 ` Claudio Freire @ 2012-11-21 7:51 ` Jaegeuk Hanse 2012-11-21 7:57 ` Fengguang Wu 1 sibling, 1 reply; 12+ messages in thread From: Jaegeuk Hanse @ 2012-11-21 7:51 UTC (permalink / raw) To: Fengguang Wu Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On 11/20/2012 10:58 PM, Fengguang Wu wrote: > On Tue, Nov 20, 2012 at 10:34:11AM -0300, Claudio Freire wrote: >> On Tue, Nov 20, 2012 at 5:04 AM, Fengguang Wu <fengguang.wu@intel.com> wrote: >>> Yes. The kernel readahead code by design will outperform simple >>> fadvise in the case of clustered random reads. Imagine the access >>> pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While >>> kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on >>> the page miss for 2, it will detect the existence of history page 1 >>> and do readahead properly. For hard disks, it's mainly the number of >>> IOs that matters. So even if kernel readahead loses some opportunities >>> to do async IO and possibly loads some extra pages that will never be >>> used, it still manges to perform much better. >>> >>>> The fix would lay in fadvise, I think. It should update readahead >>>> tracking structures. Alternatively, one could try to do it in >>>> do_generic_file_read, updating readahead on !PageUptodate or even on >>>> page cache hits. I really don't have the expertise or time to go >>>> modifying, building and testing the supposedly quite simple patch that >>>> would fix this. It's mostly about the testing, in fact. So if someone >>>> can comment or try by themselves, I guess it would really benefit >>>> those relying on fadvise to fix this behavior. >>> One possible solution is to try the context readahead at fadvise time >>> to check the existence of history pages and do readahead accordingly. >>> >>> However it will introduce *real interferences* between kernel >>> readahead and user prefetching. The original scheme is, once user >>> space starts its own informed prefetching, kernel readahead will >>> automatically stand out of the way. >> I understand that would seem like a reasonable design, but in this >> particular case it doesn't seem to be. I propose that in most cases it >> doesn't really work well as a design decision, to make fadvise work as >> direct I/O. Precisely because fadvise is supposed to be a hint to let >> the kernel make better decisions, and not a request to make the kernel >> stop making decisions. >> >> Any interference so introduced wouldn't be any worse than the >> interference introduced by readahead over reads. I agree, if fadvise >> were to trigger readahead, it could be bad for applications that don't >> read what they say the will. > Right. > >> But if cache hits were to simply update >> readahead state, it would only mean that read calls behave the same >> regardless of fadvise calls. I think that's worth pursuing. > Here you are describing an alternative solution that will somehow trap > into the readahead code even when, for example, the application is > accessing once and again an already cached file? I'm afraid this will > add non-trivial overheads and is less attractive than the "readahead > on fadvise" solution. Hi Fengguang, Page cache sync readahead only triggered when cache miss, but if file has already cached, how can readahead be trigged again if the application is accessing once and again an already cached file. 
Regards, Jaegeuk > >> I ought to try to prepare a patch for this to illustrate my point. Not >> sure I'll be able to though. > I'd be glad to materialize the readahead on fadvise proposal, if there > are no obvious negative examples/cases. > > Thanks, > Fengguang > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: fadvise interferes with readahead 2012-11-21 7:51 ` Jaegeuk Hanse @ 2012-11-21 7:57 ` Fengguang Wu 0 siblings, 0 replies; 12+ messages in thread From: Fengguang Wu @ 2012-11-21 7:57 UTC (permalink / raw) To: Jaegeuk Hanse Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On Wed, Nov 21, 2012 at 03:51:03PM +0800, Jaegeuk Hanse wrote: > On 11/20/2012 10:58 PM, Fengguang Wu wrote: > >On Tue, Nov 20, 2012 at 10:34:11AM -0300, Claudio Freire wrote: > >>On Tue, Nov 20, 2012 at 5:04 AM, Fengguang Wu <fengguang.wu@intel.com> wrote: > >>>Yes. The kernel readahead code by design will outperform simple > >>>fadvise in the case of clustered random reads. Imagine the access > >>>pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > >>>kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > >>>the page miss for 2, it will detect the existence of history page 1 > >>>and do readahead properly. For hard disks, it's mainly the number of > >>>IOs that matters. So even if kernel readahead loses some opportunities > >>>to do async IO and possibly loads some extra pages that will never be > >>>used, it still manges to perform much better. > >>> > >>>>The fix would lay in fadvise, I think. It should update readahead > >>>>tracking structures. Alternatively, one could try to do it in > >>>>do_generic_file_read, updating readahead on !PageUptodate or even on > >>>>page cache hits. I really don't have the expertise or time to go > >>>>modifying, building and testing the supposedly quite simple patch that > >>>>would fix this. It's mostly about the testing, in fact. So if someone > >>>>can comment or try by themselves, I guess it would really benefit > >>>>those relying on fadvise to fix this behavior. > >>>One possible solution is to try the context readahead at fadvise time > >>>to check the existence of history pages and do readahead accordingly. > >>> > >>>However it will introduce *real interferences* between kernel > >>>readahead and user prefetching. The original scheme is, once user > >>>space starts its own informed prefetching, kernel readahead will > >>>automatically stand out of the way. > >>I understand that would seem like a reasonable design, but in this > >>particular case it doesn't seem to be. I propose that in most cases it > >>doesn't really work well as a design decision, to make fadvise work as > >>direct I/O. Precisely because fadvise is supposed to be a hint to let > >>the kernel make better decisions, and not a request to make the kernel > >>stop making decisions. > >> > >>Any interference so introduced wouldn't be any worse than the > >>interference introduced by readahead over reads. I agree, if fadvise > >>were to trigger readahead, it could be bad for applications that don't > >>read what they say the will. > >Right. > > > >>But if cache hits were to simply update > >>readahead state, it would only mean that read calls behave the same > >>regardless of fadvise calls. I think that's worth pursuing. > >Here you are describing an alternative solution that will somehow trap > >into the readahead code even when, for example, the application is > >accessing once and again an already cached file? I'm afraid this will > >add non-trivial overheads and is less attractive than the "readahead > >on fadvise" solution. 
> > Hi Fengguang, > > Page cache sync readahead only triggered when cache miss, but if > file has already cached, how can readahead be trigged again if the > application is accessing once and again an already cached file. The answer is the opposite of what you expect: for an already cached file, the kernel readahead code won't be triggered at all, which is good for avoiding pointless overhead in the common case of repeated hot accesses to already-cached pages. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
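The "won't be triggered at all" behaviour falls straight out of the buffered read path. A simplified paraphrase of the page lookup in do_generic_file_read() (3.x era, not verbatim):

    page = find_get_page(mapping, index);
    if (!page) {
        /* cache miss: the only place sync readahead is invoked */
        page_cache_sync_readahead(mapping, ra, filp,
                                  index, last_index - index);
        page = find_get_page(mapping, index);
    }
    if (page && PageReadahead(page)) {
        /* async readahead, only for pages carrying the marker that a
         * previous readahead pass left behind */
        page_cache_async_readahead(mapping, ra, filp, page,
                                   index, last_index - index);
    }
    if (page && PageUptodate(page)) {
        /* fully cached: copy to user space, readahead never consulted */
    }

So a fully cached file never enters the readahead code, and a page that fadvise has already put under IO (present but !PageUptodate and unmarked) doesn't either, which is exactly the behaviour this thread is about.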
* Re: fadvise interferes with readahead 2012-11-20 8:04 ` fadvise interferes with readahead Fengguang Wu 2012-11-20 13:20 ` Jaegeuk Hanse 2012-11-20 13:34 ` Claudio Freire @ 2012-11-20 14:11 ` Jaegeuk Hanse 2012-11-20 15:15 ` Fengguang Wu 2 siblings, 1 reply; 12+ messages in thread From: Jaegeuk Hanse @ 2012-11-20 14:11 UTC (permalink / raw) To: Fengguang Wu Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On 11/20/2012 04:04 PM, Fengguang Wu wrote: > Hi Claudio, > > Thanks for the detailed problem description! Hi Fengguang, Another question, thanks in advance. What's the meaning of interleaved reads? If the first process readahead from start ~ start + size - async_size, another process read start + size - aysnc_size + 1, then what will happen? It seems that variable hit_readahead_marker is false, and related codes can't run, where I miss? Regards, Jaegeuk > > On Fri, Nov 09, 2012 at 04:30:32PM -0300, Claudio Freire wrote: >> Hi. First of all, I'm not subscribed to this list, so I'd suggest all >> replies copy me personally. >> >> I have been trying to implement some I/O pipelining in Postgres (ie: >> read the next data page asynchronously while working on the current >> page), and stumbled upon some puzzling behavior involving the >> interaction between fadvise and readahead. >> >> I'm running kernel 3.0.0 (debian testing), on a single-disk system >> which, though unsuitable for database workloads, is slow enough to let >> me experiment with these read-ahead issues. >> >> Typical random I/O performance is on the order of between 150 r/s to >> 200 r/s (ballpark 7200rpm I'd say), with thoughput around 1.5MB/s. >> Sequential I/O can go up to 60MB/s, though it tends to be around 50. >> >> Now onto the problem. In order to parallelize I/O with computation, >> I've made postgres fadvise(willneed) the pages it will read next. How >> far ahead is configurable, and I've tested with a number of >> configurations. >> >> The prefetching logic is aware of the OS and pg-specific cache, so it >> will only fadvise a block once. fadvise calls will stay 1 (or a >> configurable N) real I/O ahead of read calls, and there's no fadvising >> of pages that won't be read eventually, in the same order. I checked >> with strace. >> >> However, performance when fadvising drops considerably for a specific >> yet common access pattern: >> >> When a nested loop with two index scans happens, access is random >> locally, but eventually whole ranges of a file get read (in this >> random order). Think block "1 6 8 100 34 299 3 7 68 24" followed by "2 >> 4 5 101 298 301". Though random, there are ranges there that can be >> merged in one read-request. >> >> The kernel seems to do the merge by applying some form of readahead, >> not sure if it's context, ondemand or adaptive readahead on the 3.0.0 >> kernel. 
Anyway, it seems to do readahead, as iostat says: >> >> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >> avgrq-sz avgqu-sz await r_await w_await svctm %util >> sda 0.00 4.40 224.20 2.00 4.16 0.03 >> 37.86 1.91 8.43 8.00 56.80 4.40 99.44 >> >> (notice the avgrq-sz of 37.8) >> >> With fadvise calls, the thing looks a lot different: >> >> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >> avgrq-sz avgqu-sz await r_await w_await svctm %util >> sda 0.00 18.00 226.80 1.00 1.80 0.07 >> 16.81 4.00 17.52 17.23 82.40 4.39 99.92 > FYI, there is a readahead tracing/stats patchset that can provide far > more accurate numbers about what's going on with readahead, which will > help eliminate lots of the guess works here. > > https://lwn.net/Articles/472798/ > >> Notice the avgrq-sz of 16.8. Assuming it's 512-byte sectors, that's >> spot-on with a postgres page (8k). So, fadvise seems to carry out the >> requests verbatim, while read manages to merge at least two of them. >> >> The random nature of reads makes me think the scheduler is failing to >> merge the requests in both cases (rrqm/s = 0), because it only looks >> at successive requests (I'm only guessing here though). > I guess it's not a merging problem, but that the kernel readahead code > manages to submit larger IO requests in the first place. > >> Looking into the kernel code, it seems the problem could be related to >> how fadvise works in conjunction with readahead. fadvise seems to call >> the function in readahead.c that schedules the asynchornous I/O[0]. It >> doesn't seem subject to readahead logic itself[1], which in on itself >> doesn't seem bad. But it does, I assume (not knowing the code that >> well), prevent readahead logic[2] to eventually see the pattern. It >> effectively disables readahead altogether. > You are right. If user space does fadvise() and the fadvised pages > cover all read() pages, the kernel readahead code will not run at all. > > So the title is actually a bit misleading. The kernel readahead won't > interfere with user space prefetching at all. ;) > >> This, I theorize, may be because after the fadvise call starts an >> async I/O on the page, further reads won't hit readahead code because >> of the page cache[3] (!PageUptodate I imagine). Whether this is >> desirable or not is not really obvious. In this particular case, doing >> fadvise calls in what would seem an optimum way, results in terribly >> worse performance. So I'd suggest it's not really that advisable. > Yes. The kernel readahead code by design will outperform simple > fadvise in the case of clustered random reads. Imagine the access > pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > the page miss for 2, it will detect the existence of history page 1 > and do readahead properly. For hard disks, it's mainly the number of > IOs that matters. So even if kernel readahead loses some opportunities > to do async IO and possibly loads some extra pages that will never be > used, it still manges to perform much better. > >> The fix would lay in fadvise, I think. It should update readahead >> tracking structures. Alternatively, one could try to do it in >> do_generic_file_read, updating readahead on !PageUptodate or even on >> page cache hits. I really don't have the expertise or time to go >> modifying, building and testing the supposedly quite simple patch that >> would fix this. It's mostly about the testing, in fact. 
So if someone >> can comment or try by themselves, I guess it would really benefit >> those relying on fadvise to fix this behavior. > One possible solution is to try the context readahead at fadvise time > to check the existence of history pages and do readahead accordingly. > > However it will introduce *real interferences* between kernel > readahead and user prefetching. The original scheme is, once user > space starts its own informed prefetching, kernel readahead will > automatically stand out of the way. > > Thanks, > Fengguang > >> Additionally, I would welcome any suggestions for ways to mitigate >> this problem on current kernels, as the patch I'm working I'd like to >> deploy with older kernels. Even if the latest kernel had this behavior >> fixed, I'd still welcome some workarounds. >> >> More details on the benchmarks I've run can be found in the postgresql >> dev ML archive[4]. >> >> [0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/fadvise.c#l95 >> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l211 >> [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l398 >> [3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/filemap.c#l1081 >> [4] http://archives.postgresql.org/pgsql-hackers/2012-10/msg01139.php >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: fadvise interferes with readahead 2012-11-20 14:11 ` Jaegeuk Hanse @ 2012-11-20 15:15 ` Fengguang Wu 2012-11-21 6:51 ` Jaegeuk Hanse 0 siblings, 1 reply; 12+ messages in thread From: Fengguang Wu @ 2012-11-20 15:15 UTC (permalink / raw) To: Jaegeuk Hanse Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On Tue, Nov 20, 2012 at 10:11:54PM +0800, Jaegeuk Hanse wrote: > On 11/20/2012 04:04 PM, Fengguang Wu wrote: > >Hi Claudio, > > > >Thanks for the detailed problem description! > > Hi Fengguang, > > Another question, thanks in advance. > > What's the meaning of interleaved reads? If the first process It's access patterns like 1, 1001, 2, 1002, 3, 1003, ... in which there are two (or more) mixed sequential read streams. > readahead from start ~ start + size - async_size, another process > read start + size - aysnc_size + 1, then what will happen? It seems > that variable hit_readahead_marker is false, and related codes can't > run, where I miss? Yes hit_readahead_marker will be false. However on reading 1002, hit_readahead_marker()/count_history_pages() will find the previous page 1001 already in page cache and trigger context readahead. Thanks, Fengguang > >On Fri, Nov 09, 2012 at 04:30:32PM -0300, Claudio Freire wrote: > >>Hi. First of all, I'm not subscribed to this list, so I'd suggest all > >>replies copy me personally. > >> > >>I have been trying to implement some I/O pipelining in Postgres (ie: > >>read the next data page asynchronously while working on the current > >>page), and stumbled upon some puzzling behavior involving the > >>interaction between fadvise and readahead. > >> > >>I'm running kernel 3.0.0 (debian testing), on a single-disk system > >>which, though unsuitable for database workloads, is slow enough to let > >>me experiment with these read-ahead issues. > >> > >>Typical random I/O performance is on the order of between 150 r/s to > >>200 r/s (ballpark 7200rpm I'd say), with thoughput around 1.5MB/s. > >>Sequential I/O can go up to 60MB/s, though it tends to be around 50. > >> > >>Now onto the problem. In order to parallelize I/O with computation, > >>I've made postgres fadvise(willneed) the pages it will read next. How > >>far ahead is configurable, and I've tested with a number of > >>configurations. > >> > >>The prefetching logic is aware of the OS and pg-specific cache, so it > >>will only fadvise a block once. fadvise calls will stay 1 (or a > >>configurable N) real I/O ahead of read calls, and there's no fadvising > >>of pages that won't be read eventually, in the same order. I checked > >>with strace. > >> > >>However, performance when fadvising drops considerably for a specific > >>yet common access pattern: > >> > >>When a nested loop with two index scans happens, access is random > >>locally, but eventually whole ranges of a file get read (in this > >>random order). Think block "1 6 8 100 34 299 3 7 68 24" followed by "2 > >>4 5 101 298 301". Though random, there are ranges there that can be > >>merged in one read-request. > >> > >>The kernel seems to do the merge by applying some form of readahead, > >>not sure if it's context, ondemand or adaptive readahead on the 3.0.0 > >>kernel. 
Anyway, it seems to do readahead, as iostat says: > >> > >>Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > >>avgrq-sz avgqu-sz await r_await w_await svctm %util > >>sda 0.00 4.40 224.20 2.00 4.16 0.03 > >>37.86 1.91 8.43 8.00 56.80 4.40 99.44 > >> > >>(notice the avgrq-sz of 37.8) > >> > >>With fadvise calls, the thing looks a lot different: > >> > >>Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > >>avgrq-sz avgqu-sz await r_await w_await svctm %util > >>sda 0.00 18.00 226.80 1.00 1.80 0.07 > >>16.81 4.00 17.52 17.23 82.40 4.39 99.92 > >FYI, there is a readahead tracing/stats patchset that can provide far > >more accurate numbers about what's going on with readahead, which will > >help eliminate lots of the guess works here. > > > >https://lwn.net/Articles/472798/ > > > >>Notice the avgrq-sz of 16.8. Assuming it's 512-byte sectors, that's > >>spot-on with a postgres page (8k). So, fadvise seems to carry out the > >>requests verbatim, while read manages to merge at least two of them. > >> > >>The random nature of reads makes me think the scheduler is failing to > >>merge the requests in both cases (rrqm/s = 0), because it only looks > >>at successive requests (I'm only guessing here though). > >I guess it's not a merging problem, but that the kernel readahead code > >manages to submit larger IO requests in the first place. > > > >>Looking into the kernel code, it seems the problem could be related to > >>how fadvise works in conjunction with readahead. fadvise seems to call > >>the function in readahead.c that schedules the asynchornous I/O[0]. It > >>doesn't seem subject to readahead logic itself[1], which in on itself > >>doesn't seem bad. But it does, I assume (not knowing the code that > >>well), prevent readahead logic[2] to eventually see the pattern. It > >>effectively disables readahead altogether. > >You are right. If user space does fadvise() and the fadvised pages > >cover all read() pages, the kernel readahead code will not run at all. > > > >So the title is actually a bit misleading. The kernel readahead won't > >interfere with user space prefetching at all. ;) > > > >>This, I theorize, may be because after the fadvise call starts an > >>async I/O on the page, further reads won't hit readahead code because > >>of the page cache[3] (!PageUptodate I imagine). Whether this is > >>desirable or not is not really obvious. In this particular case, doing > >>fadvise calls in what would seem an optimum way, results in terribly > >>worse performance. So I'd suggest it's not really that advisable. > >Yes. The kernel readahead code by design will outperform simple > >fadvise in the case of clustered random reads. Imagine the access > >pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While > >kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on > >the page miss for 2, it will detect the existence of history page 1 > >and do readahead properly. For hard disks, it's mainly the number of > >IOs that matters. So even if kernel readahead loses some opportunities > >to do async IO and possibly loads some extra pages that will never be > >used, it still manges to perform much better. > > > >>The fix would lay in fadvise, I think. It should update readahead > >>tracking structures. Alternatively, one could try to do it in > >>do_generic_file_read, updating readahead on !PageUptodate or even on > >>page cache hits. I really don't have the expertise or time to go > >>modifying, building and testing the supposedly quite simple patch that > >>would fix this. 
It's mostly about the testing, in fact. So if someone > >>can comment or try by themselves, I guess it would really benefit > >>those relying on fadvise to fix this behavior. > >One possible solution is to try the context readahead at fadvise time > >to check the existence of history pages and do readahead accordingly. > > > >However it will introduce *real interferences* between kernel > >readahead and user prefetching. The original scheme is, once user > >space starts its own informed prefetching, kernel readahead will > >automatically stand out of the way. > > > >Thanks, > >Fengguang > > > >>Additionally, I would welcome any suggestions for ways to mitigate > >>this problem on current kernels, as the patch I'm working I'd like to > >>deploy with older kernels. Even if the latest kernel had this behavior > >>fixed, I'd still welcome some workarounds. > >> > >>More details on the benchmarks I've run can be found in the postgresql > >>dev ML archive[4]. > >> > >>[0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/fadvise.c#l95 > >>[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l211 > >>[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/readahead.c#l398 > >>[3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=mm/filemap.c#l1081 > >>[4] http://archives.postgresql.org/pgsql-hackers/2012-10/msg01139.php > >>-- > >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >>the body of a message to majordomo@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > >>Please read the FAQ at http://www.tux.org/lkml/ > >-- > >To unsubscribe, send a message with 'unsubscribe linux-mm' in > >the body to majordomo@kvack.org. For more info on Linux MM, > >see: http://www.linux-mm.org/ . > >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
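The history-page check described in the answer above is the context readahead path. A simplified paraphrase of try_context_readahead()/count_history_pages() from that era (not verbatim):

    /*
     * Look backwards from offset-1 for a contiguous run of pages that
     * are already in the page cache.  Such a run is the trace a
     * sequential stream leaves behind, even if this struct file's
     * readahead state currently describes a different stream.
     */
    size = count_history_pages(mapping, ra, offset, max);
    if (!size)
        return 0;        /* no trace: treat as a random read */

    if (size >= offset)  /* the run reaches back to page 0: */
        size *= 2;       /* likely a whole-file read, be generous */

    ra->start = offset;
    ra->size = get_init_ra_size(size + req_size, max);
    ra->async_size = ra->size;
    return 1;

In the interleaved example (1, 1001, 2, 1002, ...), the read of 1002 finds page 1001 already cached, so the run size is non-zero and a window is rebuilt for that stream even though file->f_ra was last updated for the other one.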
* Re: fadvise interferes with readahead 2012-11-20 15:15 ` Fengguang Wu @ 2012-11-21 6:51 ` Jaegeuk Hanse 2012-11-21 7:46 ` Fengguang Wu 0 siblings, 1 reply; 12+ messages in thread From: Jaegeuk Hanse @ 2012-11-21 6:51 UTC (permalink / raw) To: Fengguang Wu Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List On 11/20/2012 11:15 PM, Fengguang Wu wrote: > On Tue, Nov 20, 2012 at 10:11:54PM +0800, Jaegeuk Hanse wrote: >> On 11/20/2012 04:04 PM, Fengguang Wu wrote: >>> Hi Claudio, >>> >>> Thanks for the detailed problem description! >> Hi Fengguang, >> >> Another question, thanks in advance. >> >> What's the meaning of interleaved reads? If the first process > It's access patterns like > > 1, 1001, 2, 1002, 3, 1003, ... > > in which there are two (or more) mixed sequential read streams. > >> readahead from start ~ start + size - async_size, another process >> read start + size - aysnc_size + 1, then what will happen? It seems >> that variable hit_readahead_marker is false, and related codes can't >> run, where I miss? > Yes hit_readahead_marker will be false. However on reading 1002, > hit_readahead_marker()/count_history_pages() will find the previous > page 1001 already in page cache and trigger context readahead. Hi Fengguang, Thanks for your explaination, the comment in function ondemand_readahead, "Hit a marked page without valid readahead state". What's the meaning of "without valid readahead state"? Regards, Jaegeuk > > Thanks, > Fengguang > >>> On Fri, Nov 09, 2012 at 04:30:32PM -0300, Claudio Freire wrote: >>>> Hi. First of all, I'm not subscribed to this list, so I'd suggest all >>>> replies copy me personally. >>>> >>>> I have been trying to implement some I/O pipelining in Postgres (ie: >>>> read the next data page asynchronously while working on the current >>>> page), and stumbled upon some puzzling behavior involving the >>>> interaction between fadvise and readahead. >>>> >>>> I'm running kernel 3.0.0 (debian testing), on a single-disk system >>>> which, though unsuitable for database workloads, is slow enough to let >>>> me experiment with these read-ahead issues. >>>> >>>> Typical random I/O performance is on the order of between 150 r/s to >>>> 200 r/s (ballpark 7200rpm I'd say), with thoughput around 1.5MB/s. >>>> Sequential I/O can go up to 60MB/s, though it tends to be around 50. >>>> >>>> Now onto the problem. In order to parallelize I/O with computation, >>>> I've made postgres fadvise(willneed) the pages it will read next. How >>>> far ahead is configurable, and I've tested with a number of >>>> configurations. >>>> >>>> The prefetching logic is aware of the OS and pg-specific cache, so it >>>> will only fadvise a block once. fadvise calls will stay 1 (or a >>>> configurable N) real I/O ahead of read calls, and there's no fadvising >>>> of pages that won't be read eventually, in the same order. I checked >>>> with strace. >>>> >>>> However, performance when fadvising drops considerably for a specific >>>> yet common access pattern: >>>> >>>> When a nested loop with two index scans happens, access is random >>>> locally, but eventually whole ranges of a file get read (in this >>>> random order). Think block "1 6 8 100 34 299 3 7 68 24" followed by "2 >>>> 4 5 101 298 301". Though random, there are ranges there that can be >>>> merged in one read-request. >>>> >>>> The kernel seems to do the merge by applying some form of readahead, >>>> not sure if it's context, ondemand or adaptive readahead on the 3.0.0 >>>> kernel. 
Anyway, it seems to do readahead, as iostat says: >>>> >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >>>> avgrq-sz avgqu-sz await r_await w_await svctm %util >>>> sda 0.00 4.40 224.20 2.00 4.16 0.03 >>>> 37.86 1.91 8.43 8.00 56.80 4.40 99.44 >>>> >>>> (notice the avgrq-sz of 37.8) >>>> >>>> With fadvise calls, the thing looks a lot different: >>>> >>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s >>>> avgrq-sz avgqu-sz await r_await w_await svctm %util >>>> sda 0.00 18.00 226.80 1.00 1.80 0.07 >>>> 16.81 4.00 17.52 17.23 82.40 4.39 99.92 >>> FYI, there is a readahead tracing/stats patchset that can provide far >>> more accurate numbers about what's going on with readahead, which will >>> help eliminate lots of the guess works here. >>> >>> https://lwn.net/Articles/472798/ >>> >>>> Notice the avgrq-sz of 16.8. Assuming it's 512-byte sectors, that's >>>> spot-on with a postgres page (8k). So, fadvise seems to carry out the >>>> requests verbatim, while read manages to merge at least two of them. >>>> >>>> The random nature of reads makes me think the scheduler is failing to >>>> merge the requests in both cases (rrqm/s = 0), because it only looks >>>> at successive requests (I'm only guessing here though). >>> I guess it's not a merging problem, but that the kernel readahead code >>> manages to submit larger IO requests in the first place. >>> >>>> Looking into the kernel code, it seems the problem could be related to >>>> how fadvise works in conjunction with readahead. fadvise seems to call >>>> the function in readahead.c that schedules the asynchornous I/O[0]. It >>>> doesn't seem subject to readahead logic itself[1], which in on itself >>>> doesn't seem bad. But it does, I assume (not knowing the code that >>>> well), prevent readahead logic[2] to eventually see the pattern. It >>>> effectively disables readahead altogether. >>> You are right. If user space does fadvise() and the fadvised pages >>> cover all read() pages, the kernel readahead code will not run at all. >>> >>> So the title is actually a bit misleading. The kernel readahead won't >>> interfere with user space prefetching at all. ;) >>> >>>> This, I theorize, may be because after the fadvise call starts an >>>> async I/O on the page, further reads won't hit readahead code because >>>> of the page cache[3] (!PageUptodate I imagine). Whether this is >>>> desirable or not is not really obvious. In this particular case, doing >>>> fadvise calls in what would seem an optimum way, results in terribly >>>> worse performance. So I'd suggest it's not really that advisable. >>> Yes. The kernel readahead code by design will outperform simple >>> fadvise in the case of clustered random reads. Imagine the access >>> pattern 1, 3, 2, 6, 4, 9. fadvise will trigger 6 IOs literally. While >>> kernel readahead will likely trigger 3 IOs for 1, 3, 2-9. Because on >>> the page miss for 2, it will detect the existence of history page 1 >>> and do readahead properly. For hard disks, it's mainly the number of >>> IOs that matters. So even if kernel readahead loses some opportunities >>> to do async IO and possibly loads some extra pages that will never be >>> used, it still manges to perform much better. >>> >>>> The fix would lay in fadvise, I think. It should update readahead >>>> tracking structures. Alternatively, one could try to do it in >>>> do_generic_file_read, updating readahead on !PageUptodate or even on >>>> page cache hits. 
* Re: fadvise interferes with readahead
  2012-11-21  6:51           ` Jaegeuk Hanse
@ 2012-11-21  7:46             ` Fengguang Wu
  0 siblings, 0 replies; 12+ messages in thread
From: Fengguang Wu @ 2012-11-21  7:46 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Claudio Freire, Andrew Morton, linux-kernel, Linux Memory Management List

On Wed, Nov 21, 2012 at 02:51:41PM +0800, Jaegeuk Hanse wrote:
> On 11/20/2012 11:15 PM, Fengguang Wu wrote:
> >On Tue, Nov 20, 2012 at 10:11:54PM +0800, Jaegeuk Hanse wrote:
> >>On 11/20/2012 04:04 PM, Fengguang Wu wrote:
> >>>Hi Claudio,
> >>>
> >>>Thanks for the detailed problem description!
> >>Hi Fengguang,
> >>
> >>Another question, thanks in advance.
> >>
> >>What's the meaning of interleaved reads? If the first process
> >It's access patterns like
> >
> >        1, 1001, 2, 1002, 3, 1003, ...
> >
> >in which there are two (or more) mixed sequential read streams.
> >
> >>does readahead from start ~ start + size - async_size and another
> >>process reads start + size - async_size + 1, then what will happen?
> >>It seems that the variable hit_readahead_marker is false, so the
> >>related code can't run. What did I miss?
> >Yes hit_readahead_marker will be false. However on reading 1002,
> >hit_readahead_marker()/count_history_pages() will find the previous
> >page 1001 already in page cache and trigger context readahead.
>
> Hi Fengguang,
>
> Thanks for your explanation. The comment in ondemand_readahead() reads
> "Hit a marked page without valid readahead state". What does "without
> valid readahead state" mean?

It normally happens on interleaved (or clustered random) reads. When
there are two read streams on one struct file, the single file_ra_state
cannot track the state of both streams. When the readahead code is
triggered for stream A, the file_ra_state may still contain the previous
readahead window of stream B. In that case stream B's readahead state
(ra->start, ra->size, etc.) is invalid for the stream A that we are
currently working on.

Thanks,
Fengguang
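To make "without valid readahead state" concrete, the per-file readahead bookkeeping is one small structure embedded in struct file. The sketch below paraphrases the 3.x-era definition (fields simplified and the kernel typedefs stubbed out so it stands alone); treat it as an illustration of the answer above, not a verbatim copy of include/linux/fs.h.

    /* Paraphrased sketch of the per-file readahead state (3.x era).
     * Exactly one instance lives in each struct file, so two interleaved
     * streams on the same file keep overwriting each other's window.
     */
    typedef unsigned long pgoff_t;   /* stand-in for the kernel's page index type */
    typedef long long loff_t;        /* stand-in for the kernel's file offset type */

    struct file_ra_state {
    	pgoff_t start;           /* where the last readahead window starts */
    	unsigned int size;       /* number of pages in that window */
    	unsigned int async_size; /* start async readahead when this many
    	                            pages of the window remain; the page at
    	                            start + size - async_size is "marked" */
    	unsigned int ra_pages;   /* maximum readahead window */
    	loff_t prev_pos;         /* position of the previous read */
    };

    /*
     * Rough timeline for streams A (pages 1, 2, 3, ...) and B (1001, 1002, ...)
     * sharing one file_ra_state:
     *
     *   A reads page 1    -> ra describes A's next window
     *   B reads page 1001 -> ra overwritten, now describes B's window
     *   A reads page 2    -> A hits a page its own earlier readahead marked,
     *                        but ra holds B's window: a marked page "without
     *                        valid readahead state", so the kernel sizes the
     *                        next readahead from page cache history
     *                        (count_history_pages()) instead of trusting
     *                        ra->start / ra->size.
     */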
end of thread, other threads: [~2012-11-21 7:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAGTBQpaDR4+V5b1AwAVyuVLu5rkU=Wc1WeUdLu5ag=WOk5oJzQ@mail.gmail.com>
2012-11-20  8:04 ` fadvise interferes with readahead Fengguang Wu
2012-11-20 13:20 ` Jaegeuk Hanse
2012-11-20 14:28 ` Fengguang Wu
2012-11-20 13:34 ` Claudio Freire
2012-11-20 14:58 ` Fengguang Wu
2012-11-20 15:05 ` Claudio Freire
2012-11-21  7:51 ` Jaegeuk Hanse
2012-11-21  7:57 ` Fengguang Wu
2012-11-20 14:11 ` Jaegeuk Hanse
2012-11-20 15:15 ` Fengguang Wu
2012-11-21  6:51 ` Jaegeuk Hanse
2012-11-21  7:46 ` Fengguang Wu