* [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) @ 2014-07-03 13:02 Raghavendra K T 2014-07-03 15:41 ` Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 13:02 UTC (permalink / raw) To: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, rientjes, Linus, nacc Cc: linux-mm, linux-kernel, Raghavendra K T commit 6d2be915 (mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages) imposed 2MB limits to readahed that yielded good performance since it avoided unnecessay page caching. However it broke sys_readahead semantics: 'readahead() blocks until the specified data has been read' This patch still retains the fix for memoryless nodes which used to return zero and limits its readahead to 2MB to avoid unnecessary page cache thrashing but reverts to old sanitized readahead for cpu with memory nodes. link: https://bugzilla.kernel.org/show_bug.cgi?id=79111 Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> --- mm/readahead.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/mm/readahead.c b/mm/readahead.c index 0ca36a7..4514cf6 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -239,6 +239,24 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp, */ unsigned long max_sane_readahead(unsigned long nr) { + unsigned long local_free_page; + int nid; + + nid = numa_node_id(); + if (node_present_pages(nid)) { + /* + * We sanitize readahead size depending on free memory in + * the local node. + */ + local_free_page = node_page_state(nid, NR_INACTIVE_FILE) + + node_page_state(nid, NR_FREE_PAGES); + return min(nr, local_free_page / 2); + } + /* + * Readahead onto remote memory is better than no readahead when local + * numa node does not have memory. We limit the readahead to 2MB to + * avoid trashing page cache. + */ return min(nr, MAX_READAHEAD); } -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 13:02 [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) Raghavendra K T @ 2014-07-03 15:41 ` Linus Torvalds 2014-07-03 18:11 ` Raghavendra K T 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2014-07-03 15:41 UTC (permalink / raw) To: Raghavendra K T Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 3, 2014 at 6:02 AM, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote: > > However it broke sys_readahead semantics: 'readahead() blocks until the specified > data has been read' What? Where did you find that insane sentence? And where did you find an application that depends on that totally insane semantics that sure as hell was never intentional. If this comes from some man-page, then the man-page is just full of sh*t, and is being crazy. The whole and *only* point of readahead() is that it does *not* block, and you can do it across multiple files. So NAK NAK NAK. This is insane and completely wrong. And the bugzilla is crazy too. Why would anybody think that readahead() is the same as read()? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 15:41 ` Linus Torvalds @ 2014-07-03 18:11 ` Raghavendra K T 2014-07-03 18:22 ` Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 18:11 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On 07/03/2014 09:11 PM, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 6:02 AM, Raghavendra K T > <raghavendra.kt@linux.vnet.ibm.com> wrote: >> >> However it broke sys_readahead semantics: 'readahead() blocks until the specified >> data has been read' > > What? Where did you find that insane sentence? And where did you find > an application that depends on that totally insane semantics that sure > as hell was never intentional. > > If this comes from some man-page, Yes it is. then the man-page is just full of > sh*t, and is being crazy. The whole and *only* point of readahead() is > that it does *not* block, and you can do it across multiple files. Entirely agree. Infact I also had the strong opinion that we should rather change man page instead of making Linux performing badly by doing large unnecessary readahead when there is no actual read, and performance numbers have proved that. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:11 ` Raghavendra K T @ 2014-07-03 18:22 ` Linus Torvalds 2014-07-03 18:29 ` Linus Torvalds 2014-07-03 18:35 ` Raghavendra K T 0 siblings, 2 replies; 14+ messages in thread From: Linus Torvalds @ 2014-07-03 18:22 UTC (permalink / raw) To: Raghavendra K T Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 3, 2014 at 11:11 AM, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote: >> >> What? Where did you find that insane sentence? And where did you find >> an application that depends on that totally insane semantics that sure >> as hell was never intentional. >> >> If this comes from some man-page, > > Yes it is. Ok, googling actually finds a fairly recent patch to fix it http://www.spinics.net/lists/linux-mm/msg70517.html and several much older "that's not true" comments. I wonder how it ever happened, because it has never actually been true that readahead() has been synchronous. It *has* been true that large read-aheads have started so much IO that just the act of starting more would wait for request allocations etc to free up, so it's not like it has ever been entirely asynchonous either, but it definitely has *never* been synchronous afaik. The new behavior just means that you can't trigger the "request queues are all so full that we end up blocking waiting for new request allocations" quite as easily. That said, the bugzilla entry you mentioned does mention "can't boot 3.14 now". I'm not sure what the meaning of that sentence is, though. Does it mean "can't boot 3.14 to test it because the machine is busy", or is it a typo and really meant 3.15, and that some bootup script *depended* on readahead()? I don't know. It seems strange. It also seems like it would be very hard to even show this semantically (aside from timing, and looking at how much of the cache is used like the test-program does). So the bugzilla entry worries me a bit - we definitely do not want to regress in case somebody really relied on timing - but without more specific information I still think the real bug is just in the man-page. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:22 ` Linus Torvalds @ 2014-07-03 18:29 ` Linus Torvalds 2014-07-03 18:38 ` Raghavendra K T 2014-07-03 19:43 ` John Stoffel 2014-07-03 18:35 ` Raghavendra K T 1 sibling, 2 replies; 14+ messages in thread From: Linus Torvalds @ 2014-07-03 18:29 UTC (permalink / raw) To: Raghavendra K T Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 3, 2014 at 11:22 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So the bugzilla entry worries me a bit - we definitely do not want to > regress in case somebody really relied on timing - but without more > specific information I still think the real bug is just in the > man-page. Side note: the 2MB limit may be too small. 2M is peanuts on modern machines, even for fairly slow IO, and there are lots of files (like glibc etc) that people might want to read-ahead during boot. We already do bigger read-ahead if people just do "read()" system calls. So I could certainly imagine that we should increase it. I do *not* think we should bow down to insane man-pages that have always been wrong, though, and I don't think we should increase it to "let's just read-ahead a whole ISO image" kind of sizes.. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:29 ` Linus Torvalds @ 2014-07-03 18:38 ` Raghavendra K T 2014-07-03 18:50 ` Raghavendra K T 2014-07-03 18:53 ` Linus Torvalds 2014-07-03 19:43 ` John Stoffel 1 sibling, 2 replies; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 18:38 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On 07/03/2014 11:59 PM, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 11:22 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> So the bugzilla entry worries me a bit - we definitely do not want to >> regress in case somebody really relied on timing - but without more >> specific information I still think the real bug is just in the >> man-page. > > Side note: the 2MB limit may be too small. 2M is peanuts on modern > machines, even for fairly slow IO, and there are lots of files (like > glibc etc) that people might want to read-ahead during boot. We > already do bigger read-ahead if people just do "read()" system calls. > So I could certainly imagine that we should increase it. > > I do *not* think we should bow down to insane man-pages that have > always been wrong, though, and I don't think we should increase it to > "let's just read-ahead a whole ISO image" kind of sizes.. Okay, how about something like 256MB? I would be happy to send a patch for that change. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:38 ` Raghavendra K T @ 2014-07-03 18:50 ` Raghavendra K T 2014-07-03 18:53 ` Linus Torvalds 1 sibling, 0 replies; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 18:50 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On 07/04/2014 12:08 AM, Raghavendra K T wrote: > On 07/03/2014 11:59 PM, Linus Torvalds wrote: >> On Thu, Jul 3, 2014 at 11:22 AM, Linus Torvalds >> <torvalds@linux-foundation.org> wrote: [...] t. >> >> I do *not* think we should bow down to insane man-pages that have >> always been wrong, though, and I don't think we should increase it to >> "let's just read-ahead a whole ISO image" kind of sizes.. > > Okay, how about something like 256MB? I would be happy to send a patch > for that change. Sorry I was too fast. I think some thing like 16MB? I 'll send patch with that (unless you different size in mind). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:38 ` Raghavendra K T 2014-07-03 18:50 ` Raghavendra K T @ 2014-07-03 18:53 ` Linus Torvalds 2014-07-03 18:56 ` Raghavendra K T 2014-07-03 19:05 ` Dave Jones 1 sibling, 2 replies; 14+ messages in thread From: Linus Torvalds @ 2014-07-03 18:53 UTC (permalink / raw) To: Raghavendra K T Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 3, 2014 at 11:38 AM, Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote: > > Okay, how about something like 256MB? I would be happy to send a patch > for that change. I'd like to see some performance numbers. I know at least Fedora uses "readahead()" in the startup scripts, do we have any performance numbers for that? Also, I think 256MB is actually excessive. People still do have really slow devices out there. USB-2 is still common, and drives that read at 15MB/s are not unusual. Do we really want to do readahead() that can take tens of seconds (and *will* take tens of seconds sycnhronously, because the IO requests fill up). So I wouldn't go from 2 to 256. That seems like an excessive jump. I was more thinking in the 4-8MB range. But even then, I think we should always have technical reasons (ie preferably numbers) for the change, not just randomly change it. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:53 ` Linus Torvalds @ 2014-07-03 18:56 ` Raghavendra K T 2014-07-03 19:05 ` Dave Jones 1 sibling, 0 replies; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 18:56 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On 07/04/2014 12:23 AM, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 11:38 AM, Raghavendra K T > <raghavendra.kt@linux.vnet.ibm.com> wrote: >> >> Okay, how about something like 256MB? I would be happy to send a patch >> for that change. > > I'd like to see some performance numbers. I know at least Fedora uses > "readahead()" in the startup scripts, do we have any performance > numbers for that? > > Also, I think 256MB is actually excessive. People still do have really > slow devices out there. USB-2 is still common, and drives that read at > 15MB/s are not unusual. Do we really want to do readahead() that can > take tens of seconds (and *will* take tens of seconds sycnhronously, > because the IO requests fill up). > > So I wouldn't go from 2 to 256. That seems like an excessive jump. I > was more thinking in the 4-8MB range. But even then, I think we should > always have technical reasons (ie preferably numbers) for the change, > not just randomly change it. Okay. I 'll take some time to do the analysis. I think we also should keep in mind of possible remote readahead that would cause unnecessary penalty. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:53 ` Linus Torvalds 2014-07-03 18:56 ` Raghavendra K T @ 2014-07-03 19:05 ` Dave Jones 1 sibling, 0 replies; 14+ messages in thread From: Dave Jones @ 2014-07-03 19:05 UTC (permalink / raw) To: Linus Torvalds Cc: Raghavendra K T, Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 03, 2014 at 11:53:57AM -0700, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 11:38 AM, Raghavendra K T > <raghavendra.kt@linux.vnet.ibm.com> wrote: > > > > Okay, how about something like 256MB? I would be happy to send a patch > > for that change. > > I'd like to see some performance numbers. I know at least Fedora uses > "readahead()" in the startup scripts, do we have any performance > numbers for that? this got rolled up into systemd a while ago, so it's not just Fedora. re: numbers, systemd-analyze and systemd-bootchart look like the way to figure that out.. https://wiki.archlinux.org/index.php/Improve_boot_performance Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:29 ` Linus Torvalds 2014-07-03 18:38 ` Raghavendra K T @ 2014-07-03 19:43 ` John Stoffel 2014-07-03 21:58 ` Linus Torvalds 1 sibling, 1 reply; 14+ messages in thread From: John Stoffel @ 2014-07-03 19:43 UTC (permalink / raw) To: Linus Torvalds Cc: Raghavendra K T, Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List >>>>> "Linus" == Linus Torvalds <torvalds@linux-foundation.org> writes: Linus> On Thu, Jul 3, 2014 at 11:22 AM, Linus Torvalds Linus> <torvalds@linux-foundation.org> wrote: >> >> So the bugzilla entry worries me a bit - we definitely do not want to >> regress in case somebody really relied on timing - but without more >> specific information I still think the real bug is just in the >> man-page. Linus> Side note: the 2MB limit may be too small. 2M is peanuts on modern Linus> machines, even for fairly slow IO, and there are lots of files (like Linus> glibc etc) that people might want to read-ahead during boot. We Linus> already do bigger read-ahead if people just do "read()" system calls. Linus> So I could certainly imagine that we should increase it. Linus> I do *not* think we should bow down to insane man-pages that have Linus> always been wrong, though, and I don't think we should increase it to Linus> "let's just read-ahead a whole ISO image" kind of sizes.. This is one of those perenial questions of how to tune this. I agree we should increase the number, but shouldn't it be based on both the amount of memory in the machine, number of devices (or is it all just one big pool?) and the speed of the actual device doing readahead? Doesn't make sense to do 32mb of readahead on a USB 1.1 thumb drive or even a CDROM. But maybe it does for USB3 thumb drives? John -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 19:43 ` John Stoffel @ 2014-07-03 21:58 ` Linus Torvalds 2014-10-03 20:57 ` Rafael Aquini 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2014-07-03 21:58 UTC (permalink / raw) To: John Stoffel Cc: Raghavendra K T, Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On Thu, Jul 3, 2014 at 12:43 PM, John Stoffel <john@stoffel.org> wrote: > > This is one of those perenial questions of how to tune this. I agree > we should increase the number, but shouldn't it be based on both the > amount of memory in the machine, number of devices (or is it all just > one big pool?) and the speed of the actual device doing readahead? Sure. But I don't trust the throughput data for the backing device at all, especially early at boot. We're supposed to work it out for writeback over time (based on device contention etc), but I never saw that working, and for reading I don't think we have even any code to do so. And trying to be clever and basing the read-ahead size on the node memory size was what caused problems to begin with (with memory-less nodes) that then made us just hardcode the maximum. So there are certainly better options - in theory. In practice, I think we don't really care enough, and the better options are questionably implementable. I _suspect_ the right number is in that 2-8MB range, and I would prefer to keep it at the low end at least until somebody really has numbers (and preferably from different real-life situations). I also suspect that read-ahead is less of an issue with non-rotational storage in general, since the only real reason for it tends to be latency reduction (particularly the "readahead()" kind of big-hammer thing that is really just useful for priming caches). So there's some argument to say that it's getting less and less important. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 21:58 ` Linus Torvalds @ 2014-10-03 20:57 ` Rafael Aquini 0 siblings, 0 replies; 14+ messages in thread From: Rafael Aquini @ 2014-10-03 20:57 UTC (permalink / raw) To: Linus Torvalds Cc: John Stoffel, Raghavendra K T, Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, linux-kernel, Larry Woodman On Thu, Jul 03, 2014 at 02:58:54PM -0700, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 12:43 PM, John Stoffel <john@stoffel.org> wrote: > > > > This is one of those perenial questions of how to tune this. I agree > > we should increase the number, but shouldn't it be based on both the > > amount of memory in the machine, number of devices (or is it all just > > one big pool?) and the speed of the actual device doing readahead? > > Sure. But I don't trust the throughput data for the backing device at > all, especially early at boot. We're supposed to work it out for > writeback over time (based on device contention etc), but I never saw > that working, and for reading I don't think we have even any code to > do so. > > And trying to be clever and basing the read-ahead size on the node > memory size was what caused problems to begin with (with memory-less > nodes) that then made us just hardcode the maximum. > > So there are certainly better options - in theory. In practice, I > think we don't really care enough, and the better options are > questionably implementable. > > I _suspect_ the right number is in that 2-8MB range, and I would > prefer to keep it at the low end at least until somebody really has > numbers (and preferably from different real-life situations). > > I also suspect that read-ahead is less of an issue with non-rotational > storage in general, since the only real reason for it tends to be > latency reduction (particularly the "readahead()" kind of big-hammer > thing that is really just useful for priming caches). So there's some > argument to say that it's getting less and less important. > I believe you're right, but yet we sort of broke the expectation for deliberately issued readaheads, and that is what I believe that fellow complained (poorly worded) at the bugzilla ticket. We recently got the following report: https://bugzilla.redhat.com/show_bug.cgi?id=1103240 which pretty much is the same thing reported at kernel's BZ. I did some empirical tests with iozone (forcing madv_willneed behaviour) as well as I double-checked numbers our performance team got while running their regression tests and I, honestly, couldn't see any change for better or worse that could be directly related to the change in question. Other than setting a hard ceiling of to 2MB for any issued readahead, which might be seen as trouble for certain users, there seems to be no other measurable loss here. OTOH, the tangible gain after the change is having the readahead working for NUMA layouts where some CPUs are within a memoryless node. I believe we could take David's (Rientjes) early suggestion and, instead of fixing a hard limit on max_sane_readahead(), change it to replace numa_node_id() by numa_mem_id() calls and follow up the CONFIG_HAVE_MEMORYLESS_NODES requirements on PPC to have it working properly (which seems to be the reason that approach was left aside). Best regards, -- Rafael -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) 2014-07-03 18:22 ` Linus Torvalds 2014-07-03 18:29 ` Linus Torvalds @ 2014-07-03 18:35 ` Raghavendra K T 1 sibling, 0 replies; 14+ messages in thread From: Raghavendra K T @ 2014-07-03 18:35 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Fengguang Wu, David Cohen, Al Viro, Damien Ramonda, Jan Kara, David Rientjes, Nishanth Aravamudan, linux-mm, Linux Kernel Mailing List On 07/03/2014 11:52 PM, Linus Torvalds wrote: > On Thu, Jul 3, 2014 at 11:11 AM, Raghavendra K T > <raghavendra.kt@linux.vnet.ibm.com> wrote: >>> >>> If this comes from some man-page, >> >> Yes it is. > > Ok, googling actually finds a fairly recent patch to fix it > > http://www.spinics.net/lists/linux-mm/msg70517.html > > and several much older "that's not true" comments. Thanks. I had missed that. > > That said, the bugzilla entry you mentioned does mention "can't boot > 3.14 now". I'm not sure what the meaning of that sentence is, though. > Does it mean "can't boot 3.14 to test it because the machine is busy", > or is it a typo and really meant 3.15, and that some bootup script > *depended* on readahead()? I don't know. It seems strange. I think your guess is right, it meant to say "I can't boot it anymore since I already upgraded to 3.15", because eventually bootup script (if it is) should have to read IIUC. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-10-03 20:57 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-03 13:02 [PATCH] mm readahead: Fix sys_readahead breakage by reverting 2MB limit (bug 79111) Raghavendra K T 2014-07-03 15:41 ` Linus Torvalds 2014-07-03 18:11 ` Raghavendra K T 2014-07-03 18:22 ` Linus Torvalds 2014-07-03 18:29 ` Linus Torvalds 2014-07-03 18:38 ` Raghavendra K T 2014-07-03 18:50 ` Raghavendra K T 2014-07-03 18:53 ` Linus Torvalds 2014-07-03 18:56 ` Raghavendra K T 2014-07-03 19:05 ` Dave Jones 2014-07-03 19:43 ` John Stoffel 2014-07-03 21:58 ` Linus Torvalds 2014-10-03 20:57 ` Rafael Aquini 2014-07-03 18:35 ` Raghavendra K T
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).