* why are some low-level MM routines being exported? @ 2010-04-04 15:27 Robert P. J. Day 2010-04-04 15:59 ` Minchan Kim 0 siblings, 1 reply; 17+ messages in thread From: Robert P. J. Day @ 2010-04-04 15:27 UTC (permalink / raw) To: linux-mm perusing the code in mm/filemap.c and i'm curious as to why routines like, for example, add_to_page_cache_lru() are being exported. is it really expected that loadable modules might access routines like that directly? rday -- ======================================================================== Robert P. J. Day Waterloo, Ontario, CANADA Linux Consulting, Training and Kernel Pedantry. Web page: http://crashcourse.ca Twitter: http://twitter.com/rpjday ======================================================================== -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: why are some low-level MM routines being exported? 2010-04-04 15:27 why are some low-level MM routines being exported? Robert P. J. Day @ 2010-04-04 15:59 ` Minchan Kim 2010-04-04 16:03 ` Evgeniy Polyakov 0 siblings, 1 reply; 17+ messages in thread From: Minchan Kim @ 2010-04-04 15:59 UTC (permalink / raw) To: Robert P. J. Day; +Cc: linux-mm, Joern Engel, Evgeniy Polyakov On Sun, 2010-04-04 at 11:27 -0400, Robert P. J. Day wrote: > perusing the code in mm/filemap.c and i'm curious as to why routines > like, for example, add_to_page_cache_lru() are being exported. is it > really expected that loadable modules might access routines like that > directly? It was added by 18bc0bbd162e3 for pohmelfs and is now used by logfs, too. I didn't notice it at the time. According to git log, none of the mm guys added a Signed-off-by or Reviewed-by. I don't think it's good for a file system or module to use it directly. It would make LRU management harder. Is it really needed? Let's think again. -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-04 15:59 ` Minchan Kim @ 2010-04-04 16:03 ` Evgeniy Polyakov 2010-04-04 16:17 ` Minchan Kim 2010-04-04 16:21 ` Minchan Kim 0 siblings, 2 replies; 17+ messages in thread From: Evgeniy Polyakov @ 2010-04-04 16:03 UTC (permalink / raw) To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: > > perusing the code in mm/filemap.c and i'm curious as to why routines > > like, for example, add_to_page_cache_lru() are being exported. is it > > really expected that loadable modules might access routines like that > > directly? > > It was added by 18bc0bbd162e3 for pohmelfs and is now used by logfs, too. > I didn't notice it at the time. > According to git log, none of the mm guys added a Signed-off-by or Reviewed-by. > > I don't think it's good for a file system or module to use it directly. > It would make LRU management harder. How come? > Is it really needed? Let's think again. Yes, it is really needed. It is not some kind of low-level mm magic to export, but a useful interface for working with LRU lists instead of copy-pasting it into one's own machinery. -- Evgeniy Polyakov
* Re: why are some low-level MM routines being exported? 2010-04-04 16:03 ` Evgeniy Polyakov @ 2010-04-04 16:17 ` Minchan Kim 2010-04-04 16:21 ` Minchan Kim 1 sibling, 0 replies; 17+ messages in thread From: Minchan Kim @ 2010-04-04 16:17 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Sun, 2010-04-04 at 20:03 +0400, Evgeniy Polyakov wrote: > On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: > > > perusing the code in mm/filemap.c and i'm curious as to why routines > > > like, for example, add_to_page_cache_lru() are being exported. is it > > > really expected that loadable modules might access routines like that > > > directly? > > > > It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too. > > I didn't noticed that at that time. > > With git log, any mm guys didn't add Signed-off-by or Reviewed-by. > > > > I think it's not good for file system or module to use it directly. > > It would make LRU management harder. > > How come? > > > Is it really needed? Let's think again. > > Yes, it is really needed. It is not a some king of low-level mm magic to > export, but a useful interface to work with LRU lists instead of > copy-paste it into own machinery. > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: why are some low-level MM routines being exported? 2010-04-04 16:03 ` Evgeniy Polyakov 2010-04-04 16:17 ` Minchan Kim @ 2010-04-04 16:21 ` Minchan Kim 2010-04-04 18:15 ` Evgeniy Polyakov 2010-04-04 19:55 ` Jörn Engel 1 sibling, 2 replies; 17+ messages in thread From: Minchan Kim @ 2010-04-04 16:21 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel Sorry for my mistaken previous reply. On Sun, 2010-04-04 at 20:03 +0400, Evgeniy Polyakov wrote: > On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: > > > perusing the code in mm/filemap.c and i'm curious as to why routines > > > like, for example, add_to_page_cache_lru() are being exported. is it > > > really expected that loadable modules might access routines like that > > > directly? > > > > It was added by 18bc0bbd162e3 for pohmelfs and is now used by logfs, too. > > I didn't notice it at the time. > > According to git log, none of the mm guys added a Signed-off-by or Reviewed-by. > > > > I don't think it's good for a file system or module to use it directly. > > It would make LRU management harder. > > How come? My concern is that if file systems or other modules start to overuse it to manage the page LRU directly, a mistake on their part could mess up the system's global LRU and make the system misbehave. > > > Is it really needed? Let's think again. > > Yes, it is really needed. It is not some kind of low-level mm magic to > export, but a useful interface for working with LRU lists instead of > copy-pasting it into one's own machinery. > Until now, other file systems haven't needed it. Why do you need it? I don't oppose it. Let's think it over with the other guys to see whether we really need it. -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-04 16:21 ` Minchan Kim @ 2010-04-04 18:15 ` Evgeniy Polyakov 2010-04-05 0:36 ` Minchan Kim 2010-04-04 19:55 ` Jörn Engel 1 sibling, 1 reply; 17+ messages in thread From: Evgeniy Polyakov @ 2010-04-04 18:15 UTC (permalink / raw) To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Mon, Apr 05, 2010 at 01:21:52AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: > > > It was added by 18bc0bbd162e3 for pohmelfs and is now used by logfs, too. > > > I didn't notice it at the time. > > > According to git log, none of the mm guys added a Signed-off-by or Reviewed-by. > > > > > > I don't think it's good for a file system or module to use it directly. > > > It would make LRU management harder. > > > > How come? > > My concern is that if file systems or other modules start to > overuse it to manage the page LRU directly, a mistake on their part could > mess up the system's global LRU and make the system misbehave. All filesystems already call it through find_or_create_page() or grab_page(), invoked via the read path. In some cases a fs has more than one page grabbed via its internal path, where the data to be read is already in place, so it may just want to add those pages to the mm LRU. -- Evgeniy Polyakov
* Re: why are some low-level MM routines being exported? 2010-04-04 18:15 ` Evgeniy Polyakov @ 2010-04-05 0:36 ` Minchan Kim 2010-04-05 12:47 ` Evgeniy Polyakov 0 siblings, 1 reply; 17+ messages in thread From: Minchan Kim @ 2010-04-05 0:36 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Mon, Apr 5, 2010 at 3:15 AM, Evgeniy Polyakov <zbr@ioremap.net> wrote: > On Mon, Apr 05, 2010 at 01:21:52AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: >> > > It was added by 18bc0bbd162e3 for pohmelfs and is now used by logfs, too. >> > > I didn't notice it at the time. >> > > According to git log, none of the mm guys added a Signed-off-by or Reviewed-by. >> > > >> > > I don't think it's good for a file system or module to use it directly. >> > > It would make LRU management harder. >> > >> > How come? >> >> My concern is that if file systems or other modules start to >> overuse it to manage the page LRU directly, a mistake on their part could >> mess up the system's global LRU and make the system misbehave. > > All filesystems already call it through find_or_create_page() or > grab_page(), invoked via the read path. In some cases a fs has more than > one page grabbed via its internal path, where the data to be read is > already in place, so it may just want to add those pages to the mm LRU. > I understand why pohmelfs needs that. AFAIU, other file systems that use the generic functions (e.g. mpage_readpages or read_cache_pages) don't need direct LRU handling since it's hidden. But pohmelfs doesn't use the generic functions. Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs, cifs)? -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-05 0:36 ` Minchan Kim @ 2010-04-05 12:47 ` Evgeniy Polyakov 2010-04-05 14:31 ` Minchan Kim 0 siblings, 1 reply; 17+ messages in thread From: Evgeniy Polyakov @ 2010-04-05 12:47 UTC (permalink / raw) To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Mon, Apr 05, 2010 at 09:36:00AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: > > All filesystems already call it through find_or_create_page() or > > grab_page(), invoked via the read path. In some cases a fs has more than > > one page grabbed via its internal path, where the data to be read is > > already in place, so it may just want to add those pages to the mm LRU. > > I understand why pohmelfs needs that. > AFAIU, other file systems that use the generic functions (e.g. mpage_readpages or > read_cache_pages) don't need direct LRU handling since it's hidden. > But pohmelfs doesn't use the generic functions. > > Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs, cifs)? This would force us to reinvent add_to_page_cache_lru() as a private function that calls add_to_page_cache() and pagevec_lru_add_file(), which is effectively what add_to_page_cache_lru() already does for file-backed pages. -- Evgeniy Polyakov
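Evgeniy's argument above is that the combined helper is just "insert into the page cache, then put the page on the LRU", so hiding it only pushes callers to re-create the same two steps privately. A minimal userspace sketch of that composition follows. It is not kernel code: every `toy_` name, the fixed-size slot array standing in for the mapping's index, and the singly linked list standing in for the LRU are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical stand-in for struct page. */
struct toy_page {
	int on_lru;                 /* 1 once the page sits on the toy LRU list */
	struct toy_page *lru_next;
};

/* Hypothetical stand-in for an address_space plus the global LRU. */
struct toy_mapping {
	struct toy_page *slots[TOY_CACHE_SLOTS]; /* models the page cache index */
	struct toy_page *lru_head;               /* models the LRU list */
};

/* Step 1: insert into the cache only; fails if the index is already taken. */
static int toy_add_to_page_cache(struct toy_mapping *m, struct toy_page *p,
				 size_t index)
{
	if (index >= TOY_CACHE_SLOTS)
		return -EINVAL;
	if (m->slots[index])
		return -EEXIST;
	m->slots[index] = p;
	return 0;
}

/* Step 2: put the page on the LRU list. */
static void toy_lru_add(struct toy_mapping *m, struct toy_page *p)
{
	assert(!p->on_lru);
	p->lru_next = m->lru_head;
	m->lru_head = p;
	p->on_lru = 1;
}

/* The combined helper: the LRU is touched only if the cache insert worked. */
static int toy_add_to_page_cache_lru(struct toy_mapping *m, struct toy_page *p,
				     size_t index)
{
	int err = toy_add_to_page_cache(m, p, index);

	if (err == 0)
		toy_lru_add(m, p);
	return err;
}
```

The point of keeping the two steps behind one function is the ordering and the error path: a caller that open-codes them has to remember not to add a page to the LRU when the cache insert failed.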
* Re: why are some low-level MM routines being exported? 2010-04-05 12:47 ` Evgeniy Polyakov @ 2010-04-05 14:31 ` Minchan Kim 0 siblings, 0 replies; 17+ messages in thread From: Minchan Kim @ 2010-04-05 14:31 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel On Mon, Apr 5, 2010 at 9:47 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote: > On Mon, Apr 05, 2010 at 09:36:00AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote: >> > All filesystems already call it through find_or_create_page() or >> > grab_page(), invoked via the read path. In some cases a fs has more than >> > one page grabbed via its internal path, where the data to be read is >> > already in place, so it may just want to add those pages to the mm LRU. >> >> I understand why pohmelfs needs that. >> AFAIU, other file systems that use the generic functions (e.g. mpage_readpages or >> read_cache_pages) don't need direct LRU handling since it's hidden. >> But pohmelfs doesn't use the generic functions. >> >> Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs, cifs)? > > This would force us to reinvent add_to_page_cache_lru() as a private > function that calls add_to_page_cache() and pagevec_lru_add_file(), > which is effectively what add_to_page_cache_lru() already does for > file-backed pages. > > -- > Evgeniy Polyakov Hmm. I found this: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html Recently, Nick replaced it with add_to_page_cache_lru in btrfs, too. That means another mm guy already knew about it and allowed it. Maybe I am being paranoid. Sorry for bothering you, Evgeniy and Joern. -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-04 16:21 ` Minchan Kim 2010-04-04 18:15 ` Evgeniy Polyakov @ 2010-04-04 19:55 ` Jörn Engel 2010-04-05 0:59 ` Minchan Kim 1 sibling, 1 reply; 17+ messages in thread From: Jörn Engel @ 2010-04-04 19:55 UTC (permalink / raw) To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote: > > > Until now, other file system don't need it. > Why do you need? To avoid deadlocks. You tell logfs to write out some locked page, logfs determines that it needs to run garbage collection first. Garbage collection can read any page. If it called find_or_create_page() for the locked page, you have a deadlock. I don't know how (or if) jffs2 and ubifs can avoid this particular scenario. The other filesystems lack garbage collection, so the problem does not exist. Jörn -- Joern's library part 5: http://www.faqs.org/faqs/compression-faq/part2/section-9.html
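The deadlock Jörn describes comes from a blocking lock being taken on a page that the same write path already holds. One way to picture a non-blocking alternative is a lookup that tries the lock and, when that fails, checks whether the page was locked by the writer that triggered garbage collection. The sketch below is a single-threaded userspace model only; it does not reproduce what logfs actually does. The `pre_locked` marker, the `toy_` names and the three-way result are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
	bool locked;      /* models the page lock bit */
	bool pre_locked;  /* assumed marker: locked by the write path that started GC */
};

/* Non-blocking lock attempt: returns true only if we took the lock. */
static bool toy_trylock_page(struct toy_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

enum toy_gc_access {
	TOY_GC_LOCKED_BY_US, /* GC took the lock itself and must drop it later */
	TOY_GC_BORROWED,     /* page is held by the writer that triggered GC */
	TOY_GC_BUSY          /* held by someone unrelated; caller has to retry */
};

/* GC-side page access that never sleeps on the page lock. */
static enum toy_gc_access toy_gc_get_page(struct toy_page *p)
{
	if (toy_trylock_page(p))
		return TOY_GC_LOCKED_BY_US;
	if (p->pre_locked)
		return TOY_GC_BORROWED;
	return TOY_GC_BUSY;
}

/* Release only what GC itself locked; borrowed pages stay with the writer. */
static void toy_gc_put_page(struct toy_page *p, enum toy_gc_access how)
{
	if (how == TOY_GC_LOCKED_BY_US) {
		assert(p->locked);
		p->locked = false;
	}
}
```

With a plain blocking lock, the second case would wait forever on a lock its own caller holds, which is the scenario described in the message above.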
* Re: why are some low-level MM routines being exported? 2010-04-04 19:55 ` Jörn Engel @ 2010-04-05 0:59 ` Minchan Kim 2010-04-05 5:30 ` Jörn Engel 0 siblings, 1 reply; 17+ messages in thread From: Minchan Kim @ 2010-04-05 0:59 UTC (permalink / raw) To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote: > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote: >> > >> Until now, other file systems haven't needed it. >> Why do you need it? > > To avoid deadlocks. You tell logfs to write out some locked page, logfs > determines that it needs to run garbage collection first. Garbage > collection can read any page. If it called find_or_create_page() for > the locked page, you have a deadlock. Could you do it with add_to_page_cache and pagevec_lru_add_file? -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-05 0:59 ` Minchan Kim @ 2010-04-05 5:30 ` Jörn Engel 2010-04-05 6:20 ` Minchan Kim 0 siblings, 1 reply; 17+ messages in thread From: Jörn Engel @ 2010-04-05 5:30 UTC (permalink / raw) To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote: > On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote: > > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote: > >> > > >> Until now, other file systems haven't needed it. > >> Why do you need it? > > > > To avoid deadlocks. You tell logfs to write out some locked page, logfs > > determines that it needs to run garbage collection first. Garbage > > collection can read any page. If it called find_or_create_page() for > > the locked page, you have a deadlock. > > Could you do it with add_to_page_cache and pagevec_lru_add_file? Maybe. But how would that be an improvement? As I see it, logfs needs a variant of find_or_create_page() that does not block on any pages waiting for logfs GC. Currently that variant lives under fs/logfs/ and uses add_to_page_cache_lru(). If there are valid reasons against exporting add_to_page_cache_lru(), the right solution is to move the logfs variant to mm/, not to rewrite it. If you want to change the implementation from using add_to_page_cache_lru() to using add_to_page_cache() and pagevec_lru_add_file(), then you should have a better reason than not exporting add_to_page_cache_lru(). If the new implementation was any better, I would gladly take it. Jörn -- Money can buy bandwidth, but latency is forever. -- John R. Mashey
* Re: why are some low-level MM routines being exported? 2010-04-05 5:30 ` Jörn Engel @ 2010-04-05 6:20 ` Minchan Kim 2010-04-05 6:22 ` Minchan Kim 2010-04-05 7:13 ` Jörn Engel 0 siblings, 2 replies; 17+ messages in thread From: Minchan Kim @ 2010-04-05 6:20 UTC (permalink / raw) To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, Apr 5, 2010 at 2:30 PM, Jörn Engel <joern@logfs.org> wrote: > On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote: >> On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote: >> > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote: >> >> > >> >> Until now, other file systems haven't needed it. >> >> Why do you need it? >> > >> > To avoid deadlocks. You tell logfs to write out some locked page, logfs >> > determines that it needs to run garbage collection first. Garbage >> > collection can read any page. If it called find_or_create_page() for >> > the locked page, you have a deadlock. >> >> Could you do it with add_to_page_cache and pagevec_lru_add_file? > > Maybe. But how would that be an improvement? > > As I see it, logfs needs a variant of find_or_create_page() that does > not block on any pages waiting for logfs GC. Currently that variant > lives under fs/logfs/ and uses add_to_page_cache_lru(). If there are > valid reasons against exporting add_to_page_cache_lru(), the right > solution is to move the logfs variant to mm/, not to rewrite it. > > If you want to change the implementation from using > add_to_page_cache_lru() to using add_to_page_cache() and > pagevec_lru_add_file(), then you should have a better reason than not > exporting add_to_page_cache_lru(). If the new implementation was any > better, I would gladly take it. As I said before, my concern is that if file systems or other modules abuse add_to_page_cache_lru, they might mess up the system LRU list, and then the system goes to hell. Of course, if we use it carefully, it can be fine, but how do you make sure of that? I am not a file system expert, but when I read the comment on read_cache_pages, "Hides the details of the LRU cache etc from the filesystem", I thought it was not good for a file system to handle the LRU list directly. At least, that is what we have been trying to do for years. If we can do it with the current functions without a big cost, I think that's better than exporting a new function. Until 18bc0bbd162e3, we didn't export it and all file systems worked well. In addition, when the patch was merged, none of the mm guys seem to have reviewed it. I just want to ring the bell so that there is a record justifying why we need to export a new function although we can do it with existing functions. If no other mm guys oppose it, I won't be against it, either. -- Kind regards, Minchan Kim
* Re: why are some low-level MM routines being exported? 2010-04-05 6:20 ` Minchan Kim @ 2010-04-05 6:22 ` Minchan Kim 2010-04-05 7:13 ` Jörn Engel 1 sibling, 0 replies; 17+ messages in thread From: Minchan Kim @ 2010-04-05 6:22 UTC (permalink / raw) To: Jörn Engel Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm, Rik van Riel, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Nick Piggin Cced mm guys. On Mon, Apr 5, 2010 at 3:20 PM, Minchan Kim <minchan.kim@gmail.com> wrote: > On Mon, Apr 5, 2010 at 2:30 PM, Jörn Engel <joern@logfs.org> wrote: >> On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote: >>> On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote: >>> > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote: >>> >> > >>> >> Until now, other file system don't need it. >>> >> Why do you need? >>> > >>> > To avoid deadlocks. You tell logfs to write out some locked page, logfs >>> > determines that it needs to run garbage collection first. Garbage >>> > collection can read any page. If it called find_or_create_page() for >>> > the locked page, you have a deadlock. >>> >>> Could you do it with add_to_page_cache and pagevec_lru_add_file? >> >> Maybe. But how would that be an improvement? >> >> As I see it, logfs needs a variant of find_or_create_page() that does >> not block on any pages waiting for logfs GC. Currently that variant >> lives under fs/logfs/ and uses add_to_page_cache_lru(). If there are >> valid reasons against exporting add_to_page_cache_lru(), the right >> solution is to move the logfs variant to mm/, not to rewrite it. >> >> If you want to change the implementation from using >> add_to_page_cache_lru() to using add_to_page_cache() and >> pagevec_lru_add_file(), then you should have a better reason than not >> exporting add_to_page_cache_lru(). If the new implementation was any >> better, I would gladly take it. 
> > Previously I said, what I have a concern is that if file systems or > some modules abuses > add_to_page_cache_lru, it might system LRU list wrong so then system > go to hell. > Of course, if we use it carefully, it can be good but how do you make sure it? > > I am not a file system expert but as I read comment of read_cache_pages > "Hides the details of the LRU cache etc from the filesystem", I > thought it is not good that > file system handle LRU list directly. At least, we have been trying for years. > > If we can do it with current functions without big cost, I think it's > rather good than exporting > new function. Until 18bc0bbd162e3, we didn't export that but all file > systems works well. > In addition, when the patch is merged, any mm guys seem to be not > reviewed it, too. > > I hope just ring at the bell to remain record to justify why we need > exporting new function > although we can do it with existing functions. > > If any other mm guys don't oppose it, I would be not against that, either. > > -- > Kind regards, > Minchan Kim > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: why are some low-level MM routines being exported? 2010-04-05 6:20 ` Minchan Kim 2010-04-05 6:22 ` Minchan Kim @ 2010-04-05 7:13 ` Jörn Engel 2010-04-05 8:26 ` Minchan Kim 1 sibling, 1 reply; 17+ messages in thread From: Jörn Engel @ 2010-04-05 7:13 UTC (permalink / raw) To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, 5 April 2010 15:20:36 +0900, Minchan Kim wrote: > > As I said before, my concern is that if file systems or > other modules abuse > add_to_page_cache_lru, they might mess up the system LRU list, and then the system > goes to hell. > Of course, if we use it carefully, it can be fine, but how do you make sure of that? Having access to the source code means you only have to read all callers. This is not Java, we don't have to add layers of anti-abuse wrappers. We can simply flame the first offender to a crisp. :) > I am not a file system expert, but when I read the comment on read_cache_pages, > "Hides the details of the LRU cache etc from the filesystem", I > thought it was not good for a > file system to handle the LRU list directly. At least, that is what we have been trying to do for years. Only speaking for logfs, I need some variant of find_or_create_page where I can replace lock_page() with a custom function. Whether that function lives in fs/logfs/ or mm/filemap.c doesn't matter much. What we could do is something roughly like the patch below, at least semantically. I know the patch is crap in its current form, but it illustrates the general idea. Jörn -- The key to performance is elegance, not battalions of special cases.
-- Jon Bentley and Doug McIlroy

diff --git a/mm/filemap.c b/mm/filemap.c
index 045b31c..6d452eb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -646,27 +646,19 @@ repeat:
 }
 EXPORT_SYMBOL(find_get_page);
 
-/**
- * find_lock_page - locate, pin and lock a pagecache page
- * @mapping: the address_space to search
- * @offset: the page index
- *
- * Locates the desired pagecache page, locks it, increments its reference
- * count and returns its address.
- *
- * Returns zero if the page was not present. find_lock_page() may sleep.
- */
-struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+static struct page *__find_lock_page(struct address_space *mapping,
+		pgoff_t offset, void(*lock)(struct page *),
+		void(*unlock)(struct page *))
 {
 	struct page *page;
 
 repeat:
 	page = find_get_page(mapping, offset);
 	if (page) {
-		lock_page(page);
+		lock(page);
 		/* Has the page been truncated? */
 		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
+			unlock(page);
 			page_cache_release(page);
 			goto repeat;
 		}
@@ -674,32 +666,31 @@ repeat:
 	}
 	return page;
 }
-EXPORT_SYMBOL(find_lock_page);
 
 /**
- * find_or_create_page - locate or add a pagecache page
- * @mapping: the page's address_space
- * @index: the page's index into the mapping
- * @gfp_mask: page allocation mode
- *
- * Locates a page in the pagecache. If the page is not present, a new page
- * is allocated using @gfp_mask and is added to the pagecache and to the VM's
- * LRU list. The returned page is locked and has its reference count
- * incremented.
+ * find_lock_page - locate, pin and lock a pagecache page
+ * @mapping: the address_space to search
+ * @offset: the page index
  *
- * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
- * allocation!
+ * Locates the desired pagecache page, locks it, increments its reference
+ * count and returns its address.
  *
- * find_or_create_page() returns the desired page's address, or zero on
- * memory exhaustion.
+ * Returns zero if the page was not present. find_lock_page() may sleep.
  */
-struct page *find_or_create_page(struct address_space *mapping,
-		pgoff_t index, gfp_t gfp_mask)
+struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+{
+	return __find_lock_page(mapping, offset, lock_page, unlock_page);
+}
+EXPORT_SYMBOL(find_lock_page);
+
+static struct page *__find_or_create_page(struct address_space *mapping,
+		pgoff_t index, gfp_t gfp_mask, void(*lock)(struct page *),
+		void(*unlock)(struct page *))
 {
 	struct page *page;
 	int err;
 repeat:
-	page = find_lock_page(mapping, index);
+	page = __find_lock_page(mapping, index, lock, unlock);
 	if (!page) {
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
@@ -721,6 +712,31 @@ repeat:
 	}
 	return page;
 }
+EXPORT_SYMBOL(__find_or_create_page);
+
+/**
+ * find_or_create_page - locate or add a pagecache page
+ * @mapping: the page's address_space
+ * @index: the page's index into the mapping
+ * @gfp_mask: page allocation mode
+ *
+ * Locates a page in the pagecache. If the page is not present, a new page
+ * is allocated using @gfp_mask and is added to the pagecache and to the VM's
+ * LRU list. The returned page is locked and has its reference count
+ * incremented.
+ *
+ * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
+ * allocation!
+ *
+ * find_or_create_page() returns the desired page's address, or zero on
+ * memory exhaustion.
+ */
+struct page *find_or_create_page(struct address_space *mapping,
+		pgoff_t index, gfp_t gfp_mask)
+{
+	return __find_or_create_page(mapping, index, gfp_mask, lock_page,
+			unlock_page);
+}
 EXPORT_SYMBOL(find_or_create_page);
 
 /**
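The core idea in Jörn's patch is to pass the lock and unlock operations in as function pointers, so the generic lookup keeps its truncation check while a filesystem supplies its own locking. The following userspace sketch models only that shape. It is not the kernel code: the `toy_` names, the slot array, and the hypothetical GC lock pair are illustrative assumptions, and where the kernel version retries after a truncation race, this single-threaded model simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical owner tag; pages are matched to it by address only. */
struct toy_mapping { int id; };

/* Hypothetical stand-in for struct page. */
struct toy_page {
	const struct toy_mapping *mapping; /* NULL models a truncated page */
	int locked;
	int gc_visits;                     /* bookkeeping for the custom lock */
};

typedef void (*toy_lock_fn)(struct toy_page *);

/* Default pair: models lock_page()/unlock_page() without any sleeping. */
static void toy_lock_page(struct toy_page *p)   { assert(!p->locked); p->locked = 1; }
static void toy_unlock_page(struct toy_page *p) { assert(p->locked);  p->locked = 0; }

/* Hypothetical custom pair: never touches the lock bit, only counts visits. */
static void toy_gc_lock_page(struct toy_page *p)   { p->gc_visits++; }
static void toy_gc_unlock_page(struct toy_page *p) { (void)p; }

/* Generic lookup with caller-supplied locking, as in the patch above. */
static struct toy_page *toy_find_lock_page_with(const struct toy_mapping *mapping,
		struct toy_page **slots, size_t nslots, size_t index,
		toy_lock_fn lock, toy_lock_fn unlock)
{
	struct toy_page *page;

	if (index >= nslots || !slots[index])
		return NULL;
	page = slots[index];
	lock(page);
	/* Has the page been truncated? Simplified: give up instead of retrying. */
	if (page->mapping != mapping) {
		unlock(page);
		return NULL;
	}
	return page;
}

/* The ordinary entry point is then a thin wrapper using the default pair. */
static struct toy_page *toy_find_lock_page(const struct toy_mapping *mapping,
		struct toy_page **slots, size_t nslots, size_t index)
{
	return toy_find_lock_page_with(mapping, slots, nslots, index,
				       toy_lock_page, toy_unlock_page);
}
```

The wrapper keeps existing callers unchanged, while a caller with special locking needs goes through the parameterized variant; whether that variant belongs in mm/ or in the filesystem is exactly what the thread is debating.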
* Re: why are some low-level MM routines being exported? 2010-04-05 7:13 ` Jörn Engel @ 2010-04-05 8:26 ` Minchan Kim 2010-04-05 11:19 ` Jörn Engel 0 siblings, 1 reply; 17+ messages in thread From: Minchan Kim @ 2010-04-05 8:26 UTC (permalink / raw) To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm On Mon, Apr 5, 2010 at 4:13 PM, Jörn Engel <joern@logfs.org> wrote: > On Mon, 5 April 2010 15:20:36 +0900, Minchan Kim wrote: >> >> Previously I said, what I have a concern is that if file systems or >> some modules abuses >> add_to_page_cache_lru, it might system LRU list wrong so then system >> go to hell. >> Of course, if we use it carefully, it can be good but how do you make sure it? > > Having access to the source code means you only have to read all > callers. This is not java, we don't have to add layers of anti-abuse > wrappers. We can simply flame the first offender to a crisp. :) > >> I am not a file system expert but as I read comment of read_cache_pages >> "Hides the details of the LRU cache etc from the filesystem", I >> thought it is not good that >> file system handle LRU list directly. At least, we have been trying for years. > > Only speaking for logfs, I need some variant of find_or_create_page > where I can replace lock_page() with a custom function. Whether that > function lives in fs/logfs/ or mm/filemap.c doesn't matter much. > > What we could do something roughly like the patch below, at least > semantically. I know the patch is crap in its current form, but it > illustrates the general idea. > > Jörn > > -- > The key to performance is elegance, not battalions of special cases. 
> -- Jon Bentley and Doug McIlroy
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 045b31c..6d452eb 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -646,27 +646,19 @@ repeat:
>  }
>  EXPORT_SYMBOL(find_get_page);
>
> -/**
> - * find_lock_page - locate, pin and lock a pagecache page
> - * @mapping: the address_space to search
> - * @offset: the page index
> - *
> - * Locates the desired pagecache page, locks it, increments its reference
> - * count and returns its address.
> - *
> - * Returns zero if the page was not present. find_lock_page() may sleep.
> - */
> -struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
> +static struct page *__find_lock_page(struct address_space *mapping,
> +		pgoff_t offset, void(*lock)(struct page *),
> +		void(*unlock)(struct page *))
>  {
>  	struct page *page;
>
>  repeat:
>  	page = find_get_page(mapping, offset);
>  	if (page) {
> -		lock_page(page);
> +		lock(page);
>  		/* Has the page been truncated? */
>  		if (unlikely(page->mapping != mapping)) {
> -			unlock_page(page);
> +			unlock(page);
>  			page_cache_release(page);
>  			goto repeat;
>  		}
> @@ -674,32 +666,31 @@ repeat:
>  	}
>  	return page;
>  }
> -EXPORT_SYMBOL(find_lock_page);
>
>  /**
> - * find_or_create_page - locate or add a pagecache page
> - * @mapping: the page's address_space
> - * @index: the page's index into the mapping
> - * @gfp_mask: page allocation mode
> - *
> - * Locates a page in the pagecache. If the page is not present, a new page
> - * is allocated using @gfp_mask and is added to the pagecache and to the VM's
> - * LRU list. The returned page is locked and has its reference count
> - * incremented.
> + * find_lock_page - locate, pin and lock a pagecache page
> + * @mapping: the address_space to search
> + * @offset: the page index
>   *
> - * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
> - * allocation!
> + * Locates the desired pagecache page, locks it, increments its reference
> + * count and returns its address.
>   *
> - * find_or_create_page() returns the desired page's address, or zero on
> - * memory exhaustion.
> + * Returns zero if the page was not present. find_lock_page() may sleep.
>   */
> -struct page *find_or_create_page(struct address_space *mapping,
> -		pgoff_t index, gfp_t gfp_mask)
> +struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
> +{
> +	return __find_lock_page(mapping, offset, lock_page, unlock_page);
> +}
> +EXPORT_SYMBOL(find_lock_page);
> +
> +static struct page *__find_or_create_page(struct address_space *mapping,
> +		pgoff_t index, gfp_t gfp_mask, void(*lock)(struct page *),
> +		void(*unlock)(struct page *))
>  {
>  	struct page *page;
>  	int err;
>  repeat:
> -	page = find_lock_page(mapping, index);
> +	page = __find_lock_page(mapping, index, lock, unlock);
>  	if (!page) {
>  		page = __page_cache_alloc(gfp_mask);
>  		if (!page)
> @@ -721,6 +712,31 @@ repeat:
>  	}
>  	return page;
>  }
> +EXPORT_SYMBOL(__find_or_create_page);
> +
> +/**
> + * find_or_create_page - locate or add a pagecache page
> + * @mapping: the page's address_space
> + * @index: the page's index into the mapping
> + * @gfp_mask: page allocation mode
> + *
> + * Locates a page in the pagecache. If the page is not present, a new page
> + * is allocated using @gfp_mask and is added to the pagecache and to the VM's
> + * LRU list. The returned page is locked and has its reference count
> + * incremented.
> + *
> + * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
> + * allocation!
> + *
> + * find_or_create_page() returns the desired page's address, or zero on
> + * memory exhaustion.
> + */
> +struct page *find_or_create_page(struct address_space *mapping,
> +		pgoff_t index, gfp_t gfp_mask)
> +{
> +	return __find_or_create_page(mapping, index, gfp_mask, lock_page,
> +			unlock_page);
> +}
>  EXPORT_SYMBOL(find_or_create_page);
>
>  /**
>

Seems like not a bad idea. :)
But we have to justify the new interface first. To do that, we have to explain
why it can't be done with the current functions (find_get_page,
add_to_page_cache and pagevec_lru_add_xxx).

pagevec_lru_add_xxx batches pages, so it can shorten the call path and cut
some overhead (e.g. the page_is_file_cache comparison and
get/put_cpu_var(lru_add_pvecs)).

At least for performance, it would be better than the old way.

--
Kind regards,
Minchan Kim
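The batching argument can be made concrete with a small userspace model. This is a sketch under stated assumptions, not the kernel's pagevec code: struct toy_lru, struct toy_pvec, toy_lru_add_one(), toy_pvec_add() and toy_pvec_flush() are hypothetical names, TOY_PVEC_SIZE is an arbitrary batch size chosen for illustration, and the fixed_cost_paid counter merely stands in for whatever per-call overhead the real path pays (such as the get/put_cpu_var pair mentioned above).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PVEC_SIZE 4   /* arbitrary batch size, for illustration only */
#define TOY_LRU_MAX   64

/* Hypothetical stand-in for an LRU list; not a kernel type. */
struct toy_lru {
	int items[TOY_LRU_MAX];
	size_t count;
	unsigned long fixed_cost_paid; /* times the per-call overhead was paid */
};

/* Hypothetical stand-in for a pagevec: a small staging buffer. */
struct toy_pvec {
	int items[TOY_PVEC_SIZE];
	size_t nr;
};

/* Unbatched path: every single item pays the fixed cost. */
static void toy_lru_add_one(struct toy_lru *lru, int item)
{
	lru->fixed_cost_paid++;
	assert(lru->count < TOY_LRU_MAX);
	lru->items[lru->count++] = item;
}

/* Move all staged items to the LRU, paying the fixed cost once. */
static void toy_pvec_flush(struct toy_lru *lru, struct toy_pvec *pv)
{
	size_t i;

	if (pv->nr == 0)
		return;
	lru->fixed_cost_paid++;
	for (i = 0; i < pv->nr; i++) {
		assert(lru->count < TOY_LRU_MAX);
		lru->items[lru->count++] = pv->items[i];
	}
	pv->nr = 0;
}

/* Batched path: stage the item and flush only when the buffer is full. */
static void toy_pvec_add(struct toy_lru *lru, struct toy_pvec *pv, int item)
{
	pv->items[pv->nr++] = item;
	if (pv->nr == TOY_PVEC_SIZE)
		toy_pvec_flush(lru, pv);
}
```

In this model, adding ten items one at a time pays the fixed cost ten times, while staging them through the buffer and flushing the remainder pays it three times. For a single item both paths pay it exactly once, so the saving depends entirely on callers having several pages to add at a time.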
* Re: why are some low-level MM routines being exported?
  2010-04-05  8:26 ` Minchan Kim
@ 2010-04-05 11:19 ` Jörn Engel
From: Jörn Engel @ 2010-04-05 11:19 UTC
To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, 5 April 2010 17:26:58 +0900, Minchan Kim wrote:
>
> Seems like not a bad idea. :)
> But we have to justify the new interface first. To do that, we have to explain
> why it can't be done with the current functions (find_get_page,
> add_to_page_cache and pagevec_lru_add_xxx).

I guess we could do that. Whether setting up a vector when only
dealing with single pages makes the code more readable or helps
performance is a different matter, though.

> pagevec_lru_add_xxx batches pages, so it can shorten the call path and cut
> some overhead (e.g. the page_is_file_cache comparison and
> get/put_cpu_var(lru_add_pvecs)).
>
> At least for performance, it would be better than the old way.

...if we can convert callers to also handle vectors. And if the backing
device is fast enough that CPU overhead becomes noticeable. And if
there were no bigger fish left to catch.

Jörn

--
Joern's library part 15:
http://www.knosof.co.uk/cbook/accu06a.pdf
end of thread

Thread overview: 17+ messages
2010-04-04 15:27 why are some low-level MM routines being exported? Robert P. J. Day
2010-04-04 15:59 ` Minchan Kim
2010-04-04 16:03 ` Evgeniy Polyakov
2010-04-04 16:17 ` Minchan Kim
2010-04-04 16:21 ` Minchan Kim
2010-04-04 18:15 ` Evgeniy Polyakov
2010-04-05  0:36 ` Minchan Kim
2010-04-05 12:47 ` Evgeniy Polyakov
2010-04-05 14:31 ` Minchan Kim
2010-04-04 19:55 ` Jörn Engel
2010-04-05  0:59 ` Minchan Kim
2010-04-05  5:30 ` Jörn Engel
2010-04-05  6:20 ` Minchan Kim
2010-04-05  6:22 ` Minchan Kim
2010-04-05  7:13 ` Jörn Engel
2010-04-05  8:26 ` Minchan Kim
2010-04-05 11:19 ` Jörn Engel