linux-mm.kvack.org archive mirror
* why are some low-level MM routines being exported?
@ 2010-04-04 15:27 Robert P. J. Day
  2010-04-04 15:59 ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Robert P. J. Day @ 2010-04-04 15:27 UTC (permalink / raw)
  To: linux-mm


  perusing the code in mm/filemap.c and i'm curious as to why routines
like, for example, add_to_page_cache_lru() are being exported.  is it
really expected that loadable modules might access routines like that
directly?

rday
--

========================================================================
Robert P. J. Day                               Waterloo, Ontario, CANADA

            Linux Consulting, Training and Kernel Pedantry.

Web page:                                          http://crashcourse.ca
Twitter:                                       http://twitter.com/rpjday
========================================================================

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: why are some low-level MM routines being exported?
  2010-04-04 15:27 why are some low-level MM routines being exported? Robert P. J. Day
@ 2010-04-04 15:59 ` Minchan Kim
  2010-04-04 16:03   ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-04-04 15:59 UTC (permalink / raw)
  To: Robert P. J. Day; +Cc: linux-mm, Joern Engel, Evgeniy Polyakov

On Sun, 2010-04-04 at 11:27 -0400, Robert P. J. Day wrote:
> perusing the code in mm/filemap.c and i'm curious as to why routines
> like, for example, add_to_page_cache_lru() are being exported.  is it
> really expected that loadable modules might access routines like that
> directly?

It was added by commit 18bc0bbd162e3 for pohmelfs and is now used by
logfs, too. I didn't notice it at the time.
According to git log, no mm guys added a Signed-off-by or Reviewed-by.

I don't think it's good for a file system or module to use it directly.
It would make LRU management harder.

Is it really needed? Let's think again.

-- 
Kind regards,
Minchan Kim



* Re: why are some low-level MM routines being exported?
  2010-04-04 15:59 ` Minchan Kim
@ 2010-04-04 16:03   ` Evgeniy Polyakov
  2010-04-04 16:17     ` Minchan Kim
  2010-04-04 16:21     ` Minchan Kim
  0 siblings, 2 replies; 17+ messages in thread
From: Evgeniy Polyakov @ 2010-04-04 16:03 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
> > perusing the code in mm/filemap.c and i'm curious as to why routines
> > like, for example, add_to_page_cache_lru() are being exported.  is it
> > really expected that loadable modules might access routines like that
> > directly?
> 
> It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too. 
> I didn't noticed that at that time.
> With git log, any mm guys didn't add Signed-off-by or Reviewed-by.
> 
> I think it's not good for file system or module to use it directly. 
> It would make LRU management harder. 

How come?

> Is it really needed? Let's think again. 

Yes, it is really needed. It is not some kind of low-level mm magic to
export, but a useful interface for working with LRU lists instead of
copy-pasting it into one's own machinery.

-- 
	Evgeniy Polyakov


* Re: why are some low-level MM routines being exported?
  2010-04-04 16:03   ` Evgeniy Polyakov
@ 2010-04-04 16:17     ` Minchan Kim
  2010-04-04 16:21     ` Minchan Kim
  1 sibling, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2010-04-04 16:17 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Sun, 2010-04-04 at 20:03 +0400, Evgeniy Polyakov wrote:
> On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
> > > perusing the code in mm/filemap.c and i'm curious as to why routines
> > > like, for example, add_to_page_cache_lru() are being exported.  is it
> > > really expected that loadable modules might access routines like that
> > > directly?
> > 
> > It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too. 
> > I didn't noticed that at that time.
> > With git log, any mm guys didn't add Signed-off-by or Reviewed-by.
> > 
> > I think it's not good for file system or module to use it directly. 
> > It would make LRU management harder. 
> 
> How come?

> 
> > Is it really needed? Let's think again. 
> 
> Yes, it is really needed. It is not a some king of low-level mm magic to
> export, but a useful interface to work with LRU lists instead of
> copy-paste it into own machinery.
> 





-- 
Kind regards,
Minchan Kim



* Re: why are some low-level MM routines being exported?
  2010-04-04 16:03   ` Evgeniy Polyakov
  2010-04-04 16:17     ` Minchan Kim
@ 2010-04-04 16:21     ` Minchan Kim
  2010-04-04 18:15       ` Evgeniy Polyakov
  2010-04-04 19:55       ` Jörn Engel
  1 sibling, 2 replies; 17+ messages in thread
From: Minchan Kim @ 2010-04-04 16:21 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel


Sorry for the mistaken previous reply.

On Sun, 2010-04-04 at 20:03 +0400, Evgeniy Polyakov wrote:
> On Mon, Apr 05, 2010 at 12:59:44AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
> > > perusing the code in mm/filemap.c and i'm curious as to why routines
> > > like, for example, add_to_page_cache_lru() are being exported.  is it
> > > really expected that loadable modules might access routines like that
> > > directly?
> > 
> > It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too. 
> > I didn't noticed that at that time.
> > With git log, any mm guys didn't add Signed-off-by or Reviewed-by.
> > 
> > I think it's not good for file system or module to use it directly. 
> > It would make LRU management harder. 
> 
> How come?

My concern is that if file systems or modules start to overuse it to
manage the page LRU directly, a mistake on their part could leave the
system-wide LRU in a bad state and make the system misbehave.

> 
> > Is it really needed? Let's think again. 
> 
> Yes, it is really needed. It is not a some king of low-level mm magic to
> export, but a useful interface to work with LRU lists instead of
> copy-paste it into own machinery.
> 
Until now, other file systems haven't needed it.
Why do you need it?

I don't oppose it.
Let's think again with the others about whether we really need it.

-- 
Kind regards,
Minchan Kim



* Re: why are some low-level MM routines being exported?
  2010-04-04 16:21     ` Minchan Kim
@ 2010-04-04 18:15       ` Evgeniy Polyakov
  2010-04-05  0:36         ` Minchan Kim
  2010-04-04 19:55       ` Jörn Engel
  1 sibling, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2010-04-04 18:15 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Mon, Apr 05, 2010 at 01:21:52AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
> > > It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too. 
> > > I didn't noticed that at that time.
> > > With git log, any mm guys didn't add Signed-off-by or Reviewed-by.
> > > 
> > > I think it's not good for file system or module to use it directly. 
> > > It would make LRU management harder. 
> > 
> > How come?
> 
> What I have a concern is that if file systems or some modules start to
> overuse it to manage pages LRU directly, some mistake of them would make
> system global LRU stupid and make system wrong. 

All filesystems already call it through find_or_create_page() or
grab_page(), invoked via the read path. In some cases a fs has more
than one page grabbed via its internal path, where the data to be read
is already in place, so it may just want to add those pages to the mm
LRU.

-- 
	Evgeniy Polyakov


* Re: why are some low-level MM routines being exported?
  2010-04-04 16:21     ` Minchan Kim
  2010-04-04 18:15       ` Evgeniy Polyakov
@ 2010-04-04 19:55       ` Jörn Engel
  2010-04-05  0:59         ` Minchan Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2010-04-04 19:55 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote:
> > 
> Until now, other file system don't need it. 
> Why do you need?

To avoid deadlocks.  You tell logfs to write out some locked page, logfs
determines that it needs to run garbage collection first.  Garbage
collection can read any page.  If it called find_or_create_page() for
the locked page, you have a deadlock.

I don't know how (or if) jffs2 and ubifs can avoid this particular
scenario.  The other filesystems lack garbage collection, so the problem
does not exist.
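
The scenario above can be modelled in userspace. This is only a sketch,
not logfs code: toy_page, toy_trylock(), toy_unlock() and
toy_lock_would_deadlock() are names invented for illustration, and the
only kernel behaviour assumed is that lock_page() sleeps until the lock
is free while a trylock returns immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the lock state matters here. */
struct toy_page {
	bool locked;
	int owner;	/* id of the context holding the lock, 0 if none */
};

/* Non-blocking lock attempt: fails instead of sleeping. */
bool toy_trylock(struct toy_page *page, int ctx)
{
	if (page->locked)
		return false;
	page->locked = true;
	page->owner = ctx;
	return true;
}

/* Release the lock; only the context that took it may do so. */
void toy_unlock(struct toy_page *page, int ctx)
{
	assert(page->locked && page->owner == ctx);
	page->locked = false;
	page->owner = 0;
}

/*
 * Would a blocking lock call from ctx sleep forever?  It would if ctx
 * itself already holds the lock: this is the writeback-then-GC case,
 * where nobody else is left to release the page.
 */
bool toy_lock_would_deadlock(const struct toy_page *page, int ctx)
{
	return page->locked && page->owner == ctx;
}
```

In this model the writeback path is context 1 and has taken the page
lock.  A garbage collector running in the same context would hang in a
blocking lock, while toy_trylock() simply fails and lets the caller
handle the page some other way.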

Jörn

-- 
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html


* Re: why are some low-level MM routines being exported?
  2010-04-04 18:15       ` Evgeniy Polyakov
@ 2010-04-05  0:36         ` Minchan Kim
  2010-04-05 12:47           ` Evgeniy Polyakov
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-04-05  0:36 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Mon, Apr 5, 2010 at 3:15 AM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> On Mon, Apr 05, 2010 at 01:21:52AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
>> > > It is added by 18bc0bbd162e3 for pohmelfs and now used by logfs, too.
>> > > I didn't noticed that at that time.
>> > > With git log, any mm guys didn't add Signed-off-by or Reviewed-by.
>> > >
>> > > I think it's not good for file system or module to use it directly.
>> > > It would make LRU management harder.
>> >
>> > How come?
>>
>> What I have a concern is that if file systems or some modules start to
>> overuse it to manage pages LRU directly, some mistake of them would make
>> system global LRU stupid and make system wrong.
>
> All filesystems already call it through find_or_create_page() or
> grab_page() invoked via read path. In some cases fs has more than
> one page grabbed via its internal path where data to be read is
> already placed, so it may want just to add those pages into mm lru.
>

I understand why pohmelfs needs that.
AFAIU, other file systems that use the generic functions (e.g.
mpage_readpages or read_cache_pages) don't need direct LRU handling
since it is hidden from them.
But pohmelfs doesn't use the generic functions.

Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs,
cifs)?

-- 
Kind regards,
Minchan Kim


* Re: why are some low-level MM routines being exported?
  2010-04-04 19:55       ` Jörn Engel
@ 2010-04-05  0:59         ` Minchan Kim
  2010-04-05  5:30           ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-04-05  0:59 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote:
> On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote:
>> >
>> Until now, other file system don't need it.
>> Why do you need?
>
> To avoid deadlocks.  You tell logfs to write out some locked page, logfs
> determines that it needs to run garbage collection first.  Garbage
> collection can read any page.  If it called find_or_create_page() for
> the locked page, you have a deadlock.

Could you do it with add_to_page_cache and pagevec_lru_add_file?

-- 
Kind regards,
Minchan Kim


* Re: why are some low-level MM routines being exported?
  2010-04-05  0:59         ` Minchan Kim
@ 2010-04-05  5:30           ` Jörn Engel
  2010-04-05  6:20             ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2010-04-05  5:30 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote:
> On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote:
> > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote:
> >> >
> >> Until now, other file system don't need it.
> >> Why do you need?
> >
> > To avoid deadlocks.  You tell logfs to write out some locked page, logfs
> > determines that it needs to run garbage collection first.  Garbage
> > collection can read any page.  If it called find_or_create_page() for
> > the locked page, you have a deadlock.
> 
> Could you do it with add_to_page_cache and pagevec_lru_add_file?

Maybe.  But how would that be an improvement?

As I see it, logfs needs a variant of find_or_create_page() that does
not block on any pages waiting for logfs GC.  Currently that variant
lives under fs/logfs/ and uses add_to_page_cache_lru().  If there are
valid reasons against exporting add_to_page_cache_lru(), the right
solution is to move the logfs variant to mm/, not to rewrite it.

If you want to change the implementation from using
add_to_page_cache_lru() to using add_to_page_cache() and
pagevec_lru_add_file(), then you should have a better reason than not
exporting add_to_page_cache_lru().  If the new implementation was any
better, I would gladly take it.
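
To make the shape of such a variant concrete, here is a userspace
sketch of a find-or-create lookup whose lock step is supplied by the
caller.  Everything named sim_* is hypothetical and exists only in this
example; it is not the fs/logfs/ implementation.  The one kernel
behaviour it leans on is that a newly inserted page goes into the cache
and onto the LRU together and comes back locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_CACHE_SLOTS 8	/* arbitrary size for the sketch */

struct sim_page {
	bool present;	/* slot is in the simulated page cache */
	bool locked;
	bool on_lru;	/* set once, when the page is first inserted */
};

struct sim_mapping {
	struct sim_page slots[SIM_CACHE_SLOTS];
};

/* Lock callback: returns true if the caller now holds the page lock. */
typedef bool (*sim_lock_fn)(struct sim_page *page);

/* A lock callback that never blocks. */
bool sim_trylock(struct sim_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

/*
 * Return the locked page at index, inserting a fresh one if needed.
 * Returns NULL if index is out of range or if lock() declined to take
 * an existing page, so the caller never sleeps on a held lock.
 */
struct sim_page *sim_find_or_create(struct sim_mapping *m, size_t index,
				    sim_lock_fn lock)
{
	struct sim_page *page;

	assert(lock != NULL);
	if (index >= SIM_CACHE_SLOTS)
		return NULL;

	page = &m->slots[index];
	if (!page->present) {
		/* New page: insert into cache and LRU in a single step. */
		page->present = true;
		page->on_lru = true;
		page->locked = true;
		return page;
	}
	return lock(page) ? page : NULL;
}
```

A caller such as a garbage collector would pass sim_trylock and treat a
NULL return for an in-range index as "page is busy, use another path"
instead of waiting for it.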

Jörn

-- 
Money can buy bandwidth, but latency is forever.
-- John R. Mashey


* Re: why are some low-level MM routines being exported?
  2010-04-05  5:30           ` Jörn Engel
@ 2010-04-05  6:20             ` Minchan Kim
  2010-04-05  6:22               ` Minchan Kim
  2010-04-05  7:13               ` Jörn Engel
  0 siblings, 2 replies; 17+ messages in thread
From: Minchan Kim @ 2010-04-05  6:20 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, Apr 5, 2010 at 2:30 PM, Jörn Engel <joern@logfs.org> wrote:
> On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote:
>> On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote:
>> > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote:
>> >> >
>> >> Until now, other file system don't need it.
>> >> Why do you need?
>> >
>> > To avoid deadlocks.  You tell logfs to write out some locked page, logfs
>> > determines that it needs to run garbage collection first.  Garbage
>> > collection can read any page.  If it called find_or_create_page() for
>> > the locked page, you have a deadlock.
>>
>> Could you do it with add_to_page_cache and pagevec_lru_add_file?
>
> Maybe.  But how would that be an improvement?
>
> As I see it, logfs needs a variant of find_or_create_page() that does
> not block on any pages waiting for logfs GC.  Currently that variant
> lives under fs/logfs/ and uses add_to_page_cache_lru().  If there are
> valid reasons against exporting add_to_page_cache_lru(), the right
> solution is to move the logfs variant to mm/, not to rewrite it.
>
> If you want to change the implementation from using
> add_to_page_cache_lru() to using add_to_page_cache() and
> pagevec_lru_add_file(), then you should have a better reason than not
> exporting add_to_page_cache_lru().  If the new implementation was any
> better, I would gladly take it.

As I said previously, my concern is that if file systems or modules
abuse add_to_page_cache_lru(), they might corrupt the system LRU list
and then the system goes to hell.
Of course, if we use it carefully it can be fine, but how do we make
sure of that?

I am not a file system expert, but when I read the comment on
read_cache_pages, "Hides the details of the LRU cache etc from the
filesystem", I concluded that it is not good for a file system to
handle the LRU list directly. At least, we have been trying to hide it
for years.

If we can do it with the current functions without a big cost, I think
that is better than exporting a new function. Until 18bc0bbd162e3 we
didn't export it, and all file systems worked well.
In addition, when the patch was merged, no mm guys seem to have
reviewed it.

I just want to ring the bell so that there is a record justifying why
we need to export a new function although we could do it with existing
functions.

If no other mm guys oppose it, I won't be against it either.

-- 
Kind regards,
Minchan Kim


* Re: why are some low-level MM routines being exported?
  2010-04-05  6:20             ` Minchan Kim
@ 2010-04-05  6:22               ` Minchan Kim
  2010-04-05  7:13               ` Jörn Engel
  1 sibling, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2010-04-05  6:22 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm, Rik van Riel,
	KOSAKI Motohiro, KAMEZAWA Hiroyuki, Nick Piggin

Cced mm guys.

On Mon, Apr 5, 2010 at 3:20 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Mon, Apr 5, 2010 at 2:30 PM, Jörn Engel <joern@logfs.org> wrote:
>> On Mon, 5 April 2010 09:59:18 +0900, Minchan Kim wrote:
>>> On Mon, Apr 5, 2010 at 4:55 AM, Jörn Engel <joern@logfs.org> wrote:
>>> > On Mon, 5 April 2010 01:21:52 +0900, Minchan Kim wrote:
>>> >> >
>>> >> Until now, other file system don't need it.
>>> >> Why do you need?
>>> >
>>> > To avoid deadlocks.  You tell logfs to write out some locked page, logfs
>>> > determines that it needs to run garbage collection first.  Garbage
>>> > collection can read any page.  If it called find_or_create_page() for
>>> > the locked page, you have a deadlock.
>>>
>>> Could you do it with add_to_page_cache and pagevec_lru_add_file?
>>
>> Maybe.  But how would that be an improvement?
>>
>> As I see it, logfs needs a variant of find_or_create_page() that does
>> not block on any pages waiting for logfs GC.  Currently that variant
>> lives under fs/logfs/ and uses add_to_page_cache_lru().  If there are
>> valid reasons against exporting add_to_page_cache_lru(), the right
>> solution is to move the logfs variant to mm/, not to rewrite it.
>>
>> If you want to change the implementation from using
>> add_to_page_cache_lru() to using add_to_page_cache() and
>> pagevec_lru_add_file(), then you should have a better reason than not
>> exporting add_to_page_cache_lru().  If the new implementation was any
>> better, I would gladly take it.
>
> Previously I said, what I have a concern is that if file systems or
> some modules abuses
> add_to_page_cache_lru, it might system LRU list wrong so then system
> go to hell.
> Of course, if we use it carefully, it can be good but how do you make sure it?
>
> I am not a file system expert but as I read comment of read_cache_pages
> "Hides the details of the LRU cache etc from the filesystem", I
> thought it is not good that
> file system handle LRU list directly. At least, we have been trying for years.
>
> If we can do it with current functions without big cost, I think it's
> rather good than exporting
> new function. Until 18bc0bbd162e3, we didn't export that but all file
> systems works well.
> In addition, when the patch is merged, any mm guys seem to be not
> reviewed it, too.
>
> I hope just ring at the bell to remain record to justify why we need
> exporting new function
> although we can do it with existing functions.
>
> If any other mm guys don't oppose it, I would be not against that, either.
>
> --
> Kind regards,
> Minchan Kim
>



-- 
Kind regards,
Minchan Kim


* Re: why are some low-level MM routines being exported?
  2010-04-05  6:20             ` Minchan Kim
  2010-04-05  6:22               ` Minchan Kim
@ 2010-04-05  7:13               ` Jörn Engel
  2010-04-05  8:26                 ` Minchan Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2010-04-05  7:13 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, 5 April 2010 15:20:36 +0900, Minchan Kim wrote:
> 
> Previously I said, what I have a concern is that if file systems or
> some modules abuses
> add_to_page_cache_lru, it might system LRU list wrong so then system
> go to hell.
> Of course, if we use it carefully, it can be good but how do you make sure it?

Having access to the source code means you only have to read all
callers.  This is not java, we don't have to add layers of anti-abuse
wrappers.  We can simply flame the first offender to a crisp. :)

> I am not a file system expert but as I read comment of read_cache_pages
> "Hides the details of the LRU cache etc from the filesystem", I
> thought it is not good that
> file system handle LRU list directly. At least, we have been trying for years.

Only speaking for logfs, I need some variant of find_or_create_page
where I can replace lock_page() with a custom function.  Whether that
function lives in fs/logfs/ or mm/filemap.c doesn't matter much.

What we could do is something roughly like the patch below, at least
semantically.  I know the patch is crap in its current form, but it
illustrates the general idea.

Jörn

-- 
The key to performance is elegance, not battalions of special cases.
-- Jon Bentley and Doug McIlroy

diff --git a/mm/filemap.c b/mm/filemap.c
index 045b31c..6d452eb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -646,27 +646,19 @@ repeat:
 }
 EXPORT_SYMBOL(find_get_page);
 
-/**
- * find_lock_page - locate, pin and lock a pagecache page
- * @mapping: the address_space to search
- * @offset: the page index
- *
- * Locates the desired pagecache page, locks it, increments its reference
- * count and returns its address.
- *
- * Returns zero if the page was not present. find_lock_page() may sleep.
- */
-struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+static struct page *__find_lock_page(struct address_space *mapping,
+		pgoff_t offset, void(*lock)(struct page *),
+		void(*unlock)(struct page *))
 {
 	struct page *page;
 
 repeat:
 	page = find_get_page(mapping, offset);
 	if (page) {
-		lock_page(page);
+		lock(page);
 		/* Has the page been truncated? */
 		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
+			unlock(page);
 			page_cache_release(page);
 			goto repeat;
 		}
@@ -674,32 +666,31 @@ repeat:
 	}
 	return page;
 }
-EXPORT_SYMBOL(find_lock_page);
 
 /**
- * find_or_create_page - locate or add a pagecache page
- * @mapping: the page's address_space
- * @index: the page's index into the mapping
- * @gfp_mask: page allocation mode
- *
- * Locates a page in the pagecache.  If the page is not present, a new page
- * is allocated using @gfp_mask and is added to the pagecache and to the VM's
- * LRU list.  The returned page is locked and has its reference count
- * incremented.
+ * find_lock_page - locate, pin and lock a pagecache page
+ * @mapping: the address_space to search
+ * @offset: the page index
  *
- * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
- * allocation!
+ * Locates the desired pagecache page, locks it, increments its reference
+ * count and returns its address.
  *
- * find_or_create_page() returns the desired page's address, or zero on
- * memory exhaustion.
+ * Returns zero if the page was not present. find_lock_page() may sleep.
  */
-struct page *find_or_create_page(struct address_space *mapping,
-		pgoff_t index, gfp_t gfp_mask)
+struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+{
+	return __find_lock_page(mapping, offset, lock_page, unlock_page);
+}
+EXPORT_SYMBOL(find_lock_page);
+
+static struct page *__find_or_create_page(struct address_space *mapping,
+		pgoff_t index, gfp_t gfp_mask, void(*lock)(struct page *),
+		void(*unlock)(struct page *))
 {
 	struct page *page;
 	int err;
 repeat:
-	page = find_lock_page(mapping, index);
+	page = __find_lock_page(mapping, index, lock, unlock);
 	if (!page) {
 		page = __page_cache_alloc(gfp_mask);
 		if (!page)
@@ -721,6 +712,31 @@ repeat:
 	}
 	return page;
 }
+EXPORT_SYMBOL(__find_or_create_page);
+
+/**
+ * find_or_create_page - locate or add a pagecache page
+ * @mapping: the page's address_space
+ * @index: the page's index into the mapping
+ * @gfp_mask: page allocation mode
+ *
+ * Locates a page in the pagecache.  If the page is not present, a new page
+ * is allocated using @gfp_mask and is added to the pagecache and to the VM's
+ * LRU list.  The returned page is locked and has its reference count
+ * incremented.
+ *
+ * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
+ * allocation!
+ *
+ * find_or_create_page() returns the desired page's address, or zero on
+ * memory exhaustion.
+ */
+struct page *find_or_create_page(struct address_space *mapping,
+		pgoff_t index, gfp_t gfp_mask)
+{
+	return __find_or_create_page(mapping, index, gfp_mask, lock_page,
+			unlock_page);
+}
 EXPORT_SYMBOL(find_or_create_page);
 
 /**


* Re: why are some low-level MM routines being exported?
  2010-04-05  7:13               ` Jörn Engel
@ 2010-04-05  8:26                 ` Minchan Kim
  2010-04-05 11:19                   ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-04-05  8:26 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, Apr 5, 2010 at 4:13 PM, Jörn Engel <joern@logfs.org> wrote:
> On Mon, 5 April 2010 15:20:36 +0900, Minchan Kim wrote:
>>
>> Previously I said, what I have a concern is that if file systems or
>> some modules abuses
>> add_to_page_cache_lru, it might system LRU list wrong so then system
>> go to hell.
>> Of course, if we use it carefully, it can be good but how do you make sure it?
>
> Having access to the source code means you only have to read all
> callers.  This is not java, we don't have to add layers of anti-abuse
> wrappers.  We can simply flame the first offender to a crisp. :)
>
>> I am not a file system expert but as I read comment of read_cache_pages
>> "Hides the details of the LRU cache etc from the filesystem", I
>> thought it is not good that
>> file system handle LRU list directly. At least, we have been trying for years.
>
> Only speaking for logfs, I need some variant of find_or_create_page
> where I can replace lock_page() with a custom function.  Whether that
> function lives in fs/logfs/ or mm/filemap.c doesn't matter much.
>
> What we could do something roughly like the patch below, at least
> semantically.  I know the patch is crap in its current form, but it
> illustrates the general idea.
>
> Jörn
>
> --
> The key to performance is elegance, not battalions of special cases.
> -- Jon Bentley and Doug McIlroy
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 045b31c..6d452eb 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -646,27 +646,19 @@ repeat:
>  }
>  EXPORT_SYMBOL(find_get_page);
>
> -/**
> - * find_lock_page - locate, pin and lock a pagecache page
> - * @mapping: the address_space to search
> - * @offset: the page index
> - *
> - * Locates the desired pagecache page, locks it, increments its reference
> - * count and returns its address.
> - *
> - * Returns zero if the page was not present. find_lock_page() may sleep.
> - */
> -struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
> +static struct page *__find_lock_page(struct address_space *mapping,
> +               pgoff_t offset, void(*lock)(struct page *),
> +               void(*unlock)(struct page *))
>  {
>        struct page *page;
>
>  repeat:
>        page = find_get_page(mapping, offset);
>        if (page) {
> -               lock_page(page);
> +               lock(page);
>                /* Has the page been truncated? */
>                if (unlikely(page->mapping != mapping)) {
> -                       unlock_page(page);
> +                       unlock(page);
>                        page_cache_release(page);
>                        goto repeat;
>                }
> @@ -674,32 +666,31 @@ repeat:
>        }
>        return page;
>  }
> -EXPORT_SYMBOL(find_lock_page);
>
>  /**
> - * find_or_create_page - locate or add a pagecache page
> - * @mapping: the page's address_space
> - * @index: the page's index into the mapping
> - * @gfp_mask: page allocation mode
> - *
> - * Locates a page in the pagecache.  If the page is not present, a new page
> - * is allocated using @gfp_mask and is added to the pagecache and to the VM's
> - * LRU list.  The returned page is locked and has its reference count
> - * incremented.
> + * find_lock_page - locate, pin and lock a pagecache page
> + * @mapping: the address_space to search
> + * @offset: the page index
>  *
> - * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
> - * allocation!
> + * Locates the desired pagecache page, locks it, increments its reference
> + * count and returns its address.
>  *
> - * find_or_create_page() returns the desired page's address, or zero on
> - * memory exhaustion.
> + * Returns zero if the page was not present. find_lock_page() may sleep.
>  */
> -struct page *find_or_create_page(struct address_space *mapping,
> -               pgoff_t index, gfp_t gfp_mask)
> +struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
> +{
> +       return __find_lock_page(mapping, offset, lock_page, unlock_page);
> +}
> +EXPORT_SYMBOL(find_lock_page);
> +
> +static struct page *__find_or_create_page(struct address_space *mapping,
> +               pgoff_t index, gfp_t gfp_mask, void(*lock)(struct page *),
> +               void(*unlock)(struct page *))
>  {
>        struct page *page;
>        int err;
>  repeat:
> -       page = find_lock_page(mapping, index);
> +       page = __find_lock_page(mapping, index, lock, unlock);
>        if (!page) {
>                page = __page_cache_alloc(gfp_mask);
>                if (!page)
> @@ -721,6 +712,31 @@ repeat:
>        }
>        return page;
>  }
> +EXPORT_SYMBOL(__find_or_create_page);
> +
> +/**
> + * find_or_create_page - locate or add a pagecache page
> + * @mapping: the page's address_space
> + * @index: the page's index into the mapping
> + * @gfp_mask: page allocation mode
> + *
> + * Locates a page in the pagecache.  If the page is not present, a new page
> + * is allocated using @gfp_mask and is added to the pagecache and to the VM's
> + * LRU list.  The returned page is locked and has its reference count
> + * incremented.
> + *
> + * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
> + * allocation!
> + *
> + * find_or_create_page() returns the desired page's address, or zero on
> + * memory exhaustion.
> + */
> +struct page *find_or_create_page(struct address_space *mapping,
> +               pgoff_t index, gfp_t gfp_mask)
> +{
> +       return __find_or_create_page(mapping, index, gfp_mask, lock_page,
> +                       unlock_page);
> +}
>  EXPORT_SYMBOL(find_or_create_page);
>
>  /**
>

That doesn't seem like a bad idea. :)
But we have to justify the new interface first. To do that, we have to say
why we can't do it with the current functions (find_get_page,
add_to_page_cache and pagevec_lru_add_xxx).

pagevec_lru_add_xxx works in batches, so it can shorten the call path and
cut some overhead (e.g. the page_is_file_cache comparison and
get/put_cpu_var(lru_add_pvecs)).

At the very least, it should perform better than the old approach.
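
The batching argument can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the names lru_model, pvec_model, lru_add_one, pvec_add and pvec_flush are hypothetical, the batch size is arbitrary, and the fixed per-call overhead (whatever it is in the real kernel) is reduced to a plain counter.

```c
#include <assert.h>
#include <stddef.h>

#define PVEC_MODEL_SIZE 14	/* arbitrary batch size chosen for this model */

/* Hypothetical stand-in for the LRU: counts pages and fixed-cost operations. */
struct lru_model {
	size_t fixed_cost_ops;	/* times the per-call overhead was paid */
	size_t pages_on_lru;	/* pages that have reached the LRU */
};

/* Hypothetical stand-in for a page vector: a small fixed-size batch. */
struct pvec_model {
	size_t nr;
	int pages[PVEC_MODEL_SIZE];
};

/* Unbatched path: every page pays the fixed cost once. */
static void lru_add_one(struct lru_model *lru, int page)
{
	(void)page;
	lru->fixed_cost_ops++;
	lru->pages_on_lru++;
}

/* Batched path: drain the vector, paying the fixed cost once per batch. */
static void pvec_flush(struct lru_model *lru, struct pvec_model *pv)
{
	if (pv->nr == 0)
		return;
	lru->fixed_cost_ops++;
	lru->pages_on_lru += pv->nr;
	pv->nr = 0;
}

/* Queue a page and flush automatically when the vector fills up. */
static void pvec_add(struct lru_model *lru, struct pvec_model *pv, int page)
{
	pv->pages[pv->nr++] = page;
	if (pv->nr == PVEC_MODEL_SIZE)
		pvec_flush(lru, pv);
}
```

In this model, adding 30 pages one at a time pays the fixed cost 30 times, while queueing them through pvec_add() and finishing with one pvec_flush() pays it 3 times. A vector that only ever holds a single page still pays the cost once per page, so the saving depends on callers actually having several pages to hand over at once.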

-- 
Kind regards,
Minchan Kim


* Re: why are some low-level MM routines being exported?
  2010-04-05  8:26                 ` Minchan Kim
@ 2010-04-05 11:19                   ` Jörn Engel
  0 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2010-04-05 11:19 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Evgeniy Polyakov, Robert P. J. Day, linux-mm

On Mon, 5 April 2010 17:26:58 +0900, Minchan Kim wrote:
> 
> That doesn't seem like a bad idea. :)
> But we have to justify the new interface first. To do that, we have to say
> why we can't do it with the current functions (find_get_page,
> add_to_page_cache and pagevec_lru_add_xxx).

I guess we could do that.  Whether setting up a vector when only dealing
with single pages makes the code more readable or helps performance is a
different matter, though.

> pagevec_lru_add_xxx works in batches, so it can shorten the call path and
> cut some overhead (e.g. the page_is_file_cache comparison and
> get/put_cpu_var(lru_add_pvecs)).
> 
> At the very least, it should perform better than the old approach.

...if we can convert callers to also handle vectors.  And if backing
device is fast enough that cpu overhead becomes noticeable.  And if
there were no bigger fish left to catch.

Jörn

-- 
Joern's library part 15:
http://www.knosof.co.uk/cbook/accu06a.pdf


* Re: why are some low-level MM routines being exported?
  2010-04-05  0:36         ` Minchan Kim
@ 2010-04-05 12:47           ` Evgeniy Polyakov
  2010-04-05 14:31             ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Polyakov @ 2010-04-05 12:47 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Mon, Apr 05, 2010 at 09:36:00AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
> > All filesystems already call it through find_or_create_page() or
> > grab_page() invoked via read path. In some cases fs has more than
> > one page grabbed via its internal path where data to be read is
> > already placed, so it may want just to add those pages into mm lru.
> 
> I understand why pohmelfs needs that.
> AFAIU, other file systems using the general functions (e.g. mpage_readpages or
> read_cache_pages) don't need direct LRU handling since it's hidden.
> But pohmelfs doesn't use the general functions.
> 
> Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs, cifs)?

That would force us to reinvent add_to_page_cache_lru() as a private
function that calls add_to_page_cache() and pagevec_lru_add_file(),
which is effectively what add_to_page_cache_lru() already does for
file-backed pages.
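
The shape of such a private helper can be sketched as a userspace model. This is an illustration under stated assumptions, not the kernel implementation: every model_* name and both structs are hypothetical, the page cache is reduced to a small array, and the LRU is reduced to a flag on the page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_SLOTS 8	/* arbitrary size for this model */

/* Hypothetical stand-in for struct page. */
struct page_model {
	bool on_lru;
};

/* Hypothetical stand-in for an address_space: one slot per page index. */
struct mapping_model {
	struct page_model *slots[MODEL_CACHE_SLOTS];
};

/* Models the "insert into the page cache" step; fails if the slot is taken. */
static int model_add_to_page_cache(struct mapping_model *m,
				   struct page_model *p, size_t index)
{
	if (index >= MODEL_CACHE_SLOTS)
		return -EINVAL;
	if (m->slots[index])
		return -EEXIST;
	m->slots[index] = p;
	return 0;
}

/* Models the "put the page on the file LRU" step. */
static void model_lru_add_file(struct page_model *p)
{
	p->on_lru = true;
}

/* The combined helper: only touch the LRU if the cache insert succeeded. */
static int model_add_to_page_cache_lru(struct mapping_model *m,
				       struct page_model *p, size_t index)
{
	int err = model_add_to_page_cache(m, p, index);

	if (err == 0)
		model_lru_add_file(p);
	return err;
}
```

Each filesystem that carried its own copy of the combined helper would have to keep this ordering and error handling in sync by hand, which is the duplication the reply above objects to.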

-- 
	Evgeniy Polyakov


* Re: why are some low-level MM routines being exported?
  2010-04-05 12:47           ` Evgeniy Polyakov
@ 2010-04-05 14:31             ` Minchan Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2010-04-05 14:31 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Robert P. J. Day, linux-mm, Joern Engel

On Mon, Apr 5, 2010 at 9:47 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> On Mon, Apr 05, 2010 at 09:36:00AM +0900, Minchan Kim (minchan.kim@gmail.com) wrote:
>> > All filesystems already call it through find_or_create_page() or
>> > grab_page() invoked via read path. In some cases fs has more than
>> > one page grabbed via its internal path where data to be read is
>> > already placed, so it may want just to add those pages into mm lru.
>>
>> I understand why pohmelfs needs that.
>> AFAIU, other file systems using the general functions (e.g. mpage_readpages or
>> read_cache_pages) don't need direct LRU handling since it's hidden.
>> But pohmelfs doesn't use the general functions.
>>
>> Isn't pagevec_lru_add_file enough, as in other file systems (e.g. nfs, cifs)?
>
> That would force us to reinvent add_to_page_cache_lru() as a private
> function that calls add_to_page_cache() and pagevec_lru_add_file(),
> which is effectively what add_to_page_cache_lru() already does for
> file-backed pages.
>
> --
>        Evgeniy Polyakov

Hmm. I found this:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html
Recently, Nick replaced it with add_to_page_cache_lru in btrfs, too.
That means another mm developer already knew about this and allowed it.

Maybe I was being paranoid.
Sorry for bothering you, Evgeniy and Joern.

-- 
Kind regards,
Minchan Kim


Thread overview: 17+ messages
2010-04-04 15:27 why are some low-level MM routines being exported? Robert P. J. Day
2010-04-04 15:59 ` Minchan Kim
2010-04-04 16:03   ` Evgeniy Polyakov
2010-04-04 16:17     ` Minchan Kim
2010-04-04 16:21     ` Minchan Kim
2010-04-04 18:15       ` Evgeniy Polyakov
2010-04-05  0:36         ` Minchan Kim
2010-04-05 12:47           ` Evgeniy Polyakov
2010-04-05 14:31             ` Minchan Kim
2010-04-04 19:55       ` Jörn Engel
2010-04-05  0:59         ` Minchan Kim
2010-04-05  5:30           ` Jörn Engel
2010-04-05  6:20             ` Minchan Kim
2010-04-05  6:22               ` Minchan Kim
2010-04-05  7:13               ` Jörn Engel
2010-04-05  8:26                 ` Minchan Kim
2010-04-05 11:19                   ` Jörn Engel
