All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gioh Kim <gioh.kim@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org,
	Minchan Kim <minchan@kernel.org>, Joonsoo Kim <js1304@gmail.com>
Subject: Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area
Date: Thu, 31 Jul 2014 08:54:40 +0900	[thread overview]
Message-ID: <53D985C0.3070300@lge.com> (raw)
In-Reply-To: <20140730101143.GB19205@quack.suse.cz>



2014-07-30 오후 7:11, Jan Kara 쓴 글:
> On Wed 30-07-14 16:44:24, Gioh Kim wrote:
>> 2014-07-22 오후 6:38, Jan Kara 쓴 글:
>>> On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:
>>>> On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:
>>>>> Hello,
>>>>>
>>>>> This patch try to solve problem that a long-lasting page cache of
>>>>> ext4 superblock disturbs page migration.
>>>>>
>>>>> I've been testing CMA feature on my ARM-based platform
>>>>> and found some pages for page caches cannot be migrated.
>>>>> Some of them are page caches of superblock of ext4 filesystem.
>>>>>
>>>>> Current ext4 reads superblock with sb_bread(). sb_bread() allocates page
>>>> >from movable area. But the problem is that ext4 hold the page until
>>>>> it is unmounted. If root filesystem is ext4 the page cannot be migrated forever.
>>>>>
>>>>> I introduce a new API for allocating page from non-movable area.
>>>>> It is useful for ext4 and others that want to hold page cache for a long time.
>>>>
>>>> There's no word on why you can't teach ext4 to still migrate that page.
>>>> For all I know it might be impossible, but at least mention why.
>>
>> I am very sorry for lacking of details.
>>
>> In ext4_fill_super() the buffer-head of superblock is stored in sbi->s_sbh.
>> The page belongs to the buffer-head is allocated from movable area.
>> To migrate the page the buffer-head should be released via brelse().
>> But brelse() is not called until unmount.
>    Hum, I don't see where in the code do we check buffer_head use count. Can
> you please point me? Thanks.

Filesystem code does not check buffer_head use count.
sb_bread() returns the buffer_head that is included in bh_lru and has non-zero use count.
You can see the bh_lru code in buffer.c: __find_get_clock() and lookup_bh_lru().
bh_lru_install() inserts the buffer_head into the bh_lru().
It first calls get_bh() to increase the use count and insert bh into the lru array.

The buffer_head use count is non-zero until brelse() is called.

>
>>>    It doesn't seem to be worth the effort to make that page movable to me
>>> (it's reasonably doable since superblock buffer isn't accessed in *that*
>>> many places but single movable page doesn't seem like a good tradeoff for
>>> the complexity).
>>>
>>> But this made me look into the migration code and it isn't completely clear
>>> to me what makes the migration code decide that sb buffer isn't movable? We
>>> seem to be locking the buffers before moving the underlying page but we
>>> don't do any reference or state checks on the buffers... That seems to be
>>> assuming that noone looks at bh->b_data without holding buffer lock. That
>>> is likely true for ordinary data but definitely not true for metadata
>>> buffers (i.e., buffers for pages from block device mappings).
>>
>> The sb buffer is not movable because it is not released.
>> sb_bread increase the reference counter of buffer-head so that
>> the page of the buffer-head cannot be movable.
>>
>> sb_bread allocates page from movable area but it is not movable until the
>> reference counter of the buffer-head becomes zero.
>> There is no lock for the buffer but the reference counter acts like lock.
>    OK, but why do you care about a single page (of at most handful if you
> have more filesystems) which isn't movable? That shouldn't make a big
> difference to compaction...

Even a single page can make CMA migration fail.

>
>> Actually it is strange that ext4 keeps buffer-head in superblock
>> structure until unmount (it can be long time) I thinks the buffer-head
>> should be released immediately like fat_fill_super() did.  I believe
>> there is a reason to keep buffer-head so that I suggest this patch.
>    We don't copy some data from the superblock to other structure so from
> time to time we need to look e.g. at feature bits within superblock buffer.
> Historically we were updating numbers of free blocks and inodes in the
> superblock with each allocation but we don't do that anymore because it
> scales poorly. So there is no fundamental reason for keeping sb buffer
> pinned anymore. Just someone would have to rewrite the code to copy some
> pieces of data from the buffer to some other structure and use it there.

I hope so.

>
> 								Honza
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Gioh Kim <gioh.kim@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org,
	Minchan Kim <minchan@kernel.org>, Joonsoo Kim <js1304@gmail.com>
Subject: Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area
Date: Thu, 31 Jul 2014 08:54:40 +0900	[thread overview]
Message-ID: <53D985C0.3070300@lge.com> (raw)
In-Reply-To: <20140730101143.GB19205@quack.suse.cz>



2014-07-30 i??i?? 7:11, Jan Kara i?' e,?:
> On Wed 30-07-14 16:44:24, Gioh Kim wrote:
>> 2014-07-22 i??i?? 6:38, Jan Kara i?' e,?:
>>> On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:
>>>> On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:
>>>>> Hello,
>>>>>
>>>>> This patch try to solve problem that a long-lasting page cache of
>>>>> ext4 superblock disturbs page migration.
>>>>>
>>>>> I've been testing CMA feature on my ARM-based platform
>>>>> and found some pages for page caches cannot be migrated.
>>>>> Some of them are page caches of superblock of ext4 filesystem.
>>>>>
>>>>> Current ext4 reads superblock with sb_bread(). sb_bread() allocates page
>>>> >from movable area. But the problem is that ext4 hold the page until
>>>>> it is unmounted. If root filesystem is ext4 the page cannot be migrated forever.
>>>>>
>>>>> I introduce a new API for allocating page from non-movable area.
>>>>> It is useful for ext4 and others that want to hold page cache for a long time.
>>>>
>>>> There's no word on why you can't teach ext4 to still migrate that page.
>>>> For all I know it might be impossible, but at least mention why.
>>
>> I am very sorry for lacking of details.
>>
>> In ext4_fill_super() the buffer-head of superblock is stored in sbi->s_sbh.
>> The page belongs to the buffer-head is allocated from movable area.
>> To migrate the page the buffer-head should be released via brelse().
>> But brelse() is not called until unmount.
>    Hum, I don't see where in the code do we check buffer_head use count. Can
> you please point me? Thanks.

Filesystem code does not check buffer_head use count.
sb_bread() returns the buffer_head that is included in bh_lru and has non-zero use count.
You can see the bh_lru code in buffer.c: __find_get_clock() and lookup_bh_lru().
bh_lru_install() inserts the buffer_head into the bh_lru().
It first calls get_bh() to increase the use count and insert bh into the lru array.

The buffer_head use count is non-zero until brelse() is called.

>
>>>    It doesn't seem to be worth the effort to make that page movable to me
>>> (it's reasonably doable since superblock buffer isn't accessed in *that*
>>> many places but single movable page doesn't seem like a good tradeoff for
>>> the complexity).
>>>
>>> But this made me look into the migration code and it isn't completely clear
>>> to me what makes the migration code decide that sb buffer isn't movable? We
>>> seem to be locking the buffers before moving the underlying page but we
>>> don't do any reference or state checks on the buffers... That seems to be
>>> assuming that noone looks at bh->b_data without holding buffer lock. That
>>> is likely true for ordinary data but definitely not true for metadata
>>> buffers (i.e., buffers for pages from block device mappings).
>>
>> The sb buffer is not movable because it is not released.
>> sb_bread increase the reference counter of buffer-head so that
>> the page of the buffer-head cannot be movable.
>>
>> sb_bread allocates page from movable area but it is not movable until the
>> reference counter of the buffer-head becomes zero.
>> There is no lock for the buffer but the reference counter acts like lock.
>    OK, but why do you care about a single page (of at most handful if you
> have more filesystems) which isn't movable? That shouldn't make a big
> difference to compaction...

Even a single page can make CMA migration fail.

>
>> Actually it is strange that ext4 keeps buffer-head in superblock
>> structure until unmount (it can be long time) I thinks the buffer-head
>> should be released immediately like fat_fill_super() did.  I believe
>> there is a reason to keep buffer-head so that I suggest this patch.
>    We don't copy some data from the superblock to other structure so from
> time to time we need to look e.g. at feature bits within superblock buffer.
> Historically we were updating numbers of free blocks and inodes in the
> superblock with each allocation but we don't do that anymore because it
> scales poorly. So there is no fundamental reason for keeping sb buffer
> pinned anymore. Just someone would have to rewrite the code to copy some
> pieces of data from the buffer to some other structure and use it there.

I hope so.

>
> 								Honza
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Gioh Kim <gioh.kim@lge.com>
To: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org,
	Minchan Kim <minchan@kernel.org>, Joonsoo Kim <js1304@gmail.com>
Subject: Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area
Date: Thu, 31 Jul 2014 08:54:40 +0900	[thread overview]
Message-ID: <53D985C0.3070300@lge.com> (raw)
In-Reply-To: <20140730101143.GB19205@quack.suse.cz>



2014-07-30 오후 7:11, Jan Kara 쓴 글:
> On Wed 30-07-14 16:44:24, Gioh Kim wrote:
>> 2014-07-22 오후 6:38, Jan Kara 쓴 글:
>>> On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:
>>>> On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:
>>>>> Hello,
>>>>>
>>>>> This patch try to solve problem that a long-lasting page cache of
>>>>> ext4 superblock disturbs page migration.
>>>>>
>>>>> I've been testing CMA feature on my ARM-based platform
>>>>> and found some pages for page caches cannot be migrated.
>>>>> Some of them are page caches of superblock of ext4 filesystem.
>>>>>
>>>>> Current ext4 reads superblock with sb_bread(). sb_bread() allocates page
>>>> >from movable area. But the problem is that ext4 hold the page until
>>>>> it is unmounted. If root filesystem is ext4 the page cannot be migrated forever.
>>>>>
>>>>> I introduce a new API for allocating page from non-movable area.
>>>>> It is useful for ext4 and others that want to hold page cache for a long time.
>>>>
>>>> There's no word on why you can't teach ext4 to still migrate that page.
>>>> For all I know it might be impossible, but at least mention why.
>>
>> I am very sorry for lacking of details.
>>
>> In ext4_fill_super() the buffer-head of superblock is stored in sbi->s_sbh.
>> The page belongs to the buffer-head is allocated from movable area.
>> To migrate the page the buffer-head should be released via brelse().
>> But brelse() is not called until unmount.
>    Hum, I don't see where in the code do we check buffer_head use count. Can
> you please point me? Thanks.

Filesystem code does not check buffer_head use count.
sb_bread() returns the buffer_head that is included in bh_lru and has non-zero use count.
You can see the bh_lru code in buffer.c: __find_get_clock() and lookup_bh_lru().
bh_lru_install() inserts the buffer_head into the bh_lru().
It first calls get_bh() to increase the use count and insert bh into the lru array.

The buffer_head use count is non-zero until brelse() is called.

>
>>>    It doesn't seem to be worth the effort to make that page movable to me
>>> (it's reasonably doable since superblock buffer isn't accessed in *that*
>>> many places but single movable page doesn't seem like a good tradeoff for
>>> the complexity).
>>>
>>> But this made me look into the migration code and it isn't completely clear
>>> to me what makes the migration code decide that sb buffer isn't movable? We
>>> seem to be locking the buffers before moving the underlying page but we
>>> don't do any reference or state checks on the buffers... That seems to be
>>> assuming that noone looks at bh->b_data without holding buffer lock. That
>>> is likely true for ordinary data but definitely not true for metadata
>>> buffers (i.e., buffers for pages from block device mappings).
>>
>> The sb buffer is not movable because it is not released.
>> sb_bread increase the reference counter of buffer-head so that
>> the page of the buffer-head cannot be movable.
>>
>> sb_bread allocates page from movable area but it is not movable until the
>> reference counter of the buffer-head becomes zero.
>> There is no lock for the buffer but the reference counter acts like lock.
>    OK, but why do you care about a single page (of at most handful if you
> have more filesystems) which isn't movable? That shouldn't make a big
> difference to compaction...

Even a single page can make CMA migration fail.

>
>> Actually it is strange that ext4 keeps buffer-head in superblock
>> structure until unmount (it can be long time) I thinks the buffer-head
>> should be released immediately like fat_fill_super() did.  I believe
>> there is a reason to keep buffer-head so that I suggest this patch.
>    We don't copy some data from the superblock to other structure so from
> time to time we need to look e.g. at feature bits within superblock buffer.
> Historically we were updating numbers of free blocks and inodes in the
> superblock with each allocation but we don't do that anymore because it
> scales poorly. So there is no fundamental reason for keeping sb buffer
> pinned anymore. Just someone would have to rewrite the code to copy some
> pieces of data from the buffer to some other structure and use it there.

I hope so.

>
> 								Honza
>

  parent reply	other threads:[~2014-07-30 23:54 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-22  5:18 [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area Gioh Kim
2014-07-22  7:30 ` Peter Zijlstra
2014-07-22  8:14   ` Theodore Ts'o
2014-07-27  1:01     ` Theodore Ts'o
2014-07-30  7:56       ` Gioh Kim
2014-07-30  7:56         ` Gioh Kim
2014-07-22  9:38   ` Jan Kara
2014-07-22  9:38     ` Jan Kara
2014-07-30  7:44     ` Gioh Kim
2014-07-30  7:44       ` Gioh Kim
2014-07-30  7:44       ` Gioh Kim
2014-07-30  7:57       ` Kyungmin Park
2014-07-30  7:57         ` Kyungmin Park
2014-07-30 10:11       ` Jan Kara
2014-07-30 10:11         ` Jan Kara
2014-07-30 10:11         ` Jan Kara
2014-07-30 10:19         ` Peter Zijlstra
2014-07-30 23:45           ` Gioh Kim
2014-07-30 23:45             ` Gioh Kim
2014-07-30 23:45             ` Gioh Kim
2014-07-30 23:54         ` Gioh Kim [this message]
2014-07-30 23:54           ` Gioh Kim
2014-07-30 23:54           ` Gioh Kim
2014-07-31  0:03           ` Jan Kara
2014-07-31  0:03             ` Jan Kara
2014-07-31  0:03             ` Jan Kara
2014-07-31  0:37             ` Gioh Kim
2014-07-31  0:37               ` Gioh Kim
2014-07-31  0:37               ` Gioh Kim
2014-07-31 12:21               ` Jan Kara
2014-07-31 12:21                 ` Jan Kara
2014-07-31 12:21                 ` Jan Kara
2014-08-01  0:07                 ` Gioh Kim
2014-08-01  0:07                   ` Gioh Kim
2014-08-01  0:07                   ` Gioh Kim
2014-08-01  1:06                   ` Gioh Kim
2014-08-01  1:06                     ` Gioh Kim
2014-08-01  1:06                     ` Gioh Kim
2014-08-01  9:57                     ` Jan Kara
2014-08-01  9:57                       ` Jan Kara
2014-08-01 13:36                       ` Peter Zijlstra
2014-08-01 15:24                         ` Jan Kara
2014-08-01 15:24                           ` Jan Kara
2014-08-01 16:04                           ` Peter Zijlstra
2014-08-06  6:15                             ` Gioh Kim
2014-08-06  6:15                               ` Gioh Kim
2014-08-06  6:15                               ` Gioh Kim
2014-08-01  8:34                 ` Joonsoo Kim
2014-08-01  8:34                   ` Joonsoo Kim
2014-08-01  8:34                   ` Joonsoo Kim
2014-08-01  9:15                   ` Jan Kara
2014-08-01  9:15                     ` Jan Kara
2014-08-01  9:15                     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53D985C0.3070300@lge.com \
    --to=gioh.kim@lge.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=js1304@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.