From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
Ning Qu <quning@google.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()
Date: Mon, 14 Oct 2013 16:56:41 +0300 (EEST) [thread overview]
Message-ID: <20131014135642.05DFEE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20130925232915.GK26872@dastard>
Dave Chinner wrote:
> On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > > separately.
> > >
> > > We were never going to do this :(
> > >
> > > Has anyone reviewed these patches much yet?
> >
> > Dave did very good review. Few other people looked to separate patches.
> > See Reviewed-by/Acked-by tags in patches.
> >
> > It looks like most mm experts are busy with numa balancing nowadays, so
> > it's hard to get more review.
>
> Nobody has reviewed it from the filesystem side, though.
>
> The changes that require special code paths for huge pages in the
> write_begin/write_end paths are nasty. You're adding conditional
> code that depends on the page size and then having to add checks to
> ensure that large page operations don't step over small page
> boundaries and other such corner cases. It's an extremely fragile
> design, IMO.
>
> In general, I don't like all the if (thp) {} else {}; code that this
> series introduces - they are code paths that simply won't get tested
> with any sort of regularity and make the code more complex for those
> that aren't using THP to understand and debug...
Okay, I'll try to get rid of special cases where it's possible.
> Then there is a new per-inode lock that is used in
> generic_perform_write() which is held across page faults and calls
> to filesystem block mapping callbacks. This inserts into the middle
> of an existing locking chain that needs to be strictly ordered, and
> as such will lead to the same type of lock inversion problems that
> the mmap_sem had. We do not want to introduce a new lock that has
> this same problem just as we are getting rid of that long standing
> nastiness from the page fault path...
I don't see how we can protect against splitting with existing locks,
but I'll try find a way.
> I also note that you didn't convert invalidate_inode_pages2_range()
> to support huge pages which is needed by real filesystems that
> support direct IO. There are other truncate/invalidate interfaces
> that you didn't convert, either, and some of them will present you
> with interesting locking challenges as a result of adding that new
> lock...
Thanks. I'll take a look on these code paths.
> > The patchset was mostly ignored for few rounds and Dave suggested to split
> > to have less scary patch number.
>
> It's still being ignored by filesystem people because you haven't
> actually tried to implement support into a real filesystem.....
If it will support a real filesystem, wouldn't it be ignored due
patch count? ;)
> > > > Please review and consider applying.
> > >
> > > It appears rather too immature at this stage.
> >
> > More review is always welcome and I'm committed to address issues.
>
> IMO, supporting a real block based filesystem like ext4 or XFS and
> demonstrating that everything works is necessary before we go any
> further...
Will see what numbers I can bring in next iterations.
Thanks for your feedback. And sorry for late answer.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
Ning Qu <quning@google.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()
Date: Mon, 14 Oct 2013 16:56:41 +0300 (EEST) [thread overview]
Message-ID: <20131014135642.05DFEE0090@blue.fi.intel.com> (raw)
In-Reply-To: <20130925232915.GK26872@dastard>
Dave Chinner wrote:
> On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > > separately.
> > >
> > > We were never going to do this :(
> > >
> > > Has anyone reviewed these patches much yet?
> >
> > Dave did very good review. Few other people looked to separate patches.
> > See Reviewed-by/Acked-by tags in patches.
> >
> > It looks like most mm experts are busy with numa balancing nowadays, so
> > it's hard to get more review.
>
> Nobody has reviewed it from the filesystem side, though.
>
> The changes that require special code paths for huge pages in the
> write_begin/write_end paths are nasty. You're adding conditional
> code that depends on the page size and then having to add checks to
> ensure that large page operations don't step over small page
> boundaries and other such corner cases. It's an extremely fragile
> design, IMO.
>
> In general, I don't like all the if (thp) {} else {}; code that this
> series introduces - they are code paths that simply won't get tested
> with any sort of regularity and make the code more complex for those
> that aren't using THP to understand and debug...
Okay, I'll try to get rid of special cases where it's possible.
> Then there is a new per-inode lock that is used in
> generic_perform_write() which is held across page faults and calls
> to filesystem block mapping callbacks. This inserts into the middle
> of an existing locking chain that needs to be strictly ordered, and
> as such will lead to the same type of lock inversion problems that
> the mmap_sem had. We do not want to introduce a new lock that has
> this same problem just as we are getting rid of that long standing
> nastiness from the page fault path...
I don't see how we can protect against splitting with existing locks,
but I'll try find a way.
> I also note that you didn't convert invalidate_inode_pages2_range()
> to support huge pages which is needed by real filesystems that
> support direct IO. There are other truncate/invalidate interfaces
> that you didn't convert, either, and some of them will present you
> with interesting locking challenges as a result of adding that new
> lock...
Thanks. I'll take a look on these code paths.
> > The patchset was mostly ignored for few rounds and Dave suggested to split
> > to have less scary patch number.
>
> It's still being ignored by filesystem people because you haven't
> actually tried to implement support into a real filesystem.....
If it will support a real filesystem, wouldn't it be ignored due
patch count? ;)
> > > > Please review and consider applying.
> > >
> > > It appears rather too immature at this stage.
> >
> > More review is always welcome and I'm committed to address issues.
>
> IMO, supporting a real block based filesystem like ext4 or XFS and
> demonstrating that everything works is necessary before we go any
> further...
Will see what numbers I can bring in next iterations.
Thanks for your feedback. And sorry for late answer.
--
Kirill A. Shutemov
next prev parent reply other threads:[~2013-10-14 13:56 UTC|newest]
Thread overview: 100+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-23 12:05 [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap() Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 01/22] mm: implement zero_huge_user_segment and friends Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 02/22] radix-tree: implement preload for multiple contiguous elements Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 03/22] memcg, thp: charge huge cache pages Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 04/22] thp: compile-time and sysfs knob for thp pagecache Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 05/22] thp, mm: introduce mapping_can_have_hugepages() predicate Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 06/22] thp: represent file thp pages in meminfo and friends Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 07/22] thp, mm: rewrite add_to_page_cache_locked() to support huge pages Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 08/22] mm: trace filemap: dump page order Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 09/22] block: implement add_bdi_stat() Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 10/22] thp, mm: rewrite delete_from_page_cache() to support huge pages Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-25 20:02 ` Ning Qu
2013-09-25 20:02 ` Ning Qu
2013-09-23 12:05 ` [PATCHv6 11/22] thp, mm: warn if we try to use replace_page_cache_page() with THP Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 12/22] thp, mm: add event counters for huge page alloc on file write or read Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 13/22] mm, vfs: introduce i_split_sem Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 14/22] thp, mm: allocate huge pages in grab_cache_page_write_begin() Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 15/22] thp, mm: naive support of thp in generic_perform_write Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 16/22] thp, mm: handle transhuge pages in do_generic_file_read() Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 17/22] thp, libfs: initial thp support Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 18/22] truncate: support huge pages Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 19/22] thp: handle file pages in split_huge_page() Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 20/22] thp: wait_split_huge_page(): serialize over i_mmap_mutex too Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 21/22] thp, mm: split huge page on mmap file page Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-23 12:05 ` [PATCHv6 22/22] ramfs: enable transparent huge page cache Kirill A. Shutemov
2013-09-23 12:05 ` Kirill A. Shutemov
2013-09-24 23:37 ` [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap() Andrew Morton
2013-09-24 23:37 ` Andrew Morton
2013-09-24 23:48 ` Ning Qu
2013-09-24 23:49 ` Andi Kleen
2013-09-24 23:49 ` Andi Kleen
2013-09-24 23:58 ` Andrew Morton
2013-09-24 23:58 ` Andrew Morton
2013-09-25 11:15 ` Kirill A. Shutemov
2013-09-25 11:15 ` Kirill A. Shutemov
2013-09-25 15:05 ` Andi Kleen
2013-09-25 15:05 ` Andi Kleen
2013-09-26 18:30 ` Zach Brown
2013-09-26 18:30 ` Zach Brown
2013-09-26 19:05 ` Andi Kleen
2013-09-26 19:05 ` Andi Kleen
2013-09-30 10:13 ` Mel Gorman
2013-09-30 10:13 ` Mel Gorman
2013-09-30 16:05 ` Andi Kleen
2013-09-30 16:05 ` Andi Kleen
2013-09-25 9:51 ` Kirill A. Shutemov
2013-09-25 9:51 ` Kirill A. Shutemov
2013-09-25 23:29 ` Dave Chinner
2013-09-25 23:29 ` Dave Chinner
2013-10-14 13:56 ` Kirill A. Shutemov [this message]
2013-10-14 13:56 ` Kirill A. Shutemov
2013-09-30 10:02 ` Mel Gorman
2013-09-30 10:02 ` Mel Gorman
2013-09-30 10:10 ` Mel Gorman
2013-09-30 10:10 ` Mel Gorman
2013-09-30 18:07 ` Ning Qu
2013-09-30 18:07 ` Ning Qu
2013-09-30 18:07 ` Ning Qu
2013-09-30 18:51 ` Andi Kleen
2013-09-30 18:51 ` Andi Kleen
2013-10-01 8:38 ` Mel Gorman
2013-10-01 8:38 ` Mel Gorman
2013-10-01 17:11 ` Ning Qu
2013-10-01 17:11 ` Ning Qu
2013-10-14 14:27 ` Kirill A. Shutemov
2013-10-14 14:27 ` Kirill A. Shutemov
2013-09-30 15:27 ` Dave Hansen
2013-09-30 15:27 ` Dave Hansen
2013-09-30 18:05 ` Ning Qu
2013-09-30 18:05 ` Ning Qu
2013-09-30 18:05 ` Ning Qu
2013-09-25 0:12 ` Ning Qu
2013-09-25 9:23 ` Kirill A. Shutemov
2013-09-25 9:23 ` Kirill A. Shutemov
2013-09-26 21:13 ` Dave Hansen
2013-09-26 21:13 ` Dave Hansen
-- strict thread matches above, loose matches on Subject: below --
2013-09-25 18:11 Ning Qu
2013-09-25 18:11 ` Ning Qu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131014135642.05DFEE0090@blue.fi.intel.com \
--to=kirill.shutemov@linux.intel.com \
--cc=aarcange@redhat.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=dave@sr71.net \
--cc=david@fromorbit.com \
--cc=dhillf@gmail.com \
--cc=fengguang.wu@intel.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=quning@google.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.