All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Octavian Purdila <octavian.purdila@intel.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>, xfs <xfs@oss.sgi.com>
Subject: Re: [RFC PATCH] xfs: support for non-mmu architectures
Date: Fri, 20 Nov 2015 10:40:20 -0500	[thread overview]
Message-ID: <20151120154020.GD60886@bfoster.bfoster> (raw)
In-Reply-To: <CAE1zot+t6mVovxHy-ohEjCOLcs87_OPPA8Drc-0ubsTvhQVo_w@mail.gmail.com>

On Fri, Nov 20, 2015 at 05:35:55PM +0200, Octavian Purdila wrote:
> On Fri, Nov 20, 2015 at 5:11 PM, Brian Foster <bfoster@redhat.com> wrote:
> > On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> >> On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> >> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> >> > > Naive implementation for non-mmu architectures: allocate physically
> >> > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> >> > > memory and fragmentation on high I/O loads but it may be good enough
> >> > > for basic usage (which most non-mmu architectures will need).
> >> > >
> >> > > This patch was tested with lklfuse [1] and basic operations seems to
> >> > > work even with 16MB allocated for LKL.
> >> > >
> >> > > [1] https://github.com/lkl/linux
> >> > >
> >> > > Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
> >> > > ---
> >> >
> >> > Interesting, though this makes me wonder why we couldn't have a new
> >> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> >> > familiar with mmu-less context, but I see that mm/nommu.c has a
> >> > __vmalloc() interface that looks like it ultimately translates into an
> >> > alloc_pages() call. Would that accomplish what this patch is currently
> >> > trying to do?
> >>
> >> vmalloc is always a last resort.  vmalloc space on 32 bit systems is
> >> extremely limited and it is easy to exhaust with XFS.
> >>
> >
> > Sure, but my impression is that a vmalloc() buffer is roughly equivalent
> > in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
> > just do the allocation and mapping separately (presumably for other
> > reasons).
> >
> >> Also, vmalloc limits the control we have over allocation context
> >> (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> >> GFP_NOFS contexts), so just using vmalloc doesn't make things much
> >> simpler from an XFS perspective.
> >>
> >
> > The comment in kmem_zalloc_large() calls out some apparent hardcoded
> > allocation flags down in the depths of vmalloc(). It looks to me that
> > page allocation (__vmalloc_area_node()) actually uses the provided
> > flags, so I'm not following the "data page" part of that comment.
> > Indeed, I do see that this is not the case down in calls like
> > pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
> > table management.
> >
> > Those latter calls are all from following down through the
> > map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
> > call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
> > down into the same code. Indeed, we already protect ourselves here via
> > the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.
> >
> > I suspect there's more to it than that because it does look like
> > vm_map_ram() has a different mechanism for managing vmalloc space for
> > certain (smaller) allocations, either of which I'm not really familiar
> > with. That aside, I don't see how vmalloc() introduces any new
> > allocation context issues for those buffers where we already set up a
> > multi-page mapping.
> >
> > We still have the somewhat customized page allocation code in
> > xfs_buf_allocate_memory() to contend with. I actually think it would be
> > useful to have a DEBUG sysfs tunable to turn on vmalloc() buffers and
> > actually test how effective some of this code is.
> >
> >> > I ask because it seems like that would help clean up the code a bit, for
> >> > one. It might also facilitate some degree of testing of the XFS bits
> >> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> >> > for generic/mmu use). We currently allocate and map the buffer pages
> >> > separately and I'm not sure if there's any particular reasons for doing
> >> > that outside of some congestion handling in the allocation code and
> >> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> >> > Any other thoughts on that?
> >>
> >> We could probably clean the code up more (the allocation logic
> >> is now largely a historic relic) but I'm not convinced yet that we
> >> should be spending any time trying to specifically support mmu-less
> >> hardware.
> >>
> >
> > Fair point, we'll see where the use case discussion goes. That said, I
> > was a little surprised that this is all that was required to enable
> > nommu support. If that is indeed the case and we aren't in for a series
> > of subsequent nommu specific changes (Octavian?) by letting this
> > through, what's the big deal? This seems fairly harmless to me as is,
> > particularly if it can be semi-tested via DEBUG mode and has potential
> > generic use down the road.
> >
> 
> I don't foresee additional patches. I was able to use lklfuse to mount
> an XFS image and perform basic operations. Are there any xfs specific
> tests coverage tools I can use to make sure I am not missing anything?
> 

Ok, well you probably want to run some of the tests in xfstests and see
if anything falls over:

https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/

Note that this has generic as well as fs-specific tests for other
filesystems (ext4, btrfs, etc.).

Brian

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Brian Foster <bfoster@redhat.com>
To: Octavian Purdila <octavian.purdila@intel.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>, xfs <xfs@oss.sgi.com>
Subject: Re: [RFC PATCH] xfs: support for non-mmu architectures
Date: Fri, 20 Nov 2015 10:40:20 -0500	[thread overview]
Message-ID: <20151120154020.GD60886@bfoster.bfoster> (raw)
In-Reply-To: <CAE1zot+t6mVovxHy-ohEjCOLcs87_OPPA8Drc-0ubsTvhQVo_w@mail.gmail.com>

On Fri, Nov 20, 2015 at 05:35:55PM +0200, Octavian Purdila wrote:
> On Fri, Nov 20, 2015 at 5:11 PM, Brian Foster <bfoster@redhat.com> wrote:
> > On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> >> On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> >> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> >> > > Naive implementation for non-mmu architectures: allocate physically
> >> > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> >> > > memory and fragmentation on high I/O loads but it may be good enough
> >> > > for basic usage (which most non-mmu architectures will need).
> >> > >
> >> > > This patch was tested with lklfuse [1] and basic operations seems to
> >> > > work even with 16MB allocated for LKL.
> >> > >
> >> > > [1] https://github.com/lkl/linux
> >> > >
> >> > > Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
> >> > > ---
> >> >
> >> > Interesting, though this makes me wonder why we couldn't have a new
> >> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> >> > familiar with mmu-less context, but I see that mm/nommu.c has a
> >> > __vmalloc() interface that looks like it ultimately translates into an
> >> > alloc_pages() call. Would that accomplish what this patch is currently
> >> > trying to do?
> >>
> >> vmalloc is always a last resort.  vmalloc space on 32 bit systems is
> >> extremely limited and it is easy to exhaust with XFS.
> >>
> >
> > Sure, but my impression is that a vmalloc() buffer is roughly equivalent
> > in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
> > just do the allocation and mapping separately (presumably for other
> > reasons).
> >
> >> Also, vmalloc limits the control we have over allocation context
> >> (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> >> GFP_NOFS contexts), so just using vmalloc doesn't make things much
> >> simpler from an XFS perspective.
> >>
> >
> > The comment in kmem_zalloc_large() calls out some apparent hardcoded
> > allocation flags down in the depths of vmalloc(). It looks to me that
> > page allocation (__vmalloc_area_node()) actually uses the provided
> > flags, so I'm not following the "data page" part of that comment.
> > Indeed, I do see that this is not the case down in calls like
> > pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
> > table management.
> >
> > Those latter calls are all from following down through the
> > map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
> > call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
> > down into the same code. Indeed, we already protect ourselves here via
> > the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.
> >
> > I suspect there's more to it than that because it does look like
> > vm_map_ram() has a different mechanism for managing vmalloc space for
> > certain (smaller) allocations, either of which I'm not really familiar
> > with. That aside, I don't see how vmalloc() introduces any new
> > allocation context issues for those buffers where we already set up a
> > multi-page mapping.
> >
> > We still have the somewhat customized page allocation code in
> > xfs_buf_allocate_memory() to contend with. I actually think it would be
> > useful to have a DEBUG sysfs tunable to turn on vmalloc() buffers and
> > actually test how effective some of this code is.
> >
> >> > I ask because it seems like that would help clean up the code a bit, for
> >> > one. It might also facilitate some degree of testing of the XFS bits
> >> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> >> > for generic/mmu use). We currently allocate and map the buffer pages
> >> > separately and I'm not sure if there's any particular reasons for doing
> >> > that outside of some congestion handling in the allocation code and
> >> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> >> > Any other thoughts on that?
> >>
> >> We could probably clean the code up more (the allocation logic
> >> is now largely a historic relic) but I'm not convinced yet that we
> >> should be spending any time trying to specifically support mmu-less
> >> hardware.
> >>
> >
> > Fair point, we'll see where the use case discussion goes. That said, I
> > was a little surprised that this is all that was required to enable
> > nommu support. If that is indeed the case and we aren't in for a series
> > of subsequent nommu specific changes (Octavian?) by letting this
> > through, what's the big deal? This seems fairly harmless to me as is,
> > particularly if it can be semi-tested via DEBUG mode and has potential
> > generic use down the road.
> >
> 
> I don't foresee additional patches. I was able to use lklfuse to mount
> an XFS image and perform basic operations. Are there any xfs specific
> tests coverage tools I can use to make sure I am not missing anything?
> 

Ok, well you probably want to run some of the tests in xfstests and see
if anything falls over:

https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/

Note that this has generic as well as fs-specific tests for other
filesystems (ext4, btrfs, etc.).

Brian

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-11-20 15:40 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-17 22:46 [RFC PATCH] xfs: support for non-mmu architectures Octavian Purdila
2015-11-17 22:46 ` Octavian Purdila
2015-11-19 15:55 ` Brian Foster
2015-11-19 15:55   ` Brian Foster
2015-11-19 20:54   ` Octavian Purdila
2015-11-19 20:54     ` Octavian Purdila
2015-11-20 15:11     ` Brian Foster
2015-11-20 15:11       ` Brian Foster
2015-11-19 23:35   ` Dave Chinner
2015-11-19 23:35     ` Dave Chinner
2015-11-20 14:09     ` Octavian Purdila
2015-11-20 14:09       ` Octavian Purdila
2015-11-20 15:11     ` Brian Foster
2015-11-20 15:11       ` Brian Foster
2015-11-20 15:35       ` Octavian Purdila
2015-11-20 15:35         ` Octavian Purdila
2015-11-20 15:40         ` Brian Foster [this message]
2015-11-20 15:40           ` Brian Foster
2015-11-20 20:36       ` Dave Chinner
2015-11-20 20:36         ` Dave Chinner
2015-11-20 22:47         ` Brian Foster
2015-11-20 22:47           ` Brian Foster
2015-11-22 22:04           ` Dave Chinner
2015-11-22 22:04             ` Dave Chinner
2015-11-23 12:50             ` Brian Foster
2015-11-23 12:50               ` Brian Foster
2015-11-23 21:00               ` Dave Chinner
2015-11-23 21:00                 ` Dave Chinner
2015-11-19 23:24 ` Dave Chinner
2015-11-19 23:24   ` Dave Chinner
2015-11-19 23:54   ` Richard Weinberger
2015-11-19 23:54     ` Richard Weinberger
2015-11-20  0:58     ` Dave Chinner
2015-11-20  0:58       ` Dave Chinner
2015-11-20 14:26       ` Octavian Purdila
2015-11-20 14:26         ` Octavian Purdila
2015-11-20 15:24         ` Brian Foster
2015-11-20 15:24           ` Brian Foster
2015-11-20 15:31           ` Octavian Purdila
2015-11-20 15:31             ` Octavian Purdila
2015-11-20 15:43             ` Brian Foster
2015-11-20 15:43               ` Brian Foster
2015-11-20 20:07         ` Theodore Ts'o
2015-11-20 20:07           ` Theodore Ts'o
2015-11-20 13:43   ` Octavian Purdila
2015-11-20 13:43     ` Octavian Purdila
2015-11-20 21:08     ` Dave Chinner
2015-11-20 21:08       ` Dave Chinner
2015-11-20 22:26       ` Octavian Purdila
2015-11-20 22:26         ` Octavian Purdila
2015-11-22 22:44         ` Dave Chinner
2015-11-22 22:44           ` Dave Chinner
2015-11-23  1:41           ` Octavian Purdila
2015-11-23  1:41             ` Octavian Purdila
2015-11-23 21:46             ` Dave Chinner
2015-11-23 21:46               ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151120154020.GD60886@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=octavian.purdila@intel.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.