public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org,
	Octavian Purdila <octavian.purdila@intel.com>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [RFC PATCH] xfs: support for non-mmu architectures
Date: Fri, 20 Nov 2015 10:11:19 -0500	[thread overview]
Message-ID: <20151120151118.GB60886@bfoster.bfoster> (raw)
In-Reply-To: <20151119233547.GN14311@dastard>

On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> > > Naive implementation for non-mmu architectures: allocate physically
> > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> > > memory and fragmentation on high I/O loads but it may be good enough
> > > for basic usage (which most non-mmu architectures will need).
> > > 
> > > This patch was tested with lklfuse [1] and basic operations seems to
> > > work even with 16MB allocated for LKL.
> > > 
> > > [1] https://github.com/lkl/linux
> > > 
> > > Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
> > > ---
> > 
> > Interesting, though this makes me wonder why we couldn't have a new
> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> > familiar with mmu-less context, but I see that mm/nommu.c has a
> > __vmalloc() interface that looks like it ultimately translates into an
> > alloc_pages() call. Would that accomplish what this patch is currently
> > trying to do?
> 
> vmalloc is always a last resort.  vmalloc space on 32 bit systems is
> extremely limited and it is easy to exhaust with XFS.
> 

Sure, but my impression is that a vmalloc() buffer is roughly equivalent
in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
just do the allocation and mapping separately (presumably for other
reasons).

> Also, vmalloc limits the control we have over allocation context
> (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> GFP_NOFS contexts), so just using vmalloc doesn't make things much
> simpler from an XFS perspective.
> 

The comment in kmem_zalloc_large() calls out some apparent hardcoded
allocation flags down in the depths of vmalloc(). It looks to me that
page allocation (__vmalloc_area_node()) actually uses the provided
flags, so I'm not following the "data page" part of that comment.
Indeed, I do see that this is not the case down in calls like
pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
table management.

Those latter calls are all from following down through the
map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
down into the same code. Indeed, we already protect ourselves here via
the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.

I suspect there's more to it than that because it does look like
vm_map_ram() has a different mechanism for managing vmalloc space for
certain (smaller) allocations, either of which I'm not really familiar
with. That aside, I don't see how vmalloc() introduces any new
allocation context issues for those buffers where we already set up a
multi-page mapping.

We still have the somewhat customized page allocation code in
xfs_buf_allocate_memory() to contend with. I actually think it would be
useful to have a DEBUG sysfs tunable to turn on vmalloc() buffers and
actually test how effective some of this code is.

> > I ask because it seems like that would help clean up the code a bit, for
> > one. It might also facilitate some degree of testing of the XFS bits
> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> > for generic/mmu use). We currently allocate and map the buffer pages
> > separately and I'm not sure if there's any particular reasons for doing
> > that outside of some congestion handling in the allocation code and
> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> > Any other thoughts on that?
> 
> We could probably clean the code up more (the allocation logic
> is now largely a historic relic) but I'm not convinced yet that we
> should be spending any time trying to specifically support mmu-less
> hardware.
> 

Fair point, we'll see where the use case discussion goes. That said, I
was a little surprised that this is all that was required to enable
nommu support. If that is indeed the case and we aren't in for a series
of subsequent nommu specific changes (Octavian?) by letting this
through, what's the big deal? This seems fairly harmless to me as is,
particularly if it can be semi-tested via DEBUG mode and has potential
generic use down the road.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2015-11-20 15:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-17 22:46 [RFC PATCH] xfs: support for non-mmu architectures Octavian Purdila
2015-11-19 15:55 ` Brian Foster
2015-11-19 20:54   ` Octavian Purdila
2015-11-20 15:11     ` Brian Foster
2015-11-19 23:35   ` Dave Chinner
2015-11-20 14:09     ` Octavian Purdila
2015-11-20 15:11     ` Brian Foster [this message]
2015-11-20 15:35       ` Octavian Purdila
2015-11-20 15:40         ` Brian Foster
2015-11-20 20:36       ` Dave Chinner
2015-11-20 22:47         ` Brian Foster
2015-11-22 22:04           ` Dave Chinner
2015-11-23 12:50             ` Brian Foster
2015-11-23 21:00               ` Dave Chinner
2015-11-19 23:24 ` Dave Chinner
2015-11-19 23:54   ` Richard Weinberger
2015-11-20  0:58     ` Dave Chinner
2015-11-20 14:26       ` Octavian Purdila
2015-11-20 15:24         ` Brian Foster
2015-11-20 15:31           ` Octavian Purdila
2015-11-20 15:43             ` Brian Foster
2015-11-20 20:07         ` Theodore Ts'o
2015-11-20 13:43   ` Octavian Purdila
2015-11-20 21:08     ` Dave Chinner
2015-11-20 22:26       ` Octavian Purdila
2015-11-22 22:44         ` Dave Chinner
2015-11-23  1:41           ` Octavian Purdila
2015-11-23 21:46             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151120151118.GB60886@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=octavian.purdila@intel.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox