linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Kees Cook <keescook@chromium.org>,
	linux-kernel@vger.kernel.org, David Windsor <dave@nullcore.net>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com
Subject: Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
Date: Wed, 30 Aug 2017 00:14:03 -0700	[thread overview]
Message-ID: <20170830071403.GA8904@infradead.org> (raw)
In-Reply-To: <20170829215157.GC10621@dastard>

On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote:
> Right, I've looked at btrees, too, but it's more complex than just
> using an rbtree. I originally looked at using Peter Z's old
> RCU-aware btree code, but it doesn't hold data in the tree leaves.
> So that needed significant modification to make work without a
> memory alloc per extent and that didn't work with original aim of
> RCU-safe extent lookups.  I also looked at that "generic" btree
> stuff that came from logfs, and after a little while ran away
> screaming.

I started with the latter, but it's not really looking like it any more:
there nodes are formatted as a series of u64s instead of all the
long magic, and the data is stored inline - in fact I use a cute
trick to keep the size down, derived from our "compressed" on disk
extent format:

Key:

 +-------+----------------------------+
 | 00:51 | all 52 bits of startoff    |
 | 52:63 | low 12 bits of startblock  |
 +-------+----------------------------+

Value

 +-------+----------------------------+
 | 00:20 | all 21 bits of length      |
 |    21 | unwritten extent bit       |
 | 22:63 | high 42 bits of startblock |
 +-------+----------------------------+

So we only need a 64-bit key and a 64-bit value by abusing parts
of the key to store bits of the startblock.

For non-leaf nodes we iterate through the keys only, never touching
the cache lines for the value.  For the leaf nodes we have to touch
the value anyway because we have to do a range lookup to find the
exact record.

This works fine so far in an isolated simulator, and now I'm ammending
it to be a b+tree with pointers to the previous and next node so
that we can nicely implement our extent iterators instead of doing
full lookups.

> The sticking point, IMO, is the extent array index based lookups in
> all the bmbt code.  I've been looking at converting all that to use
> offset based lookups and a cursor w/ lookup/inc/dec/insert/delete
> ioperations wrapping xfs_iext_lookup_ext() and friends. This means
> the modifications are pretty much identical to the on-disk extent
> btree, so they can be abstracted out into a single extent update
> interface for both trees.  Have you planned/done any cleanup/changes
> with this code?

I've done various cleanups, but I've not yet consolidated the two.
Basically step one at the moment is to move everyone to
xfs_iext_lookup_extent + xfs_iext_get_extent that removes all the
bad intrusion.

Once we move to the actual b+trees the extnum_t cursor will be replaced
with a real cursor structure that contains a pointer to the current
b+tree leaf node, and an index inside that, which will allows us very
efficient iteration.  The xfs_iext_get_extent calls will be replaced
with more specific xfs_iext_prev_extent, xfs_iext_next_extent calls
that include the now slightly more complex cursor decrement, increment
as well as a new xfs_iext_last_extent helper for the last extent
that we need in a few places.

insert/delete remain very similar to what they do right now, they'll
get a different cursor type, and the manual xfs_iext_add calls will
go away.  The new xfs_iext_update_extent helper I posted to the list
yesterday will become a bit more complex, as changing the startoff
will have to be propagated up the tree.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-08-30  7:14 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-28 21:34 [PATCH v2 00/30] Hardened usercopy whitelisting Kees Cook
2017-08-28 21:34 ` [PATCH v2 01/30] usercopy: Prepare for " Kees Cook
2017-08-28 21:34 ` [PATCH v2 02/30] usercopy: Enforce slab cache usercopy region boundaries Kees Cook
2017-08-28 21:34 ` [PATCH v2 03/30] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
2017-08-28 21:34 ` [PATCH v2 04/30] dcache: Define usercopy region in dentry_cache slab cache Kees Cook
2017-08-28 21:34 ` [PATCH v2 05/30] vfs: Define usercopy region in names_cache slab caches Kees Cook
2017-08-28 21:34 ` [PATCH v2 06/30] vfs: Copy struct mount.mnt_id to userspace using put_user() Kees Cook
2017-08-28 21:34 ` [PATCH v2 07/30] ext4: Define usercopy region in ext4_inode_cache slab cache Kees Cook
2017-08-28 21:34 ` [PATCH v2 08/30] ext2: Define usercopy region in ext2_inode_cache " Kees Cook
2017-08-30 11:22   ` Jan Kara
2017-08-28 21:34 ` [PATCH v2 09/30] jfs: Define usercopy region in jfs_ip " Kees Cook
2017-08-28 21:34 ` [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache " Kees Cook
2017-08-29 10:12   ` Luis de Bethencourt
2017-08-29 15:36     ` Kees Cook
2017-08-29 17:10       ` Luis de Bethencourt
2017-08-28 21:34 ` [PATCH v2 11/30] exofs: Define usercopy region in exofs_inode_cache " Kees Cook
2017-08-28 21:34 ` [PATCH v2 12/30] orangefs: Define usercopy region in orangefs_inode_cache " Kees Cook
2017-08-28 21:34 ` [PATCH v2 13/30] ufs: Define usercopy region in ufs_inode_cache " Kees Cook
2017-08-28 21:34 ` [PATCH v2 14/30] vxfs: Define usercopy region in vxfs_inode " Kees Cook
2017-08-28 21:34 ` [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode " Kees Cook
2017-08-28 21:49   ` Darrick J. Wong
2017-08-28 21:57     ` Kees Cook
2017-08-29  4:47       ` Darrick J. Wong
2017-08-29 18:48         ` Kees Cook
2017-08-29 19:00           ` Darrick J. Wong
2017-08-29 22:15           ` Dave Chinner
2017-08-29 22:25             ` Kees Cook
2017-08-29  8:14   ` Christoph Hellwig
2017-08-29 12:31     ` Dave Chinner
2017-08-29 12:45       ` Christoph Hellwig
2017-08-29 21:51         ` Dave Chinner
2017-08-30  7:14           ` Christoph Hellwig [this message]
2017-08-30  8:05             ` Dave Chinner
2017-08-30  8:33               ` Christoph Hellwig
2017-08-29 18:55     ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 16/30] cifs: Define usercopy region in cifs_request " Kees Cook
2017-08-28 21:34 ` [PATCH v2 17/30] scsi: Define usercopy region in scsi_sense_cache " Kees Cook
2017-08-28 21:42   ` Bart Van Assche
2017-08-28 21:52     ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 18/30] net: Define usercopy region in struct proto " Kees Cook
2017-08-28 21:35 ` [PATCH v2 19/30] ip: Define usercopy region in IP " Kees Cook
2017-08-28 21:35 ` [PATCH v2 20/30] caif: Define usercopy region in caif " Kees Cook
2017-08-28 21:35 ` [PATCH v2 21/30] sctp: Define usercopy region in SCTP " Kees Cook
2017-08-28 21:35 ` [PATCH v2 22/30] sctp: Copy struct sctp_sock.autoclose to userspace using put_user() Kees Cook
2017-08-28 21:35 ` [PATCH v2 23/30] net: Restrict unwhitelisted proto caches to size 0 Kees Cook
2017-08-28 21:35 ` [PATCH v2 24/30] fork: Define usercopy region in mm_struct slab caches Kees Cook
2017-08-30 19:29   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 25/30] fork: Define usercopy region in thread_stack " Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 27/30] x86: Implement thread_struct whitelist for hardened usercopy Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 28/30] arm64: " Kees Cook
2017-08-28 21:35 ` [PATCH v2 29/30] arm: " Kees Cook
2017-08-28 21:35 ` [PATCH v2 30/30] usercopy: Restrict non-usercopy caches to size 0 Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170830071403.GA8904@infradead.org \
    --to=hch@infradead.org \
    --cc=darrick.wong@oracle.com \
    --cc=dave@nullcore.net \
    --cc=david@fromorbit.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).