From: "Darrick J. Wong" <djwong@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-xfs@vger.kernel.org, willy@infradead.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/7] xfs: create a big array data structure
Date: Thu, 25 May 2023 20:19:17 -0700 [thread overview]
Message-ID: <20230526031917.GK11620@frogsfrogsfrogs> (raw)
In-Reply-To: <ZHAMpeyu/KmXtRw8@moria.home.lan>
On Thu, May 25, 2023 at 09:34:29PM -0400, Kent Overstreet wrote:
> On Thu, May 25, 2023 at 05:47:08PM -0700, Darrick J. Wong wrote:
> > +struct xfarray {
> > + /* Underlying file that backs the array. */
> > + struct xfile *xfile;
> > +
> > + /* Number of array elements. */
> > + xfarray_idx_t nr;
> > +
> > + /* Maximum possible array size. */
> > + xfarray_idx_t max_nr;
> > +
> > + /* Number of unset slots in the array below @nr. */
> > + uint64_t unset_slots;
> > +
> > + /* Size of an array element. */
> > + size_t obj_size;
> > +
> > + /* log2 of array element size, if possible. */
> > + int obj_size_log;
> > +};
> > +
> > +int xfarray_create(struct xfs_mount *mp, const char *descr,
> > + unsigned long long required_capacity, size_t obj_size,
> > + struct xfarray **arrayp);
> > +void xfarray_destroy(struct xfarray *array);
> > +int xfarray_load(struct xfarray *array, xfarray_idx_t idx, void *ptr);
> > +int xfarray_unset(struct xfarray *array, xfarray_idx_t idx);
> > +int xfarray_store(struct xfarray *array, xfarray_idx_t idx, const void *ptr);
> > +int xfarray_store_anywhere(struct xfarray *array, const void *ptr);
> > +bool xfarray_element_is_null(struct xfarray *array, const void *ptr);
>
> Nice simple external interface... +1
>
> Since you're storing fixed size elements, if you wanted to make it
> slicker you could steal the generic-radix tree approach of using a
> wrapper type to make the object size known at compile time, which lets
> you constant propagate through the index -> offset calculations.
>
> But not worth it from a performance POV with the current implementation,
> because...
>
> > +/*
> > + * Read a memory object directly from the xfile's page cache. Unlike regular
> > + * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
> > + * high an offset, instead of truncating the read. Otherwise, we return
> > + * bytes read or an error code, like regular pread.
> > + */
> > +ssize_t
> > +xfile_pread(
> > + struct xfile *xf,
> > + void *buf,
> > + size_t count,
> > + loff_t pos)
> > +{
> > + struct inode *inode = file_inode(xf->file);
> > + struct address_space *mapping = inode->i_mapping;
> > + struct page *page = NULL;
> > + ssize_t read = 0;
> > + unsigned int pflags;
> > + int error = 0;
> > +
> > + if (count > MAX_RW_COUNT)
> > + return -E2BIG;
> > + if (inode->i_sb->s_maxbytes - pos < count)
> > + return -EFBIG;
> > +
> > + trace_xfile_pread(xf, pos, count);
> > +
> > + pflags = memalloc_nofs_save();
> > + while (count > 0) {
> > + void *p, *kaddr;
> > + unsigned int len;
> > +
> > + len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> > +
> > + /*
> > + * In-kernel reads of a shmem file cause it to allocate a page
> > + * if the mapping shows a hole. Therefore, if we hit ENOMEM
> > + * we can continue by zeroing the caller's buffer.
> > + */
> > + page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
> > + __GFP_NOWARN);
> > + if (IS_ERR(page)) {
> > + error = PTR_ERR(page);
> > + if (error != -ENOMEM)
> > + break;
> > +
> > + memset(buf, 0, len);
> > + goto advance;
> > + }
> > +
> > + if (PageUptodate(page)) {
> > + /*
> > + * xfile pages must never be mapped into userspace, so
> > + * we skip the dcache flush.
> > + */
> > + kaddr = kmap_local_page(page);
> > + p = kaddr + offset_in_page(pos);
> > + memcpy(buf, p, len);
> > + kunmap_local(kaddr);
> > + } else {
> > + memset(buf, 0, len);
> > + }
> > + put_page(page);
> > +
> > +advance:
> > + count -= len;
> > + pos += len;
> > + buf += len;
> > + read += len;
> > + }
> > + memalloc_nofs_restore(pflags);
> > +
> > + if (read > 0)
> > + return read;
> > + return error;
> > +}
>
> this all, and the write path, looks a bit heavy - you're calling through
> shmem_read_mapping_page_gfp() on every lookup. Does it matter?
Longer term I'd like to work with willy on an in-kernel mmap and/or
using large folios with the tmpfs file, but for now I only care that it
works correctly and gets merged. :)
> If we care about performance, we want to get it as much as possible down
> to just the page cache radix tree lookup - and possibly cache the last
> page returned if we care about sequential performance.
(That comes later in this megapatchset.)
> OTOH, maybe shmem_get_folio_gfp() and __filemap_get_folio() could
> benefit from some early returns -
> if (likely(got_the_thing_we_want)) return folio;
>
> Another thought... if obj_size <= PAGE_SIZE, maybe you could do what
> genradix does and not have objects span pages? That would let you get
> rid of the loop in read/write - but then you'd want to be doing an
> interface that works in terms of pages/folios, which wouldn't be as
> clean as what you've got.
Yeah... the xfs dquot files store 136-byte dquot records which don't
cross fsblock boundaries. There's a lot of math involved there, which
at least there's an incore dquot object so we're mostly not pounding on
the dquot file itself.
--D
> Just spitballing random ideas, looks good :)
next prev parent reply other threads:[~2023-05-26 3:19 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-26 0:00 [MEGAPATCHSET v25 1/2] xfs: online repair, part 1 Darrick J. Wong
2023-05-26 0:28 ` [PATCHSET v25.0 0/7] xfs: stage repair information in pageable memory Darrick J. Wong
2023-05-26 0:47 ` [PATCH 1/7] xfs: create a big array data structure Darrick J. Wong
2023-05-26 1:34 ` Kent Overstreet
2023-05-26 3:19 ` Darrick J. Wong [this message]
2023-06-22 2:55 ` Dave Chinner
2023-07-05 23:48 ` Darrick J. Wong
2023-05-26 0:47 ` [PATCH 2/7] xfs: enable sorting of xfile-backed arrays Darrick J. Wong
2023-05-26 0:47 ` [PATCH 3/7] xfs: convert xfarray insertion sort to heapsort using scratchpad memory Darrick J. Wong
2023-05-26 0:47 ` [PATCH 4/7] xfs: teach xfile to pass back direct-map pages to caller Darrick J. Wong
2023-05-26 0:48 ` [PATCH 5/7] xfs: speed up xfarray sort by sorting xfile page contents directly Darrick J. Wong
2023-05-26 0:48 ` [PATCH 6/7] xfs: cache pages used for xfarray quicksort convergence Darrick J. Wong
2023-05-26 0:48 ` [PATCH 7/7] xfs: improve xfarray quicksort pivot Darrick J. Wong
2023-06-22 2:58 ` [PATCHSET v25.0 0/7] xfs: stage repair information in pageable memory Dave Chinner
2023-05-26 0:32 ` [PATCHSET v25.0 0/9] xfs: support in-memory btrees Darrick J. Wong
2023-05-26 1:04 ` [PATCH 1/9] xfs: dump xfiles for debugging purposes Darrick J. Wong
2023-05-26 1:05 ` [PATCH 2/9] xfs: teach buftargs to maintain their own buffer hashtable Darrick J. Wong
2023-05-26 1:05 ` [PATCH 3/9] xfs: create buftarg helpers to abstract block_device operations Darrick J. Wong
2023-05-26 1:05 ` [PATCH 4/9] xfs: make GFP_ usage consistent when allocating buftargs Darrick J. Wong
2023-05-26 1:05 ` [PATCH 5/9] xfs: support in-memory buffer cache targets Darrick J. Wong
2023-05-26 1:06 ` [PATCH 6/9] xfs: consolidate btree block freeing tracepoints Darrick J. Wong
2023-05-26 1:06 ` [PATCH 7/9] xfs: consolidate btree block allocation tracepoints Darrick J. Wong
2023-05-26 1:06 ` [PATCH 8/9] xfs: support in-memory btrees Darrick J. Wong
2023-05-26 1:06 ` [PATCH 9/9] xfs: connect in-memory btrees to xfiles Darrick J. Wong
2023-05-26 0:34 ` [PATCHSET v25.0 00/25] xfs: atomic file updates Darrick J. Wong
2023-05-26 1:14 ` [PATCH 01/25] xfs: add a libxfs header file for staging new ioctls Darrick J. Wong
2023-05-26 1:14 ` [PATCH 02/25] xfs: introduce new file range exchange ioctl Darrick J. Wong
2023-05-26 1:15 ` [PATCH 03/25] xfs: move inode lease breaking functions to xfs_inode.c Darrick J. Wong
2023-05-26 1:15 ` [PATCH 04/25] xfs: move xfs_iops.c declarations out of xfs_inode.h Darrick J. Wong
2023-05-26 1:15 ` [PATCH 05/25] xfs: declare xfs_file.c symbols in xfs_file.h Darrick J. Wong
2023-05-26 1:16 ` [PATCH 06/25] xfs: create a new helper to return a file's allocation unit Darrick J. Wong
2023-05-26 1:16 ` [PATCH 07/25] xfs: refactor non-power-of-two alignment checks Darrick J. Wong
2023-05-26 1:16 ` [PATCH 08/25] xfs: parameterize all the incompat log feature helpers Darrick J. Wong
2023-05-26 1:16 ` [PATCH 09/25] xfs: create a log incompat flag for atomic extent swapping Darrick J. Wong
2023-05-26 1:17 ` [PATCH 10/25] xfs: introduce a swap-extent log intent item Darrick J. Wong
2023-05-26 1:17 ` [PATCH 11/25] xfs: create deferred log items for extent swapping Darrick J. Wong
2023-05-26 1:17 ` [PATCH 12/25] xfs: enable xlog users to toggle atomic " Darrick J. Wong
2023-05-26 1:17 ` [PATCH 13/25] xfs: bind the xfs-specific extent swape code to the vfs-generic file exchange code Darrick J. Wong
2023-05-26 1:18 ` [PATCH 14/25] xfs: add error injection to test swapext recovery Darrick J. Wong
2023-05-26 1:18 ` [PATCH 15/25] xfs: port xfs_swap_extents_rmap to our new code Darrick J. Wong
2023-05-26 1:18 ` [PATCH 16/25] xfs: consolidate all of the xfs_swap_extent_forks code Darrick J. Wong
2023-05-26 1:19 ` [PATCH 17/25] xfs: port xfs_swap_extent_forks to use xfs_swapext_req Darrick J. Wong
2023-05-26 1:26 ` [PATCH 18/25] xfs: allow xfs_swap_range to use older extent swap algorithms Darrick J. Wong
2023-05-26 1:26 ` [PATCH 19/25] xfs: remove old swap extents implementation Darrick J. Wong
2023-05-26 1:27 ` [PATCH 20/25] xfs: condense extended attributes after an atomic swap Darrick J. Wong
2023-05-26 1:27 ` [PATCH 21/25] xfs: condense directories " Darrick J. Wong
2023-05-26 1:27 ` [PATCH 22/25] xfs: condense symbolic links " Darrick J. Wong
2023-05-26 1:28 ` [PATCH 23/25] xfs: make atomic extent swapping support realtime files Darrick J. Wong
2023-05-26 1:28 ` [PATCH 24/25] xfs: support non-power-of-two rtextsize with exchange-range Darrick J. Wong
2023-05-26 1:28 ` [PATCH 25/25] xfs: enable atomic swapext feature Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2023-07-27 22:19 [PATCHSET v26.0 0/7] xfs: stage repair information in pageable memory Darrick J. Wong
2023-07-27 22:25 ` [PATCH 1/7] xfs: create a big array data structure Darrick J. Wong
2023-07-28 3:10 ` Matthew Wilcox
2023-07-28 4:39 ` Darrick J. Wong
2022-12-30 22:12 [PATCHSET v24.0 0/7] xfs: stage repair information in pageable memory Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/7] xfs: create a big array data structure Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230526031917.GK11620@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox