From: Matthew Ahrens <mahrens@delphix.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] Request arc buffer, zerocopy
Date: Fri, 5 Jul 2019 11:15:01 -0700
Message-ID: <CAJjvXiG_NEnL1ZVTZ9dpjSsekOd_-S7KsNORcobJudXUoXUiTg@mail.gmail.com>
In-Reply-To: <1561715412.12733.3@informatik.uni-hamburg.de>

On Fri, Jun 28, 2019 at 2:50 AM Anna Fuchs <anna.fuchs@informatik.uni-hamburg.de> wrote:
> Hello Matt,
>
> thanks for your reply.
>
> >
> > You can set the block size (of the first and only block) using
> > dmu_object_set_blocksize(). FYI, I think that this comment is
> > incorrect:
> > * If the first block is allocated already, the new size must be
> > greater
> > * than the current block size.
> >
> > You can increase or decrease the block size with this routine.
>
> This is a deeper call of Lustre's osd_grow_blocksize I mentioned
> before. If I understand it correctly, they are called in the context of
> transactions, right?
>
It is not possible to change the block size of a ZFS object outside of a
transaction.
> If so, I cannot use it - I need the blocksize to be set in the buffer
> preparation stage, before committing in a transaction.
> Lustre's original routine looks simplified as follows:
>
> osd_bufs_get_write
>     bs = dn->dn_datablksz
>     while (len > 0)
>         if (sz_in_block == bs)   /* full block, try zerocopy */
>             abuf = osd_request_arcbuf(dn, bs);
>         else                     /* can't use zerocopy, allocate temp. buffers */
>             ... alloc_page ...
>             here going later on the dmu_write path (pagewise!)
>
> Currently, in the very first iteration this blocksize (bs) is taken
> from the dnode and is e.g. 4K.
> When writing a chunk of 16K, I get 4 arcbufs 4K each. For the next
> chunk the block size might be grown up to x (recordsize?).
> Here I need the blocksize to be set to 16K (or 128K or later some
> generic value defined by the Lustre client) before the first arcbuf is
> requested,
> because the compressed chunk sent from client is logically this size.
> At this point I don't have any dmu_tx yet to grow the blocksize saved
> in dn->dn_datablksz before the while loop.
> So I am not sure how deep to go? This min size is set on dnode creation
> by ZFS, how can I "reset" it?
>
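
To make the arithmetic in the loop above concrete, here is a small standalone C model of it. It is only a sketch: split_write() and struct split are hypothetical names, not Lustre or ZFS code, and the model assumes a piece counts as a zerocopy candidate only when it covers exactly one whole block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not Lustre code: counts how a write of `len` bytes
 * at offset `off` splits into whole blocks (zerocopy candidates) and
 * partial pieces (copied through temporary pages) for block size `bs`.
 */
struct split {
	size_t full_blocks;	/* pieces that cover exactly one block */
	size_t partial_bytes;	/* bytes that land in partly covered blocks */
};

static struct split
split_write(size_t off, size_t len, size_t bs)
{
	struct split s = { 0, 0 };

	assert(bs > 0);
	while (len > 0) {
		size_t room = bs - (off % bs);	/* space left in this block */
		size_t sz = len < room ? len : room;

		if (sz == bs)
			s.full_blocks++;	/* full block, zerocopy candidate */
		else
			s.partial_bytes += sz;	/* would take the copy path */
		off += sz;
		len -= sz;
	}
	return (s);
}
```

In this model, split_write(0, 16384, 4096) counts four whole blocks, while split_write(0, 16384, 16384) counts one, which is the difference between four 4K buffers and a single 16K buffer for the same chunk.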

I think that you are saying that you want to be loaned an ARC buffer of the
size provided by the client, which may be different from the object's
current blocksize. You will later (within a transaction) change the
object's blocksize to the size specified by the client, and write the data
(using the loaned ARC buffer). You can do exactly that using the routines
I mentioned.

In your example code above, you are providing the dnode when requesting the
arc buf, which leads to the problem you described (needing that dnode's
block size to match the size provided by the client before you change the
blocksize of the object). However, this is an unnecessary restriction,
because arc_loan_compressed_buf() does not need to know the dnode, or its
current block size.

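
The ordering described above can be modeled in plain C. This is a sketch under stated assumptions, not ZFS code: every model_* name is a hypothetical stand-in for the routine named in its comment, the real signatures differ, and the rule that an assignment is zerocopy only when the buffer matches the block size is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are ZFS or Lustre types. */
struct model_obj { size_t blksz; };		/* object's current block size */
struct model_buf { size_t size; void *data; };	/* a "loaned" buffer */

/* Stands in for arc_loan_compressed_buf(): takes a size, no object. */
static struct model_buf *
model_loan_buf(size_t size)
{
	struct model_buf *b = malloc(sizeof (*b));

	if (b == NULL)
		return (NULL);
	b->data = calloc(1, size);
	if (b->data == NULL) {
		free(b);
		return (NULL);
	}
	b->size = size;
	return (b);
}

/* Stands in for dmu_return_arcbuf(): hands an unused loan back. */
static void
model_return_buf(struct model_buf *b)
{
	free(b->data);
	free(b);
}

/* Stands in for dmu_object_set_blocksize(): refused outside a transaction. */
static int
model_set_blocksize(struct model_obj *o, size_t size, bool in_tx)
{
	if (!in_tx)
		return (-1);
	o->blksz = size;
	return (0);
}

/*
 * Stands in for dmu_assign_arcbuf(): consumes the buffer. Model assumption:
 * zerocopy only for an aligned buffer of exactly the object's block size.
 */
static bool
model_assign_buf(struct model_obj *o, size_t off, struct model_buf *b,
    bool in_tx)
{
	bool zerocopy;

	assert(o->blksz > 0);
	zerocopy = in_tx && b->size == o->blksz && off % o->blksz == 0;
	model_return_buf(b);
	return (zerocopy);
}

/* Loan first (no transaction), then set the block size and assign in a tx. */
static bool
model_write_chunk(struct model_obj *o, size_t off, size_t client_size)
{
	struct model_buf *b = model_loan_buf(client_size);
	bool in_tx;

	if (b == NULL)
		return (false);
	/* ... the client's payload would be received into b->data here ... */
	in_tx = true;	/* the transaction is open from here on */
	if (model_set_blocksize(o, client_size, in_tx) != 0) {
		model_return_buf(b);
		return (false);
	}
	return (model_assign_buf(o, off, b, in_tx));
}
```

The point of the model is that the loan step never reads the object's block size, so an object that starts at 4K can receive a 16K buffer that was loaned before any transaction existed.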
--matt
>
> >
> > I'd recommend that you hand the compressed data to ZFS similarly to
> > how "zfs receive" does (for compressed send streams). It sounds like
> > that is the direction you're going, which is great. FYI, here are
> > some of the routines you'd want to use (copied from dmu_recv.c):
> >
> >     abuf = arc_loan_compressed_buf(
> >         dmu_objset_spa(drc->drc_os),
> >         drrw->drr_compressed_size, drrw->drr_logical_size,
> >         drrw->drr_compressiontype);
> >
> >     dmu_assign_arcbuf(bonus, drrw->drr_offset, abuf, tx);
> >     (or dmu_assign_arcbuf_dnode())
> >
> >     dmu_return_arcbuf(rrd->write_buf);
> >
>
> Yes, thanks for that. There are two paths by which Lustre interacts
> with ZFS - requesting arc buffers or dmu_write.
> The common dmu_request_arcbuf goes through arc_loan_buf, so we introduced
> dmu_request_compressed_arcbuf to go through arc_loan_compressed_buf and
> reuse the receive functionality.
> We try to make as few changes as possible to Lustre's interface since
> we want to mix compressed and uncompressed data chunks (and at the same
> time stay compatible with ZFS's on-disk format).
> The dmu_write path will be tricky, though.
>
> Any comments are welcome.
>
> Best regards
> Anna
>
> --
>
> Anna Fuchs
> Universität Hamburg
>
>
>