From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Han-Wen Nienhuys <hanwenn@gmail.com>
Subject: Re: [PATCH 4/4] reftable/blocksource: use mmap to read tables
Date: Wed, 10 Jan 2024 13:24:24 -0800 [thread overview]
Message-ID: <xmqq5y00anlj.fsf@gitster.g> (raw)
In-Reply-To: <a23f38a80609a5c5a6931400ffd28a92b33838bb.1704714575.git.ps@pks.im> (Patrick Steinhardt's message of "Mon, 8 Jan 2024 13:18:43 +0100")
Patrick Steinhardt <ps@pks.im> writes:
> Using mmap comes with a significant drawback on Windows though, because
> mmapped files cannot be deleted and neither is it possible to rename
> files onto an mmapped file. But for one, the reftable library gracefully
> handles the case where auto-compaction cannot delete a still-open stack
> already and ignores any such errors. Also, `reftable_stack_clean()` will
> prune stale tables which are not referenced by "tables.list" anymore so
> that those files can eventually be pruned. And second, we never rewrite
> already-rewritten stacks, so it does not matter that we cannot rename a
> file over an mmaped file, either.
I somehow thought that we use "read into an allocated memory and
pretend as if we mapped" emulation on Windows, so all of that may be
moot.
> diff --git a/reftable/blocksource.c b/reftable/blocksource.c
> index a1ea304429..5d3f3d264c 100644
> --- a/reftable/blocksource.c
> +++ b/reftable/blocksource.c
> @@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
> #include "reftable-blocksource.h"
> #include "reftable-error.h"
>
> +#if defined(NO_MMAP)
> +static int use_mmap = 0;
> +#else
> +static int use_mmap = 1;
> +#endif
Is there (do you need) some externally controllable knob that the
user can use to turn this off on a system that is compiled without
NO_MMAP? Otherwise it is misleading to have this as a variable.
> -static void file_close(void *b)
> +static void file_close(void *v)
> {
> - int fd = ((struct file_block_source *)b)->fd;
> - if (fd > 0) {
> - close(fd);
> - ((struct file_block_source *)b)->fd = 0;
> + struct file_block_source *b = v;
> +
> + if (b->fd >= 0) {
> + close(b->fd);
> + b->fd = -1;
> }
>
> + if (use_mmap)
> + munmap(b->data, b->size);
> + else
> + reftable_free(b->data);
> + b->data = NULL;
> +
> reftable_free(b);
> }
It would have been nicer to do this kind of "a void pointer is taken
and we immediately cast it to a concrete and useful type upon entry"
clean-up as a separate step that is purely clean-up, if there were
many instances. It is the first such one in the series as far as I
remember, so it is not a huge deal.
> @@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
> {
> struct file_block_source *b = v;
> assert(off + size <= b->size);
> - dest->data = reftable_malloc(size);
> - if (pread_in_full(b->fd, dest->data, size, off) != size)
> - return -1;
> + dest->data = b->data + off;
> dest->len = size;
> return size;
> }
So, we used to delay the allocation and reading of a block until
this function gets called. Now, by the time the control reaches
the function, we are expected to have the data handy at b->data.
That is ensured by reftable_block_source_from_file(), I presume?
> @@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
> {
> struct stat st = { 0 };
> int err = 0;
> - int fd = open(name, O_RDONLY);
> + int fd;
> struct file_block_source *p = NULL;
> +
> + fd = open(name, O_RDONLY);
> if (fd < 0) {
> if (errno == ENOENT) {
> return REFTABLE_NOT_EXIST_ERROR;
This is a no-op clean-up that would have been better in a separate
clean-up step. Again, not a huge deal.
> @@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
>
> p = reftable_calloc(sizeof(struct file_block_source));
> p->size = st.st_size;
> - p->fd = fd;
> + if (use_mmap) {
> + p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> + p->fd = fd;
This is a bit unusual for our use of mmap() where the norm is to
close the file descriptor once we mapped (we only need the memory
region itself and not the originating file descriptor to unmap it).
Is there a reason why we need to keep p->fd? Now the other side of
this if/else preallocates the whole thing and slurps everything in
core to allow the remainder of the code to mimic what happens on the
mmaped block memory (e.g., we saw how file_read_block() assumes that
b->data[0..b->size] are valid) and does not obviously need p->fd,
can we just remove .fd member from the file_block_source structure?
That way, file_close() can also lose the conditional close() call.
For that matter, do we need any codepath in this file that is
enabled by !use_mmap? Can't we just use xmmap() and let its
"instead, we allocate, read into it and return" emulation?
Thanks.
> + } else {
> + p->data = xmalloc(st.st_size);
> + if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
> + close(fd);
> + return -1;
> + }
> + close(fd);
> + p->fd = -1;
> + }
>
> assert(!bs->ops);
> bs->ops = &file_vtable;
next prev parent reply other threads:[~2024-01-10 21:24 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-08 12:18 [PATCH 0/4] reftable: optimize I/O patterns Patrick Steinhardt
2024-01-08 12:18 ` [PATCH 1/4] reftable/stack: refactor stack reloading to have common exit path Patrick Steinhardt
2024-01-10 17:30 ` Junio C Hamano
2024-01-11 7:33 ` Patrick Steinhardt
2024-01-08 12:18 ` [PATCH 2/4] reftable/stack: refactor reloading to use file descriptor Patrick Steinhardt
2024-01-10 19:57 ` Junio C Hamano
2024-01-08 12:18 ` [PATCH 3/4] reftable/stack: use stat info to avoid re-reading stack list Patrick Steinhardt
2024-01-10 20:07 ` Junio C Hamano
2024-01-11 7:41 ` Patrick Steinhardt
2024-01-08 12:18 ` [PATCH 4/4] reftable/blocksource: use mmap to read tables Patrick Steinhardt
2024-01-10 21:24 ` Junio C Hamano [this message]
2024-01-11 9:21 ` Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 0/5] reftable: optimize I/O patterns Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 1/5] reftable/stack: refactor stack reloading to have common exit path Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 2/5] reftable/stack: refactor reloading to use file descriptor Patrick Steinhardt
2024-01-14 10:14 ` Jeff King
2024-01-15 10:03 ` Patrick Steinhardt
2024-01-16 15:14 ` Jeff King
2024-01-16 16:44 ` Junio C Hamano
2024-01-11 10:06 ` [PATCH v2 3/5] reftable/stack: use stat info to avoid re-reading stack list Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 4/5] reftable/blocksource: refactor code to match our coding style Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 5/5] reftable/blocksource: use mmap to read tables Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq5y00anlj.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=hanwenn@gmail.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).