From: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Subject: Re: [PATCH v7 01/20] trace-cmd library: Add support for compression algorithms
Date: Wed, 26 Jan 2022 06:56:13 +0200 [thread overview]
Message-ID: <CAPpZLN7EYCuAS=uU_BzS1r_G8TBcfDta3nbSFwykPFuzgbwJNA@mail.gmail.com> (raw)
In-Reply-To: <CAPpZLN4ET-wF2BcQzEb99F8LFADVDYZR6_kdo4530XC=1eJG3Q@mail.gmail.com>
On Tue, Jan 25, 2022 at 7:30 PM Tzvetomir Stoyanov
<tz.stoyanov@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 5:08 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 25 Jan 2022 09:48:43 +0200
> > Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:
> >
> > > > > >
> > > > > > > + free(bytes);
> > > > > > > + handle->pointer = 0;
> > > > > > > + handle->capacity = s_uncompressed;
> > > > > >
> > > > > > If size is the greater of the two (compressed could be larger) and
> > > > > > capacity is the allocated buffer, shouldn't we make it equal to size
> > > > > > and not uncompressed?
> > > > >
> > > > > The name "capacity" in reading logic is a bit confusing, as it
> > > > > actually is used to track the size of uncompressed data. The buffer
> > > > > may be larger, but if the uncompressed is smaller - the rest is
> > > > > garbage and must not be read.
> > > >
> > > > Then the extend function is broken:
> > >
> > > For this particular use case, the logic is not correct - the realloc()
> > > could be redundant, if the real buffer size is bigger than the
> > > capacity. But as usually a compression handle is used for reading or
> > > for writing only, this is unlikely to happen. At least I cannot
> > > imagine a use case where the same compression handle will be used for
> > > reading a compress block and for writing compressed data. But even if
> > > we hit that case - there is no data corruption or memory leak, just a
> > > redundant realloc(). To solve that, reading and writing capacities
> > > could be introduced ?
> > > struct tracecmd_compression {
> > > ...
> > > unsigned int capacity_read;
> > > unsigned int capacity_write;
> > > ...
> > > }
> >
> > A redundant realloc is not an issue, in fact, they are pretty fast (it
> > detects that the requested size still fits what is allocated and just
> > returns).
> >
> > >
> > > But that change will be for a very rare case - when the compressed
> > > data size is bigger than the uncompressed (usually happens only with a
> > > very small data) + the same compression handle is used for compression
> > > and decompression (not used in that way in trace-cmd).
> > >
> > > >
> > > > static inline int buffer_extend(struct tracecmd_compression *handle, unsigned int size)
> > > > {
> > > > int extend;
> > > > char *buf;
> > > >
> > > > if (size <= handle->capacity)
> > > > return 0;
> > > >
> > > > extend = ((size - handle->capacity) / BUFSIZ + 1) * BUFSIZ;
> > > > buf = realloc(handle->buffer, handle->capacity + extend);
> > > > if (!buf)
> > > > return -1;
> > > > handle->buffer = buf;
> > > > handle->capacity += extend;
> > > >
> > > > return 0;
> > > > }
> > > >
> > > > As it sets the capacity to the allocated size, not what is stored.
> > > >
> > > > Perhaps you meant:
> > > >
> > > > handle->capacity += size;
> >
> > The above should still work, instead of doing the += extend.
> >
> > The point I was making is that capacity should either be the requested size
> > allocated, or what is actually allocated. The above function sets it to
> > what is allocated, and the other code sets it to what was requested. We
> > should pick it to be one or the other.
>
> I didn't know about that realloc optimization (check if requested size
> fits into already allocated memory). In that case, we can use the
> requested size in both places. But still allocate a page aligned
> buffer in buffer_extend(), to avoid frequent reallocs (although
> realloc will be called almost every time).
>
I did some tests, these extra realloc() calls actually cause 7-8% slow
down on writing compressed trace files. So, I decided to go with the
other approach - use capacity_read and capacity_write.
> >
> > -- Steve
>
>
>
> --
> Tzvetomir (Ceco) Stoyanov
> VMware Open Source Technology Center
--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center
next prev parent reply other threads:[~2022-01-26 4:56 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-19 8:26 [PATCH v7 00/20] Trace file version 7 - compression Tzvetomir Stoyanov (VMware)
2022-01-19 8:26 ` [PATCH v7 01/20] trace-cmd library: Add support for compression algorithms Tzvetomir Stoyanov (VMware)
2022-01-23 22:48 ` Steven Rostedt
2022-01-24 4:54 ` Tzvetomir Stoyanov
2022-01-24 20:03 ` Steven Rostedt
2022-01-24 21:24 ` Steven Rostedt
2022-01-25 7:48 ` Tzvetomir Stoyanov
2022-01-25 15:08 ` Steven Rostedt
2022-01-25 17:30 ` Tzvetomir Stoyanov
2022-01-26 4:56 ` Tzvetomir Stoyanov [this message]
2022-01-26 14:16 ` Steven Rostedt
2022-01-19 8:26 ` [PATCH v7 02/20] trace-cmd library: Internal helpers for compressing data Tzvetomir Stoyanov (VMware)
2022-01-19 8:26 ` [PATCH v7 03/20] trace-cmd library: Internal helpers for uncompressing data Tzvetomir Stoyanov (VMware)
2022-01-19 8:26 ` [PATCH v7 04/20] trace-cmd library: Inherit compression algorithm from input file Tzvetomir Stoyanov (VMware)
2022-01-24 19:22 ` Steven Rostedt
2022-01-19 8:27 ` [PATCH v7 05/20] trace-cmd library: New API to configure compression on an output handler Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 06/20] trace-cmd library: Write compression header in the trace file Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 07/20] trace-cmd library: Compress part of " Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 08/20] trace-cmd library: Add local helper function for data compression Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 09/20] trace-cmd library: Compress the trace data Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 10/20] trace-cmd library: Decompress the options section, if it is compressed Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 11/20] trace-cmd library: Read compression header Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 12/20] trace-cmd library: Extend the input handler with trace data decompression context Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 13/20] trace-cmd library: Initialize CPU data decompression logic Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 14/20] trace-cmd library: Add logic for in-memory decompression Tzvetomir Stoyanov (VMware)
2022-01-25 18:30 ` Steven Rostedt
2022-01-19 8:27 ` [PATCH v7 15/20] trace-cmd library: Read compressed latency data Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 16/20] trace-cmd library: Decompress file sections on reading Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 17/20] trace-cmd library: Add zlib compression algorithm Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 18/20] trace-cmd list: Show supported compression algorithms Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 19/20] trace-cmd record: Add compression to the trace context Tzvetomir Stoyanov (VMware)
2022-01-19 8:27 ` [PATCH v7 20/20] trace-cmd report: Add new parameter for trace file compression Tzvetomir Stoyanov (VMware)
2022-01-25 18:39 ` [PATCH v7 00/20] Trace file version 7 - compression Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPpZLN7EYCuAS=uU_BzS1r_G8TBcfDta3nbSFwykPFuzgbwJNA@mail.gmail.com' \
--to=tz.stoyanov@gmail.com \
--cc=linux-trace-devel@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).