lttng-dev.lists.lttng.org archive mirror
* [lttng-dev] error with ctf_sequence_text when using non-utf8 encoded strings
@ 2024-01-31 17:10 Nathan Ricci via lttng-dev
  2024-01-31 18:39 ` Philippe Proulx via lttng-dev
  0 siblings, 1 reply; 3+ messages in thread
From: Nathan Ricci via lttng-dev @ 2024-01-31 17:10 UTC (permalink / raw)
  To: lttng-dev@lists.lttng.org



I am trying to emit a trace that has a wchar string, using ctf_sequence_text for one of the fields. When the trace is recorded, it seems like everything but the first character is truncated, and I think this is because it is assuming UTF-8 encoding and stopping at the first null character. This is on Ubuntu 22.04, using this liblttng-ust package:

liblttng-ust-common1/jammy,now 2.13.1-1ubuntu1

I've boiled this down to a simple repro; I've included the code below, but you can also get it here: https://github.com/naricc/lttng-test

Here is the main file:
-----

#include <stdio.h>
#include <unistd.h>
#include <lttng/lttng.h>
#include <lttng/tracepoint.h>
#include <wchar.h>
#include "repro-tracepoint.h"



int main() {
    puts("Hello, World!\nPress Enter to continue...");
    getchar();

    const char* utf8_text_value = "Hello, UTF8 Sequence Text!";
    const wchar_t *wchar_text_value = L"Hello, WChar Sequence Text!";

    // Emit the tracepoint event with the sequence text field
    lttng_ust_tracepoint(naricc_test_provider, test_event, utf8_text_value, wchar_text_value);

    return 0;
}

----
Here is the tracepoint header (repro-tracepoint.h):
___

#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER naricc_test_provider

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./repro-tracepoint.h"

#if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_H

#include <lttng/tracepoint.h>
#include <wchar.h>

// Define the tracepoint event with a sequence text field
TRACEPOINT_EVENT(naricc_test_provider, test_event,
    TP_ARGS(
        const char*, utf8_text_value,
        const wchar_t*, wchar_text_value
    ),
    TP_FIELDS(
        ctf_sequence_text(char, utf8_text_sequence, utf8_text_value, size_t, strlen(utf8_text_value))
        ctf_sequence_text(wchar_t, wchar_text_sequence, wchar_text_value, size_t, wcslen(wchar_text_value) * 2 + 2)
    )
)

#endif /* _TP_H */

#include <lttng/tracepoint-event.h>

----
And here is the repro-tracepoint.cpp:
___

#define LTTNG_UST_TRACEPOINT_CREATE_PROBES
#define LTTNG_UST_TRACEPOINT_DEFINE

#include "repro-tracepoint.h"

---

I built it like so:


g++ -c -I. repro-tracepoint.cpp
g++ -c lttng-test.cpp
g++ -o lttng-test lttng-test.o repro-tracepoint.o -llttng-ust -ldl


---

After starting a session, running that program, and destroying the session, this is what I get with babeltrace2:

```
$ babeltrace2  ~/lttng-traces/my-user-space-session-20240131-161638
[16:16:43.553211421] (+?.?????????) TDC20748914 naricc_test_provider:test_event: { cpu_id = 6 }, { _utf8_text_sequence_length = 26, utf8_text_sequence = "Hello, UTF8 Sequence Text!", _wchar_text_sequence_length = 56, wchar_text_sequence = [ [0] = 72, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0, [8] = 0, [9] = 0, [10] = 0, [11] = 0, [12] = 0, [13] = 0, [14] = 0, [15] = 0, [16] = 0, [17] = 0, [18] = 0, [19] = 0, [20] = 0, [21] = 0, [22] = 0, [23] = 0, [24] = 0, [25] = 0, [26] = 0, [27] = 0, [28] = 0, [29] = 0, [30] = 0, [31] = 0, [32] = 0, [33] = 0, [34] = 0, [35] = 0, [36] = 0, [37] = 0, [38] = 0, [39] = 0, [40] = 0, [41] = 0, [42] = 0, [43] = 0, [44] = 0, [45] = 0, [46] = 0, [47] = 0, [48] = 0, [49] = 0, [50] = 0, [51] = 0, [52] = 0, [53] = 0, [54] = 0, [55] = 0 ] }

```

The UTF-8 sequence prints fine, but the wchar one is truncated to a single character followed by zeros. To rule out an error in Babeltrace, I inspected the channel files with hexedit and found this:

```
85 58  E1 C1 43 EE  9A 93 9C 60  6D B7 2B B0  00 00 00 00  .......X..C....`m.+.....
00000018   06 00 00 00  00 00 00 00  6A 33 DA B4  2D AD 02 00  04 78 1C 56  30 AD 02 00  ........j3..-....x.V0...
00000030   60 0B 00 00  00 00 00 00  00 80 00 00  00 00 00 00  00 00 00 00  00 00 00 00  `.......................
00000048   00 00 00 00  00 00 00 00  06 00 00 00  FF FF 00 00  00 00 3C 10  F2 E2 2E AD  ..................<.....
00000060   02 00 1A 00  00 00 00 00  00 00 48 65  6C 6C 6F 2C  20 55 54 46  38 20 53 65  ..........Hello, UTF8 Se
00000078   71 75 65 6E  63 65 20 54  65 78 74 21  38 00 00 00  00 00 00 00  48 00 00 00  quence Text!8.......H...
00000090   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ........................
```

So it seems the error is in the recording of the trace, not in the viewing.

Looking into the lttng-ust code, it seems like ctf_sequence_text ends up mapped to this:

lttng-ust/include/lttng/ust-tracepoint-event-write.h at 717c38f658248bc04ccfc6e7fdf5d03040c2a846: <https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/include/lttng/ust-tracepoint-event-write.h#L73>

Which assumes utf8 encoding, and ultimately writes into a ring buffer terminating on null:

lttng-ust/src/common/ringbuffer/backend.h at 717c38f658248bc04ccfc6e7fdf5d03040c2a846: <https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/src/common/ringbuffer/backend.h#L126>


If we agree this is an error, I believe I can produce a fix for it. Or if I am just using the APIs wrong, please let me know what I should do instead.

             --Nathan Ricci







_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: [lttng-dev] error with ctf_sequence_text when using non-utf8 encoded strings
  2024-01-31 17:10 [lttng-dev] error with ctf_sequence_text when using non-utf8 encoded strings Nathan Ricci via lttng-dev
@ 2024-01-31 18:39 ` Philippe Proulx via lttng-dev
  2024-01-31 19:05   ` [lttng-dev] [EXTERNAL] " Nathan Ricci via lttng-dev
  0 siblings, 1 reply; 3+ messages in thread
From: Philippe Proulx via lttng-dev @ 2024-01-31 18:39 UTC (permalink / raw)
  To: Nathan Ricci; +Cc: lttng-dev@lists.lttng.org

On Wed, Jan 31, 2024 at 12:58 PM Nathan Ricci via lttng-dev
<lttng-dev@lists.lttng.org> wrote:
>
> I am trying to emit a trace that has a wchar string, using ctf_sequence_text for one of the fields. When the trace is recorded, it seems like everything but the first character is truncated, and I think this is because it is assuming UTF-8 encoding and stopping at the first null character. This is on Ubuntu 22.04, using this liblttng-ust package:

Hi Nathan.

I'd like to see your generated trace metadata. Here's how:

    $ babeltrace2 -o ctf-metadata /path/to/dir/containing/metadata

In your case, that path is somewhere within
`~/lttng-traces/my-user-space-session-20240131-161638`.

CTF 1.8 doesn't support "wide characters" or any other string encoding
than UTF-8 and its ASCII subset.

Therefore LTTng-UST assumes UTF-8 and doesn't care about anything past
the first U+0000 codepoint (single zero byte in UTF-8).

I don't think it's an error per se, but maybe the ctf_sequence_text()
macro could fail with an element type having a size greater than one
byte (like `wchar_t`)?

That being said, you'll be glad to learn that, although it won't help in
the short term, CTF 2 will support UTF-8-, UTF-16-, and UTF-32-encoded
string fields: <https://diamon.org/ctf/files/CTF2-SPECRC-9.0rA.html#str-fc>.

In the meantime, you could use the ctf_sequence() macro, but your string
will become a dynamic-length array of integers in the CTF data stream,
meaning you'll need to decode it manually using the Babeltrace 2 API.

I'll wait for your metadata text.

Philippe Proulx


* Re: [lttng-dev] [EXTERNAL] Re: error with ctf_sequence_text when using non-utf8 encoded strings
  2024-01-31 18:39 ` Philippe Proulx via lttng-dev
@ 2024-01-31 19:05   ` Nathan Ricci via lttng-dev
  0 siblings, 0 replies; 3+ messages in thread
From: Nathan Ricci via lttng-dev @ 2024-01-31 19:05 UTC (permalink / raw)
  To: Philippe Proulx; +Cc: lttng-dev@lists.lttng.org

Philippe,

        Running that babeltrace command gives an error:

# babeltrace2 -o ctf-metadata /home/naricc/lttng-traces/my-user-space-session-20240131-161638/
01-31 18:59:17.715 1166285 1166285 E PLUGIN/SRC.CTF.FS/QUERY metadata_info_query@query.c:111 [fs] Cannot open trace metadata: path="/home/naricc/lttng-traces/my-user-space-session-20240131-161638/".
01-31 18:59:17.715 1166285 1166285 W LIB/QUERY-EXECUTOR bt_query_executor_query@query-executor.c:243 Component class's "query" method failed: query-exec-addr=0x557f47f2fb20, cc-addr=0x557f47f3e300, cc-type=SOURCE, cc-name="fs", cc-partial-descr="Read CTF traces from the file sy", cc-is-frozen=0, cc-so-handle-addr=0x557f47f3d430, cc-so-handle-path="/usr/lib/x86_64-linux-gnu/babeltrace2/plugins/babeltrace-plugin-ctf.so", object="metadata-info", params-addr=0x557f47f2e9d0, params-type=MAP, params-element-count=1, log-level=WARNING
01-31 18:59:17.715 1166285 1166285 E CLI cmd_print_ctf_metadata@babeltrace2.c:1083 Failed to query `metadata-info` object: unknown error

I was able to just read the metadata file with cat though:

/home/naricc/lttng-traces/my-user-space-session-20240131-161638/ust/uid/1002/64-bit# cat metadata
/* CTF 1.8 */

typealias integer { size = 8; align = 8; signed = false; } := uint8_t;
typealias integer { size = 16; align = 8; signed = false; } := uint16_t;
typealias integer { size = 32; align = 8; signed = false; } := uint32_t;
typealias integer { size = 64; align = 8; signed = false; } := uint64_t;
typealias integer { size = 64; align = 8; signed = false; } := unsigned long;
typealias integer { size = 5; align = 1; signed = false; } := uint5_t;
typealias integer { size = 27; align = 1; signed = false; } := uint27_t;

trace {
        major = 1;
        minor = 8;
        uuid = "17bd8558-e1c1-43ee-9a93-9c606db72bb0";
        byte_order = le;
        packet.header := struct {
                uint32_t magic;
                uint8_t  uuid[16];
                uint32_t stream_id;
                uint64_t stream_instance_id;
        };
};

env {
        domain = "ust";
        tracer_name = "lttng-ust";
        tracer_major = 2;
        tracer_minor = 13;
        tracer_buffering_scheme = "uid";
        tracer_buffering_id = 1002;
        architecture_bit_width = 64;
        trace_name = "my-user-space-session";
        trace_creation_datetime = "20240131T161638+0000";
        hostname = "TDC20748914";
};

clock {
        name = "monotonic";
        uuid = "ece735c7-26b7-446c-b6fe-1edefe5c7062";
        description = "Monotonic Clock";
        freq = 1000000000; /* Frequency, in Hz */
        /* clock value offset from Epoch is: offset * (1/freq) */
        offset = 1705964436712174561;
};

typealias integer {
        size = 27; align = 1; signed = false;
        map = clock.monotonic.value;
} := uint27_clock_monotonic_t;

typealias integer {
        size = 32; align = 8; signed = false;
        map = clock.monotonic.value;
} := uint32_clock_monotonic_t;

typealias integer {
        size = 64; align = 8; signed = false;
        map = clock.monotonic.value;
} := uint64_clock_monotonic_t;

struct packet_context {
        uint64_clock_monotonic_t timestamp_begin;
        uint64_clock_monotonic_t timestamp_end;
        uint64_t content_size;
        uint64_t packet_size;
        uint64_t packet_seq_num;
        unsigned long events_discarded;
        uint32_t cpu_id;
};

struct event_header_compact {
        enum : uint5_t { compact = 0 ... 30, extended = 31 } id;
        variant <id> {
                struct {
                        uint27_clock_monotonic_t timestamp;
                } compact;
                struct {
                        uint32_t id;
                        uint64_clock_monotonic_t timestamp;
                } extended;
        } v;
} align(8);

struct event_header_large {
        enum : uint16_t { compact = 0 ... 65534, extended = 65535 } id;
        variant <id> {
                struct {
                        uint32_clock_monotonic_t timestamp;
                } compact;
                struct {
                        uint32_t id;
                        uint64_clock_monotonic_t timestamp;
                } extended;
        } v;
} align(8);

event {
        name = "naricc_test_provider:test_event";
        id = 0;
        stream_id = 0;
        loglevel = 13;
        fields := struct {
                integer { size = 64; align = 8; signed = 0; encoding = none; base = 10; } __utf8_text_sequence_length;
                integer { size = 8; align = 8; signed = 1; encoding = UTF8; base = 10; } _utf8_text_sequence[ __utf8_text_sequence_length ];
                integer { size = 64; align = 8; signed = 0; encoding = none; base = 10; } __wchar_text_sequence_length;
                integer { size = 32; align = 8; signed = 1; encoding = UTF8; base = 10; } _wchar_text_sequence[ __wchar_text_sequence_length ];
        };
};

stream {
        id = 0;
        event.header := struct event_header_large;
        packet.context := struct packet_context;
};


I think ctf_sequence will work as a workaround, although it will make things somewhat more annoying for the program that eventually reads these traces.

              --Nathan Ricci

