[diamon-discuss] Common Trace Format 1.9 planning

* [diamon-discuss] Common Trace Format 1.9 planning
       [not found] <470558667.103090.1427308587048.JavaMail.zimbra@efficios.com>
@ 2015-03-25 18:39 ` Mathieu Desnoyers
  2015-03-25 19:09   ` Matthew Khouzam
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2015-03-25 18:39 UTC
  To: diamon-discuss

Hi,

As discussed on the workgroup call today, I am posting
the list of changes I have gathered for CTF 1.9. Feedback
is welcome!

* Add a new set of features to the Common Trace Format and to the Babeltrace reference implementation.

1. Handle transition from CTF 1.8 to 1.9, including compatibility and upgrade path for users.

We must take great care when extending the Common Trace Format specification in a way that is not fully backward compatible, so that users do not suffer from the transition. We also want to minimize how often we make such incompatible changes, by bundling them all within one release and by ensuring that, in the future, the format can be extended with "features" that are simply ignored by a parser implementation that does not know about them.

Since the description format contains version numbering, we can keep CTF 1.8 and CTF 1.9 parsers side-by-side in the trace reader, so users gathering CTF 1.8 traces can use the new tools to view them. Only users generating traces in the new format (CTF 1.9) will need the new tools to view them.
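
As a rough sketch of the side-by-side approach, a reader could detect the version first and then hand the metadata to the matching parser. This assumes the version is exposed as `major`/`minor` fields in the metadata's trace block, as in CTF 1.8 TSDL. The two parser functions and the (1, 9) key are hypothetical placeholders, not existing Babeltrace code.

```python
import re

# Hypothetical stand-ins for two independent metadata parsers.
def parse_ctf_1_8(metadata):
    return "parsed with the CTF 1.8 parser"

def parse_ctf_new(metadata):
    return "parsed with the new-format parser"

# Version -> parser. (1, 9) is the version number proposed in this thread.
PARSERS = {(1, 8): parse_ctf_1_8, (1, 9): parse_ctf_new}

def detect_version(metadata):
    """Extract (major, minor) from the metadata text."""
    major = re.search(r"\bmajor\s*=\s*(\d+)\s*;", metadata)
    minor = re.search(r"\bminor\s*=\s*(\d+)\s*;", metadata)
    if major is None or minor is None:
        raise ValueError("no version found in metadata")
    return int(major.group(1)), int(minor.group(1))

def read_metadata(metadata):
    """Dispatch to the parser registered for the detected version."""
    version = detect_version(metadata)
    try:
        parser = PARSERS[version]
    except KeyError:
        raise ValueError("unsupported CTF version %d.%d" % version) from None
    return parser(metadata)
```

With this layout, existing CTF 1.8 traces keep working in the new tools without any user interaction, because the choice of parser depends only on what the trace declares about itself.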

2. Add base address and symbol information support.

The Perf developers expressed strong interest in adding base address and symbol information support to the Common Trace Format at the Trace Summit 2014 [3]. This requirement seems to be of interest to most of the community.

In the spirit of keeping the Common Trace Format dedicated as much as possible to describing the layout of the binary trace, we intend to keep the base address and symbol information separate from the CTF metadata file. This can be achieved by creating a stream that contains the following object load and unload events:

[ object load, timestamp, address space, address, mapping length, object path ]
[ object unload, timestamp, address space, address ]

This should allow describing load/unload events for the kernel, and for each user-space process.
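
To illustrate how a trace reader might consume such a stream, here is a minimal sketch that replays load/unload events and resolves an address to an object path and offset. The tuple layout mirrors the two event descriptions above; the class name, function names, and the `"load"`/`"unload"` tags are assumptions made for this example only.

```python
from bisect import bisect_right, insort

class AddressSpaceMap:
    """Loaded objects for one address space (the kernel or one process)."""

    def __init__(self):
        self._bases = []     # sorted base addresses of loaded objects
        self._mappings = {}  # base address -> (mapping length, object path)

    def load(self, address, length, path):
        if address not in self._mappings:
            insort(self._bases, address)
        self._mappings[address] = (length, path)

    def unload(self, address):
        if self._mappings.pop(address, None) is not None:
            self._bases.remove(address)

    def resolve(self, address):
        """Return (object path, offset), or None if the address is unmapped."""
        index = bisect_right(self._bases, address)
        if index == 0:
            return None
        base = self._bases[index - 1]
        length, path = self._mappings[base]
        if address < base + length:
            return path, address - base
        return None

def replay(events):
    """Apply events in order and return one map per address space."""
    spaces = {}
    for event in events:
        kind, _timestamp, space, address = event[:4]
        space_map = spaces.setdefault(space, AddressSpaceMap())
        if kind == "load":
            space_map.load(address, event[4], event[5])
        elif kind == "unload":
            space_map.unload(address)
    return spaces
```

Symbol lookup inside the resolved object (for example from its ELF symbol table) would then happen outside the trace, which keeps the trace itself limited to address-range information.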

Initially, we can agree through discussion on how to introduce those events into LTTng and Perf-to-CTF in the same way.

3. Add event versioning.

Experience with the Linux kernel tracepoints shows us that relying on event and field names is not sufficient for binding a trace analysis to the semantics of the kernel or application. Sometimes, changes in the software keep the same event and field names, but change their semantics.

A good example of such a change is wakeup delivery within the Linux kernel, which has moved from the context of the thread performing the wakeup to an inter-processor interrupt performing the wakeup on a remote CPU. This is an issue for critical path analysis, which needs to know about this semantic change to the PID field even though the event and field names are unchanged.

Therefore, introduce an optional versioning property for the event, which allows trace analyses and models to track which semantics they are dealing with.
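
One way an analysis could use such a property is to key its rules on the pair (event name, event version) instead of the name alone. The sketch below is purely illustrative: the version numbers 1 and 2 and the semantics attached to them are assumptions, not something defined by CTF or by the kernel.

```python
# Hypothetical version numbers and semantics, for illustration only.

def waker_is_current_thread(current_tid):
    # Assumed version 1: the event is emitted in the context of the thread
    # performing the wakeup, so the current thread is the waker.
    return current_tid

def waker_is_unknown(current_tid):
    # Assumed version 2: the event is emitted from an inter-processor
    # interrupt on a remote CPU, so the current thread is unrelated.
    return None

WAKER_RULES = {
    ("sched_wakeup", 1): waker_is_current_thread,
    ("sched_wakeup", 2): waker_is_unknown,
}

def find_waker(event_name, event_version, current_tid):
    """Pick the rule matching both the event name and its version."""
    try:
        rule = WAKER_RULES[(event_name, event_version)]
    except KeyError:
        raise LookupError(
            "no analysis rule for %s version %d" % (event_name, event_version)
        ) from None
    return rule(current_tid)
```

An analysis written this way fails loudly on a version it does not know, instead of silently applying outdated semantics.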

4. Add field name prefix '$' to eliminate conflicts with reserved keywords.

Currently, the Common Trace Format metadata uses a '_' prefix to mitigate conflicts with reserved keywords. However, this turned out to be a bad idea, because '_' is a legitimate character within identifiers. This creates confusion in cases where an event name is already prefixed by '_': the underscore is either removed when it should not be, or '__' is then necessary.

Fix this confusion by introducing the optional '$' prefix for identifiers.
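
The ambiguity can be shown with a small sketch. It assumes a reader that strips a single leading escape character from identifiers; both helper functions are hypothetical and only model the behaviour described above.

```python
def unescape_underscore(identifier):
    # Current scheme: a leading '_' is treated as the escape prefix.
    return identifier[1:] if identifier.startswith("_") else identifier

def unescape_dollar(identifier):
    # Proposed scheme: only a leading '$' is treated as the escape prefix.
    return identifier[1:] if identifier.startswith("$") else identifier

# With '_', an escaped keyword ("event" written as "_event") and a name that
# genuinely starts with an underscore ("_event" written as-is) look identical
# to the reader, so the second one loses its underscore.
escaped_keyword = unescape_underscore("_event")
genuine_name = unescape_underscore("_event")

# With '$', the two cases stay distinct.
escaped_keyword_dollar = unescape_dollar("$event")
genuine_name_dollar = unescape_dollar("_event")
```

Because '$' is not otherwise a legal identifier character in this scheme, the reader never has to guess whether a prefix was added by the writer.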

5. Add attributes for type and field reference.

Turn the descriptions of integer and floating point values into attributes attached to the type:

integer ( attribute-list )
floating_point ( attribute-list )

This will allow using the same concept of attributes on compound types, e.g.:

struct { ... } ( attribute-list )
variant <tag> { ... } ( attribute-list )
int32_t [10] ( attribute-list )

attribute-list is a comma-separated list of:

  identifier = expression

For instance, in the case of event pretty-printing, this can be used as:

  format = "content of format string"

The content of the format string follows what has been proposed in the report on layout description.

6. Add pretty-printing "hints", specified outside of CTF, within either another specification, or an appendix. This will allow splitting data layout (CTF) from formatting.

The goal here is to allow the tracer implementation (e.g. LTTng-UST) to specify additional attributes attached to a CTF type. The two attributes we will implement in LTTng-UST and Babeltrace are:

  * format = "{field1}: {field3:02x} ({event.context.allo})"

The format string would follow the Python format string syntax [4]. This syntax is chosen over a printf-like one because it can refer to specific fields by name within the format string, so a plain string suffices as the attribute (rather than a function call, which would have a large impact on the CTF TSDL grammar). Python format strings are very similar to their printf counterparts: they share a very similar format specification.

A few examples of printf vs Python format string syntax similarity:

Event payload layout:

struct {
        int myint;
        string mystring;
        double mydouble;
};

Printf: printf("abc %05d %s %g", myint, mystring, mydouble)
Python: "abc {:05} {} {}"

Printf: printf("%s: %x", mystring, myint)
Python: "{mystring}: {myint:x}"
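
For reference, here is how these format strings behave when evaluated with Python's own `str.format`. This is only a sketch of the syntax: a trace viewer would need its own evaluator, and the payload values and the `SimpleNamespace` stand-in for event context are made up for the example.

```python
from types import SimpleNamespace

# Made-up payload values matching the struct layout above.
payload = {"myint": 42, "mystring": "hello", "mydouble": 2.5}

# Named fields: counterpart of printf("%s: %x", mystring, myint).
named = "{mystring}: {myint:x}".format(**payload)

# Positional fields: counterpart of printf("abc %05d %s %g", ...).
# An empty "{}" on a float uses Python's default float formatting, which can
# differ from %g for large or small values; "{:g}" is the closer match.
positional = "abc {:05} {} {:g}".format(
    payload["myint"], payload["mystring"], payload["mydouble"]
)
printf_style = "abc %05d %s %g" % (
    payload["myint"], payload["mystring"], payload["mydouble"]
)

# Dotted names are resolved as attribute lookups, which is what makes a
# reference such as {event.context.allo} expressible in a plain string.
event = SimpleNamespace(context=SimpleNamespace(allo="bonjour"))
dotted = "{field1}: {field3:02x} ({event.context.allo})".format(
    field1="abc", field3=10, event=event
)
```

The named and dotted forms are the part printf cannot express without turning the attribute into a function call with arguments.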

  * printers = [ "hexdump", "..." ]

The tracer and trace viewer implementations can agree on a "printers" attribute, which describes pretty-printing plugins that shall be used to render the type, if such plugins are available.

This includes implementing, in the Babeltrace reference implementation,
support for such pretty-printing plugins, which can register with
Babeltrace and declare which type or types they expect as input.

A reference plugin will be implemented, which prints a sequence of
bytes as:

[...]
000015F0 5F 73 70 72 69 6E 74 66 5F 63 68 6B 00 5F 5F 78 _sprintf_chk.__x
00001600 73 74 61 74 00 6D 65 6D 6D 6F 76 65 00 5F 6F 62 stat.memmove._ob
00001610 73 74 61 63 6B 5F 62 65 67 69 6E 00 62 69 6E 64 stack_begin.bind
[...]
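
The output above could be produced by a printer along these lines. This is a minimal sketch: the function name and its parameters are illustrative and are not part of any Babeltrace plugin API.

```python
def hexdump(data, base=0, width=16):
    """Render bytes as: 8-digit hex offset, hex byte values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02X" % byte for byte in chunk)
        # Non-printable bytes are shown as '.', as in the sample output.
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Pad short final rows so the ASCII column stays aligned.
        lines.append("%08X %s %s" % (base + offset, hex_part.ljust(width * 3 - 1), text))
    return "\n".join(lines)

sample = b"_sprintf_chk\x00__xstat\x00memmove\x00_obstack_begin\x00bind"
dump = hexdump(sample, base=0x15F0)
```

Here `base` lets the printer show addresses relative to where the byte sequence sits in a larger buffer, which is how the sample starts at 000015F0.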

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 18:39 ` [diamon-discuss] Common Trace Format 1.9 planning Mathieu Desnoyers
@ 2015-03-25 19:09   ` Matthew Khouzam
  2015-03-25 19:22     ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Khouzam @ 2015-03-25 19:09 UTC
  To: diamon-discuss

This is great, so v1.9 is a superset of 1.8.

Would hints for time format be interesting or overspecialized?

Would hardware description in the metadata in a standardized form be a
good thing to have?

Would including a state system specification in the TSDL be an
interesting thing to do? Basically hints on how to process the events.

Thanks!
 
Matthew

On 15-03-25 02:39 PM, Mathieu Desnoyers wrote:
> 4. Add field name prefix '$' to eliminate conflicts with reserved keywords.
>
> Currently, the Common Trace Format metadata uses a '_' prefix to mitigate conflicts with reserved keywords. However, it was a bad idea, because '_' is a legitimate character part of the identifiers. This creates some confusion in the cases where an event name is indeed
> prefixed by '_' already: the underscore is either removed when it should not be removed, or '__' is then necessary.
>
> Fix this confusion by introducing the '$' optional prefix for identifiers.
>  
$context actually looks better than _context

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 19:09   ` Matthew Khouzam
@ 2015-03-25 19:22     ` Mathieu Desnoyers
  2015-03-25 19:40       ` Simon Marchi
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2015-03-25 19:22 UTC
  To: Matthew Khouzam; +Cc: diamon-discuss

----- Original Message -----
> This is great, so v1.9 is a superset of 1.8.

No. A superset would mean a completely backward compatible 1.9.
This is not the case here. We plan to make incompatible changes
in 1.9, hence the need for both 1.8 and 1.9 parsers. Since the
version is self-described, it should not be an issue to detect
the CTF version from the trace metadata.

> 
> Would hints for time format be interesting or overspecialized?

I would keep that for presentation, either an appendix to CTF or
a separate spec.

After much discussion, we came to the conclusion that keeping CTF
as a binary trace layout description language, and sticking to just
that, seems to be a good way to keep the CTF specification minimal.

This can be seen as splitting the presentation of a web page from its
content, where the presentation is in CSS and the content in HTML.

We can then optionally add "attributes" associated with types within the
CTF metadata, but the semantics of those attributes would not be
defined within CTF. If a trace reader does not understand or care about
those attributes, it should simply ignore them.
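
A reader following this rule might look roughly like the sketch below. The set of recognized attribute names and the `time_format` attribute are assumptions for the example, not part of any specification.

```python
# Attributes this particular (hypothetical) reader knows how to use.
KNOWN_ATTRIBUTES = {"format", "printers"}

def split_attributes(attributes):
    """Separate attributes the reader understands from those it ignores."""
    known = {
        name: value
        for name, value in attributes.items()
        if name in KNOWN_ATTRIBUTES
    }
    ignored = sorted(set(attributes) - KNOWN_ATTRIBUTES)
    return known, ignored

known, ignored = split_attributes({
    "format": "{mystring}: {myint:x}",
    "time_format": "iso8601",  # hypothetical attribute from another spec
})
```

Decoding never depends on `ignored`, so a reader that recognizes no attributes at all can still read the binary trace.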

> 
> Would hardware description in the metadata in a standardized form be a
> good thing to have?

This is not needed to describe the binary trace layout, so this
should be covered by another spec, or by a CTF appendix.

> 
> Would including a state system specification in the tsdl be an
> interesting thing to do? Basically hints on how to process the events.

We could have a separate spec for this. We could then map those
descriptions to the events. Again, this is not needed to decode
the binary trace.

The basic idea is that whatever is specified by CTF needs to be
fully implemented by _all_ CTF readers, and completely covered,
hence my intent to keep the optional stuff outside of this core
spec.

Thoughts?

Thanks,

Mathieu

> _______________________________________________
> diamon-discuss mailing list
> diamon-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 19:22     ` Mathieu Desnoyers
@ 2015-03-25 19:40       ` Simon Marchi
  2015-03-25 20:44         ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Marchi @ 2015-03-25 19:40 UTC
  To: Mathieu Desnoyers; +Cc: diamon-discuss

On 15-03-25 03:22 PM, Mathieu Desnoyers wrote:
> > This is great, so v1.9 is a superset of 1.8.
>
> No. A superset would be a completely backward compatible 1.9.
> This is not the case here. We plan to do incompatible changes
> within 1.9, hence the 1.8 and 1.9 parsers needed. Since the
> version is self-described, it should not be an issue to detect
> the CTF version from the trace metadata.

From what I understand, it means that a reader for 1.9 won't be able to read
a 1.8 trace, is that right?

Do the version numbers mean something? If introducing non-backward-compatible
changes only bumps the minor version, what could ever bump the major?

If they don't have one already, this could be an opportunity to give a meaning
to the numbers. The major could be for breaking backward compatibility, while
the minor could be for backward-compatible changes. It would mean that a reader
for x.y should be able to read any x.z trace, where z <= y. In other words, the
x.y format would be a superset of x.z (I think?). Much like semver.org, but
without the PATCH level.
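
Expressed as code, the proposed rule is a short predicate. This is a sketch of the suggestion only, with versions modelled as hypothetical (major, minor) tuples:

```python
def can_read(reader_version, trace_version):
    """A reader for x.y can read any x.z trace where z <= y."""
    reader_major, reader_minor = reader_version
    trace_major, trace_minor = trace_version
    return reader_major == trace_major and trace_minor <= reader_minor
```

Under this rule `can_read((1, 9), (1, 8))` holds, while `can_read((1, 8), (1, 9))` and `can_read((2, 0), (1, 8))` do not, so a single reader that wants to cover two major versions has to carry two parsers.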

Simon

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 19:40       ` Simon Marchi
@ 2015-03-25 20:44         ` Mathieu Desnoyers
  2015-03-25 21:00           ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2015-03-25 20:44 UTC
  To: Simon Marchi; +Cc: diamon-discuss

----- Original Message -----
> On 15-03-25 03:22 PM, Mathieu Desnoyers wrote:
> > > This is great, so v1.9 is a superset of 1.8.
> >
> > No. A superset would be a completely backward compatible 1.9.
> > This is not the case here. We plan to do incompatible changes
> > within 1.9, hence the 1.8 and 1.9 parsers needed. Since the
> > version is self-described, it should not be an issue to detect
> > the CTF version from the trace metadata.
> 
> From what I understand, it means that a reader for 1.9 won't be able to read
> a 1.8 trace, is that right?

Yes, we plan to add incompatible grammar changes. This might have to be
versioned as 2.0 then. However, we can have side-by-side implementations
of CTF 1.8 and 2.0 readers within a trace reading lib, and therefore
distinguish between those, and use the proper CTF reader implementation.

> 
> Do the version numbers mean something? If introducing non backward compatible
> changes only bumps the minor version, what could ever bump the major?

Good question! In this case, we might be talking about a CTF 2.0 then,
since we are planning non-compatible changes.

> 
> If they don't have one already, this could be an opportunity to give a
> meaning to the numbers. The major could be for breaking backward
> compatibility, while the minor could be for backward-compatible changes.
> It would mean that a reader for x.y should be able to read any x.z trace,
> where z <= y. In other words, the x.y format would be a superset of x.z
> (I think?). Much like semver.org, but without the PATCH level.

Yes, I think it's a good approach.

We have been hesitating between moving from CTF 1.8 to either 1.9 or 2.0.
Here are the upsides/downsides of each approach:

* 1.8 to 1.9:
  + Compatibility: New CTF 1.9 readers would be able to read CTF 1.8 traces.
  - Incompatibility: CTF 1.8 readers would not be able to read CTF 1.9 traces.
  - Complexity: The CTF 1.9 specification would need to be an exact superset
    of 1.8, which means a more complex spec, grammar, and implementations.

* 1.8 to 2.0:
  + Compatibility: Since we're keeping the version headers, a trace reader
    can implement parsers for both CTF 1.8 and 2.0, and read both trace
    formats without requiring user interaction.
  - Incompatibility: CTF 1.8 readers would not be able to read CTF 2.0 traces.
  + Simplicity: New CTF 2.0 readers would be simpler, since they would not
    need to read CTF 1.8 traces.

Since the user-visible impacts of bumping from 1.8 to 1.9 or from 1.8 to 2.0
appear to be the same for existing implementations of the spec, I am really
tempted to bump to 2.0 at this stage. Already having the version number in
the header makes the transition so much easier to manage.

Thoughts?

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 20:44         ` Mathieu Desnoyers
@ 2015-03-25 21:00           ` Mathieu Desnoyers
  2015-03-25 21:41             ` Philippe Proulx
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2015-03-25 21:00 UTC
  To: Simon Marchi, Philippe Proulx; +Cc: diamon-discuss

Philippe might want to chime in.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 21:00           ` Mathieu Desnoyers
@ 2015-03-25 21:41             ` Philippe Proulx
  2015-03-25 23:07               ` Philippe Proulx
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Proulx @ 2015-03-25 21:41 UTC
  To: Mathieu Desnoyers; +Cc: diamon-discuss

----- Original Message -----
> From: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> To: "Simon Marchi" <simon.marchi@ericsson.com>, "Philippe Proulx" <pproulx@efficios.com>
> Cc: diamon-discuss@lists.linuxfoundation.org
> Sent: Wednesday, 25 March, 2015 5:00:43 PM
> Subject: Re: [diamon-discuss] Common Trace Format 1.9 planning
> 
> Philippe might want to chime in.

Yes, I might.

Comments below.

> 
> ----- Original Message -----
> > ----- Original Message -----
> > > On 15-03-25 03:22 PM, Mathieu Desnoyers wrote:
> > > > > This is great, so v1.9 is a superset of 1.8.
> > > >
> > > > No. A superset would be a completely backward compatible 1.9.
> > > > This is not the case here. We plan to do incompatible changes
> > > > within 1.9, hence the 1.8 and 1.9 parsers needed. Since the
> > > > version is self-described, it should not be an issue to detect
> > > > the CTF version from the trace metadata.
> > > 
> > > From what I understand, it means that a reader for 1.9 won't be able to
> > > read
> > > a 1.8 trace, is that right?
> > 
> > Yes, we plan to add incompatible grammar changes. This might have to be
> > versioned as 2.0 then. However, we can have side-by-side implementations
> > of CTF 1.8 and 2.0 readers within a trace reading lib, and therefore
> > distinguish between those, and use the proper CTF reader implementation.
> > 
> > > 
> > > Do the version numbers mean something? If introducing non backward
> > > compatible
> > > changes only bumps the minor version, what could ever bump the major?
> > 
> > Good question! In this case, we might be talking about a CTF 2.0 then,
> > since we are planning non-compatible changes.
> > 
> > > 
> > > If they don't have one already, this could be an opportunity to give a
> > > meaning
> > > to the numbers. The major could be for breaking backward compatibility,
> > > while
> > > the minor could be for backward-compatible changes. It would mean that a
> > > reader
> > > for x.y should be able to read any x.z trace, where z <= y. In other
> > > words,
> > > the
> > > x.y format would be a superset of x.z (I think?). Much like semver.org,
> > > but
> > > without the PATCH level.
> > 
> > Yes, I think it's a good approach.
> > 
> > We have been hesitating between moving from CTF 1.8 to either 1.9 or 2.0.
> > Here are the upsides/downsides of each approach:
> > 
> > * 1.8 to 1.9:
> >   + Compatibility: New CTF 1.9 readers would be able to read CTF 1.8
> >   traces,
> >   - Incompatibility: CTF 1.8 readers would not be able to read CTF 1.9
> >   traces,
> >   - Complexity: The CTF 1.9 specification would need to be an exact subset

I guess you meant "superset" here.

> >     of 1.8, which means a more complex spec, grammar, and implementations,
> > 
> > * 1.8 to 2.0:
> >   + Compatibility: Since we're keeping the version headers, a trace reader
> >     can implement parsers for both CTF 1.8 and 2.0, and read both trace
> >     formats without requiring user interaction,
> >   - Incompatibility: CTF 1.8 readers would not be able to read CTF 2.0
> >   traces,
> >   + Simplicity: New CTF 2.0 readers would be simpler, since they would not
> >     need to read CTF 1.8 traces,
> > 
> > Since the user-visible impacts of bumping from 1.8 to 1.9 or from 1.8 to
> > 2.0
> > appear to be the same for existing implementations of the spec, I am really
> > tempted to bump to 2.0 at this stage. Already having the version number in
> > the header makes the transition so much easier to manage.
> > 
> > Thoughts ?

File formats have various ways of identifying their versions.

The "serial" approach is often found (single version). One example is
Microsoft's Bitmap format:
<http://upload.wikimedia.org/wikipedia/commons/c/c4/BMPfileFormat.png>.
Here, the DIB Header Size field indicates the size of the DIB Header, and
depending on this size, a Bitmap file reader knows what fields are available.
This header size may be considered as a serial version number. As the Bitmap
format evolved over the years, new fields were added, and thus the DIB Header
Size field value was increased each time. Old readers may still read newer
Bitmap files here: they just skip the unknown fields thanks to the header size,
and continue from there. The program will miss some information, but should
still be able to decode the image.
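
A minimal sketch of this skip-by-size idea, in Python. The layout below is
hypothetical and much simpler than the real Bitmap DIB header (a 4-byte header
size that counts itself, then two known fields); the names `read_header` and
`KNOWN_FIELDS` are assumptions made for illustration only.

```python
import struct

# Hypothetical layout, NOT the real BMP layout: a 4-byte little-endian
# header size (which counts itself), followed by the header fields.
KNOWN_FIELDS = struct.Struct("<ii")  # width, height: all this "old" reader knows


def read_header(data: bytes):
    """Return (width, height, payload_offset) from a size-prefixed header."""
    (header_size,) = struct.unpack_from("<I", data, 0)
    if header_size < 4 + KNOWN_FIELDS.size:
        raise ValueError("header too small for the fields this reader needs")
    width, height = KNOWN_FIELDS.unpack_from(data, 4)
    # Anything between the known fields and header_size is unknown to this
    # reader; it is skipped by jumping straight to the end of the header.
    return width, height, header_size


# An "old" header: size, width, height.
old = struct.pack("<Iii", 12, 640, 480) + b"pixels"
# A "newer" header with one extra 4-byte field that the old reader ignores.
new = struct.pack("<IiiI", 16, 640, 480, 0xABCD) + b"pixels"
```

The same reader decodes both headers: it loses the extra field of the newer
one, but still finds where the payload starts.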

I think the major.minor approach should apply this same backward-compatibility
principle to the minor version: a CTF reader implemented from the CTF 1.8
specification should be able to decode CTF 1.9 traces, CTF 1.10 traces, and
so on, with no changes. This same reader is not expected to be able to
decode CTF 2.0 traces (major bump).

This means that, when a new minor version is released, features may only be
_added_, and added in a way that ensures that older readers, which support
older versions sharing the same major number, can still decode the new format
as they previously did. A well-designed API works the same way: it may add
features when bumping its minor version, but must ensure that older
applications keep working unchanged with the new version.

Here's an example: if hypothetical CTF 1.2 says:

    Integers are defined this way:

    integer {
        size = <size>;
        align = <alignment>;
        <other optional attributes here>
    }

    Other optional attributes may be ignored.

and CTF 1.3 says:

    Integers are defined this way:

    integer {
        size = <size>;
        align = <alignment>;
        base = <base>;
        <other optional attributes here>
    }

    Other optional attributes may be ignored.

Then, a CTF 1.2 reader would be able to read a CTF 1.3 trace, but it
would ignore the "base" attribute (which counts as an "other optional
attribute" as per CTF 1.2) and display all integers in base 10.

A CTF 1.3 reader would know this "base" attribute, and treat it specially,
displaying integers with the provided radix.
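
The behaviour of the two hypothetical readers can be sketched in Python as
follows. This is a toy parser for "name = value;" pairs only, not a TSDL
parser; the function names and the `CTF_1_2_KNOWN` / `CTF_1_3_KNOWN` sets are
assumptions made for illustration.

```python
def parse_attributes(body: str) -> dict:
    """Parse 'name = value;' pairs into a dict of strings (toy parser)."""
    attrs = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        attrs[name.strip()] = value.strip()
    return attrs


def read_integer_type(body: str, known: set) -> dict:
    """Keep the attributes this reader knows; silently ignore the others."""
    attrs = parse_attributes(body)
    for required in ("size", "align"):
        if required not in attrs:
            raise ValueError(f"missing required attribute: {required}")
    decoded = {name: int(value) for name, value in attrs.items() if name in known}
    decoded.setdefault("base", 10)  # integers are displayed in base 10 by default
    return decoded


# Attribute sets known to each hypothetical reader version.
CTF_1_2_KNOWN = {"size", "align"}
CTF_1_3_KNOWN = {"size", "align", "base"}

body = "size = 32; align = 8; base = 16;"
```

With the same `body`, the 1.2 reader falls back to base 10 while the 1.3
reader picks up base 16; neither one rejects the trace.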

The idea is to leave some free space in the specification, like this
"<other optional attributes here>", for future minor revisions of the
format. In binary formats, this is usually done with a fixed-offset field
providing the header size, like Bitmap's DIB Header Size field. As
long as the purpose of this fixed-offset field is known in the first minor
version of a given major version, future custom fields may be added
at will without breaking older decoders. For a text-based format like
TSDL, leaving some free space means relaxing the grammar so that
unknown blocks, fields or attributes are still parseable, but have no
attached semantics (yet).

tl;dr: Unless the current versions of Babeltrace/TraceCompass are
able to read the next version of CTF with no change, it should be
CTF 2.0.

Phil

> > 
> > Thanks!
> > 
> > Mathieu
> > 
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> > _______________________________________________
> > diamon-discuss mailing list
> > diamon-discuss@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss
> > 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>


* Re: [diamon-discuss] Common Trace Format 1.9 planning
  2015-03-25 21:41             ` Philippe Proulx
@ 2015-03-25 23:07               ` Philippe Proulx
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Proulx @ 2015-03-25 23:07 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: diamon-discuss

----- Original Message -----
> From: "Philippe Proulx" <pproulx@efficios.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Simon Marchi" <simon.marchi@ericsson.com>, diamon-discuss@lists.linuxfoundation.org
> Sent: Wednesday, 25 March, 2015 5:41:47 PM
> Subject: Re: [diamon-discuss] Common Trace Format 1.9 planning
> 
> ----- Original Message -----
> > From: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > To: "Simon Marchi" <simon.marchi@ericsson.com>, "Philippe Proulx"
> > <pproulx@efficios.com>
> > Cc: diamon-discuss@lists.linuxfoundation.org
> > Sent: Wednesday, 25 March, 2015 5:00:43 PM
> > Subject: Re: [diamon-discuss] Common Trace Format 1.9 planning
> > 
> > Philippe might want to chime in.
> 
> Yes, I might.
> 
> Comments below.
> 
> > 
> > ----- Original Message -----
> > > ----- Original Message -----
> > > > On 15-03-25 03:22 PM, Mathieu Desnoyers wrote:
> > > > > > This is great, so v1.9 is a superset of 1.8.
> > > > >
> > > > > No. A superset would be a completely backward compatible 1.9.
> > > > > This is not the case here. We plan to do incompatible changes
> > > > > within 1.9, hence the 1.8 and 1.9 parsers needed. Since the
> > > > > version is self-described, it should not be an issue to detect
> > > > > the CTF version from the trace metadata.
> > > > 
> > > > From what I understand, it means that a reader for 1.9 won't be able to
> > > > read
> > > > a 1.8 trace, is that right?
> > > 
> > > Yes, we plan to add incompatible grammar changes. This might have to be
> > > versioned as 2.0 then. However, we can have side-by-side implementations
> > > of CTF 1.8 and 2.0 readers within a trace reading lib, and therefore
> > > distinguish between those, and use the proper CTF reader implementation.
> > > 
> > > > 
> > > > Do the version numbers mean something? If introducing non backward
> > > > compatible
> > > > changes only bumps the minor version, what could ever bump the major?
> > > 
> > > Good question! In this case, we might be talking about a CTF 2.0 then,
> > > since we are planning non-compatible changes.
> > > 
> > > > 
> > > > If they don't have one already, this could be an opportunity to give a
> > > > meaning
> > > > to the numbers. The major could be for breaking backward compatibility,
> > > > while
> > > > the minor could be for backward-compatible changes. It would mean that
> > > > a
> > > > reader
> > > > for x.y should be able to read any x.z trace, where z <= y. In other
> > > > words,
> > > > the
> > > > x.y format would be a superset of x.z (I think?). Much like semver.org,
> > > > but
> > > > without the PATCH level.
> > > 
> > > Yes, I think it's a good approach.
> > > 
> > > We have been hesitating between moving from CTF 1.8 to either 1.9 or 2.0.
> > > Here are the upsides/downsides of each approach:
> > > 
> > > * 1.8 to 1.9:
> > >   + Compatibility: New CTF 1.9 readers would be able to read CTF 1.8
> > >   traces,
> > >   - Incompatibility: CTF 1.8 readers would not be able to read CTF 1.9
> > >   traces,
> > >   - Complexity: The CTF 1.9 specification would need to be an exact
> > >   subset
> 
> I guess you meant "superset" here.
> 
> > >     of 1.8, which means a more complex spec, grammar, and
> > >     implementations,
> > > 
> > > * 1.8 to 2.0:
> > >   + Compatibility: Since we're keeping the version headers, a trace reader
> > >     can implement parsers for both CTF 1.8 and 2.0, and read both trace
> > >     formats without requiring user interaction,
> > >   - Incompatibility: CTF 1.8 readers would not be able to read CTF 2.0
> > >   traces,
> > >   + Simplicity: New CTF 2.0 readers would be simpler, since they would
> > >   not
> > >     need to read CTF 1.8 traces,
> > > 
> > > Since the user-visible impacts of bumping from 1.8 to 1.9 or from 1.8 to
> > > 2.0
> > > appear to be the same for existing implementations of the spec, I am
> > > really
> > > tempted to bump to 2.0 at this stage. Already having the version number
> > > in
> > > the header makes the transition so much easier to manage.
> > > 
> > > Thoughts ?
> 
> File formats have various ways of identifying their versions.
> 
> The "serial" approach is often found (single version). One example is
> Microsoft's Bitmap format:
> <http://upload.wikimedia.org/wikipedia/commons/c/c4/BMPfileFormat.png>.
> Here, the DIB Header Size field indicates the size of the DIB Header, and
> depending on this size, a Bitmap file reader knows what fields are available.
> This header size may be considered as a serial version number. As the Bitmap
> format evolved over the years, new fields were added, and thus the DIB Header
> Size field value was increased each time. Old readers may still read newer
> Bitmap files here: they just skip the unknown fields thanks to the header
> size,
> and continue from there. The program will miss some information, but should
> still be able to decode the image.
> 
> I think the major.minor approach should follow this backward-compatibility
> approach using the minor version: a CTF reader implemented by reading the
> CTF 1.8 specification, should be able to decode CTF 1.9 traces, CTF 1.10
> traces, and so on, with no changes. It is expected that this same reader
> would not be able to decode CTF 2.0 traces (major bump).

After some discussion with Mathieu, here's what we agreed on.

Let my definition of the minor version (older reader able to read newer
minor versions) become the _patch version_. This means: older readers are able
to read newer patch versions (same major and minor versions).

Let my definition of the major version (complete break, including changing
the TSDL grammar) be split in two:

  * Minor version: changing the TSDL grammar is allowed, as long as the
    changes form a superset of the previous grammar (the previous grammar
    is completely included within the new one). This obviously means that
    a CTF 2.4 reader (which doesn't know CTF 2.5) won't be able to open a
    CTF 2.5 trace, since the grammar could have changed. However, a CTF 2.5
    reader is necessarily able to read CTF 2.4 traces since the grammar of
    CTF 2.5 is a superset of CTF 2.4's.
  * Major version: complete compatibility break, possibly including
    removing/altering parts of the grammar. A CTF 3.x reader is not necessarily
    able to read CTF 2.x traces, since the grammars of CTF 2.x and CTF 3.x
    could form disjoint sets.

This approach makes sure that the major number will not change often, since
adding features to the grammar, as long as it's a superset of the previous one,
only bumps the minor version.
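
The resulting rules can be summarized with a small Python sketch. The
`Version` type and the `is_guaranteed_readable` function are hypothetical
names used for illustration; the function only says whether the scheme above
_guarantees_ that a reader can open a trace, not whether a given
implementation happens to manage it.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def is_guaranteed_readable(reader: Version, trace: Version) -> bool:
    """Apply the major/minor/patch rules described above."""
    if reader.major != trace.major:
        # Major bump: the grammars may be disjoint, so nothing is guaranteed.
        return False
    if trace.minor > reader.minor:
        # Newer minor: the trace may use grammar the reader does not know.
        return False
    # Same or older minor: the reader's grammar is a superset of the trace's.
    # The patch number never matters here, since patch bumps do not change
    # the grammar and older readers ignore what they do not know.
    return True
```

For instance, a 2.5.0 reader is guaranteed to read a 2.4.0 trace and a 2.0.0
reader is guaranteed to read a 2.0.2 trace, while a 2.4.0 reader has no
guarantee on a 2.5.0 trace.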

Of course, our efforts for the initial 2.0.0 version will focus on making
sure the new grammar is relaxed enough to reserve room for future improvements
that do not need a change in the grammar (patch version bump). Thus a
CTF 2.0.0 reader will be able to parse CTF 2.0.1 traces, CTF 2.0.2 traces,
and so on, with no changes. For example, custom type attributes could be
prefixed (namespaced), so that new specified attributes may be added without
changing the grammar, and without interfering with custom (prefixed)
attributes, e.g. (hypothetical syntax):

    integer <
        size: 2,
        align: 4,
        x-my-custom-attribute: 0x1000,
        x-my-other-custom-attr: "something",
        attribute: 42 /* <-- May be added in CTF 2.0.1; CTF 2.0.0 readers
                             will ignore it, while CTF 2.0.1+ readers will
                             treat it specially (cannot be a custom attribute,
                             because it's not "x-attribute"). */
    >
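
A reader could sort attribute names along these lines. This Python sketch
assumes the "x-" prefix of the hypothetical syntax above; `classify`,
`SPEC_2_0_0` and `SPEC_2_0_1` are illustrative names, not part of any
specification.

```python
CUSTOM_PREFIX = "x-"  # assumed namespace prefix for custom attributes


def classify(name: str, specified: set) -> str:
    """Tell how a reader knowing the `specified` attributes treats `name`."""
    if name.startswith(CUSTOM_PREFIX):
        return "custom"      # user-defined: never clashes with the spec
    if name in specified:
        return "specified"   # known to this reader: treated specially
    return "ignored"         # unknown and unprefixed: parsed, no semantics


# Attributes specified by each hypothetical patch version.
SPEC_2_0_0 = {"size", "align"}
SPEC_2_0_1 = SPEC_2_0_0 | {"attribute"}
```

Here `classify("attribute", SPEC_2_0_0)` yields "ignored" and
`classify("attribute", SPEC_2_0_1)` yields "specified", while any "x-" name
stays "custom" for both readers.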

So, again, since we plan major alterations of CTF 1.8's grammar, i.e. CTF 1.9's
grammar won't be a superset of the current one, CTF 2.0 is considered the best
next release version number for the moment.

Phil



Thread overview: 8+ messages
     [not found] <470558667.103090.1427308587048.JavaMail.zimbra@efficios.com>
2015-03-25 18:39 ` [diamon-discuss] Common Trace Format 1.9 planning Mathieu Desnoyers
2015-03-25 19:09   ` Matthew Khouzam
2015-03-25 19:22     ` Mathieu Desnoyers
2015-03-25 19:40       ` Simon Marchi
2015-03-25 20:44         ` Mathieu Desnoyers
2015-03-25 21:00           ` Mathieu Desnoyers
2015-03-25 21:41             ` Philippe Proulx
2015-03-25 23:07               ` Philippe Proulx
