qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/3] Simpletrace v2: Support multiple args, strings.
@ 2012-05-24  9:50 Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 1/3] monitor: remove unused do_info_trace Harsh Prateek Bora
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Harsh Prateek Bora @ 2012-05-24  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Harsh Prateek Bora, stefanha, aneesh.kumar

Existing simpletrace backend allows to trace at max 6 args and does not
support strings. This newer tracelog format gets rid of fixed size records
and therefore allows to trace variable number of args including strings.

Sample trace:
v9fs_version 0.000 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
v9fs_version_return 6.705 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
v9fs_attach 174.467 tag=0x1 id=0x68 fid=0x0 afid=0xffffffffffffffff
uname=nobody aname=
v9fs_attach_return 4720.454 tag=0x1 id=0x68 type=0xffffffffffffff80
version=0x4f2a4dd0  path=0x220ea6

v2:
- moved elimination of st_print_trace to 1/3 patch
- use sizeof(TraceRecord) for #define ST_REC_HDR_LEN

v1:
- Simpletrace v2 for the new (pythonized) tracetool

Harsh Prateek Bora (3):
  monitor: remove unused do_info_trace
  Simpletrace v2: Add support for multiple args, strings.
  Update simpletrace.py to support new v2 log format

 monitor.c                           |   16 ---
 scripts/simpletrace.py              |  138 +++++++++++++------
 scripts/tracetool/backend/simple.py |   84 +++++++++---
 trace/simple.c                      |  247 ++++++++++++++++++++---------------
 trace/simple.h                      |   39 +++++-
 5 files changed, 335 insertions(+), 189 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] monitor: remove unused do_info_trace
  2012-05-24  9:50 [Qemu-devel] [PATCH v2 0/3] Simpletrace v2: Support multiple args, strings Harsh Prateek Bora
@ 2012-05-24  9:50 ` Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format Harsh Prateek Bora
  2 siblings, 0 replies; 9+ messages in thread
From: Harsh Prateek Bora @ 2012-05-24  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Harsh Prateek Bora, stefanha, aneesh.kumar

Going forward with simpletrace v2 variable size trace records, we cannot
have a generic function to print trace event info and therefore this
interface becomes invalid.

As per Stefan Hajnoczi:

"This command is only available from the human monitor.  It's not very
useful because it historically hasn't been able to pretty-print events
or show them in the right order (we use a ringbuffer but it prints
them out from index 0).

Therefore, I don't think we're under any obligation to keep this
command around.  No one has complained about it's limitations - I
think this is a sign that no one has used it.  I'd be okay with a
patch that removes it."

Ref: http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01268.html

Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
---
 monitor.c      |   16 ----------------
 trace/simple.c |   18 ------------------
 trace/simple.h |    1 -
 3 files changed, 0 insertions(+), 35 deletions(-)

diff --git a/monitor.c b/monitor.c
index 8946a10..bebf864 100644
--- a/monitor.c
+++ b/monitor.c
@@ -812,13 +812,6 @@ static void do_info_cpu_stats(Monitor *mon)
 }
 #endif
 
-#if defined(CONFIG_TRACE_SIMPLE)
-static void do_info_trace(Monitor *mon)
-{
-    st_print_trace((FILE *)mon, &monitor_fprintf);
-}
-#endif
-
 static void do_trace_print_events(Monitor *mon)
 {
     trace_print_events((FILE *)mon, &monitor_fprintf);
@@ -2585,15 +2578,6 @@ static mon_cmd_t info_cmds[] = {
         .help       = "show roms",
         .mhandler.info = do_info_roms,
     },
-#if defined(CONFIG_TRACE_SIMPLE)
-    {
-        .name       = "trace",
-        .args_type  = "",
-        .params     = "",
-        .help       = "show current contents of trace buffer",
-        .mhandler.info = do_info_trace,
-    },
-#endif
     {
         .name       = "trace-events",
         .args_type  = "",
diff --git a/trace/simple.c b/trace/simple.c
index 33ae486..a6583d3 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -288,24 +288,6 @@ void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream,
                   trace_file_name, trace_fp ? "on" : "off");
 }
 
-void st_print_trace(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...))
-{
-    unsigned int i;
-
-    for (i = 0; i < TRACE_BUF_LEN; i++) {
-        TraceRecord record;
-
-        if (!get_trace_record(i, &record)) {
-            continue;
-        }
-        stream_printf(stream, "Event %" PRIu64 " : %" PRIx64 " %" PRIx64
-                      " %" PRIx64 " %" PRIx64 " %" PRIx64 " %" PRIx64 "\n",
-                      record.event, record.x1, record.x2,
-                      record.x3, record.x4, record.x5,
-                      record.x6);
-    }
-}
-
 void st_flush_trace_buffer(void)
 {
     flush_trace_file(true);
diff --git a/trace/simple.h b/trace/simple.h
index 466e75b..6b5358c 100644
--- a/trace/simple.h
+++ b/trace/simple.h
@@ -29,7 +29,6 @@ void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3);
 void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4);
 void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5);
 void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5, uint64_t x6);
-void st_print_trace(FILE *stream, fprintf_function stream_printf);
 void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
 void st_set_trace_file_enabled(bool enable);
 bool st_set_trace_file(const char *file);
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings.
  2012-05-24  9:50 [Qemu-devel] [PATCH v2 0/3] Simpletrace v2: Support multiple args, strings Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 1/3] monitor: remove unused do_info_trace Harsh Prateek Bora
@ 2012-05-24  9:50 ` Harsh Prateek Bora
  2012-06-07 14:32   ` Stefan Hajnoczi
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format Harsh Prateek Bora
  2 siblings, 1 reply; 9+ messages in thread
From: Harsh Prateek Bora @ 2012-05-24  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Harsh Prateek Bora, stefanha, aneesh.kumar

A newer tracelog format which gets rid of fixed size trace records and
therefore allows to trace multiple arguments as well as strings in trace
events.

Sample trace:
v9fs_version 0.000 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
v9fs_version_return 6.705 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
v9fs_attach 174.467 tag=0x1 id=0x68 fid=0x0 afid=0xffffffffffffffff
uname=nobody aname=
v9fs_attach_return 4720.454 tag=0x1 id=0x68 type=0xffffffffffffff80
version=0x4f2a4dd0  path=0x220ea6

Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
---
 scripts/tracetool/backend/simple.py |   84 ++++++++++---
 trace/simple.c                      |  229 ++++++++++++++++++++++-------------
 trace/simple.h                      |   38 +++++-
 3 files changed, 240 insertions(+), 111 deletions(-)

diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
index fbb5717..ef9ed93 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -15,9 +15,16 @@ __email__      = "stefanha@linux.vnet.ibm.com"
 
 from tracetool import out
 
+def is_string(arg):
+    strtype = ('const char*', 'char*', 'const char *', 'char *')
+    if arg.lstrip().startswith(strtype):
+        return True
+    else:
+        return False
 
 def c(events):
     out('#include "trace.h"',
+        '#include "trace/simple.h"',
         '',
         'TraceEvent trace_list[] = {')
 
@@ -26,30 +33,69 @@ def c(events):
             name = e.name,
             )
 
-    out('};')
-
-def h(events):
-    out('#include "trace/simple.h"',
+    out('};',
         '')
 
-    for num, e in enumerate(events):
-        if len(e.args):
-            argstr = e.args.names()
-            arg_prefix = ', (uint64_t)(uintptr_t)'
-            cast_args = arg_prefix + arg_prefix.join(argstr)
-            simple_args = (str(num) + cast_args)
-        else:
-            simple_args = str(num)
+    for num, event in enumerate(events):
+        sizes = []
+        for type_, name in event.args:
+            if is_string(type_):
+                sizes.append("4 + safe_strlen(%s)" % name)
+            else:
+                sizes.append("8")
+        sizestr = " + ".join(sizes)
+        if len(event.args) == 0:
+            sizestr = '0'
 
-        out('static inline void trace_%(name)s(%(args)s)',
+        out('void trace_%(name)s(%(args)s)',
             '{',
-            '    trace%(argc)d(%(trace_args)s);',
-            '}',
-            name = e.name,
-            args = e.args,
-            argc = len(e.args),
-            trace_args = simple_args,
+            '    TraceBufferRecord rec;',
+            '',
+            '    if (!trace_list[%(event_id)s].state) {',
+            '        return;',
+            '    }',
+            '',
+            '    if (trace_record_start(&rec, %(event_id)s, %(size_str)s)) {',
+            '        return; /* Trace Buffer Full, Event Dropped ! */',
+            '    }',
+            name = event.name,
+            args = event.args,
+            event_id = num,
+            size_str = sizestr,
             )
 
+        if len(event.args) > 0:
+            for type_, name in event.args:
+                # string
+                if is_string(type_):
+                    out('    trace_record_write_str(&rec, %(name)s);',
+                        name = name,
+                       )
+                # pointer var (not string)
+                elif type_.endswith('*'):
+                    out('    trace_record_write_u64(&rec, (uint64_t)(uint64_t *)%(name)s);',
+                        name = name,
+                       )
+                # primitive data type
+                else:
+                    out('    trace_record_write_u64(&rec, (uint64_t)%(name)s);',
+                       name = name,
+                       )
+
+        out('    trace_record_finish(&rec);',
+            '}',
+            '')
+
+
+def h(events):
+    out('#include "trace/simple.h"',
+        '')
+
+    for event in events:
+        out('void trace_%(name)s(%(args)s);',
+            name = event.name,
+            args = event.args,
+            )
+    out('')
     out('#define NR_TRACE_EVENTS %d' % len(events))
     out('extern TraceEvent trace_list[NR_TRACE_EVENTS];')
diff --git a/trace/simple.c b/trace/simple.c
index a6583d3..4e3ae65 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -27,7 +27,7 @@
 #define HEADER_MAGIC 0xf2b177cb0aa429b4ULL
 
 /** Trace file version number, bump if format changes */
-#define HEADER_VERSION 0
+#define HEADER_VERSION 2
 
 /** Records were dropped event ID */
 #define DROPPED_EVENT_ID (~(uint64_t)0 - 1)
@@ -35,23 +35,6 @@
 /** Trace record is valid */
 #define TRACE_RECORD_VALID ((uint64_t)1 << 63)
 
-/** Trace buffer entry */
-typedef struct {
-    uint64_t event;
-    uint64_t timestamp_ns;
-    uint64_t x1;
-    uint64_t x2;
-    uint64_t x3;
-    uint64_t x4;
-    uint64_t x5;
-    uint64_t x6;
-} TraceRecord;
-
-enum {
-    TRACE_BUF_LEN = 4096,
-    TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
-};
-
 /*
  * Trace records are written out by a dedicated thread.  The thread waits for
  * records to become available, writes them out, and then waits again.
@@ -62,11 +45,49 @@ static GCond *trace_empty_cond;
 static bool trace_available;
 static bool trace_writeout_enabled;
 
-static TraceRecord trace_buf[TRACE_BUF_LEN];
+enum {
+    TRACE_BUF_LEN = 4096 * 64,
+    TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
+};
+
+uint8_t trace_buf[TRACE_BUF_LEN];
 static unsigned int trace_idx;
+static unsigned int writeout_idx;
+static uint64_t dropped_events;
 static FILE *trace_fp;
 static char *trace_file_name = NULL;
 
+/* * Trace buffer entry */
+typedef struct {
+    uint64_t event; /*   TraceEventID */
+    uint64_t timestamp_ns;
+    uint32_t length;   /*    in bytes */
+    uint32_t reserved; /*    unused */
+    uint8_t arguments[]; /*  arguments position affects ST_REC_HDR_LEN */
+} TraceRecord;
+
+typedef struct {
+    uint64_t header_event_id; /* HEADER_EVENT_ID */
+    uint64_t header_magic;    /* HEADER_MAGIC    */
+    uint64_t header_version;  /* HEADER_VERSION  */
+} TraceRecordHeader;
+
+/* * Trace record header length */
+#define ST_REC_HDR_LEN sizeof(TraceRecord)
+
+int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event, uint32_t datasize);
+static void read_from_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size);
+static void write_to_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size);
+void trace_mark_record_complete(TraceBufferRecord *rec);
+
+uint32_t safe_strlen(const char* str)
+{
+    if (str == NULL) {
+        return 0;
+    }
+    return strlen(str);
+}
+
 /**
  * Read a trace record from the trace buffer
  *
@@ -75,16 +96,22 @@ static char *trace_file_name = NULL;
  *
  * Returns false if the record is not valid.
  */
-static bool get_trace_record(unsigned int idx, TraceRecord *record)
+static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
 {
-    if (!(trace_buf[idx].event & TRACE_RECORD_VALID)) {
+    uint8_t temp_rec[ST_REC_HDR_LEN];
+    TraceRecord *record = (TraceRecord *) temp_rec;
+    read_from_buffer(idx, temp_rec, ST_REC_HDR_LEN);
+
+    if (!(record->event & TRACE_RECORD_VALID)) {
         return false;
     }
 
     __sync_synchronize(); /* read memory barrier before accessing record */
 
-    *record = trace_buf[idx];
-    record->event &= ~TRACE_RECORD_VALID;
+    *recordptr = g_malloc(record->length);
+    /* make a copy of record to avoid being overwritten */
+    read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
+    (*recordptr)->event &= ~TRACE_RECORD_VALID;
     return true;
 }
 
@@ -120,29 +147,31 @@ static void wait_for_trace_records_available(void)
 
 static gpointer writeout_thread(gpointer opaque)
 {
-    TraceRecord record;
-    unsigned int writeout_idx = 0;
-    unsigned int num_available, idx;
+    TraceRecord *recordptr;
+    unsigned int idx = 0;
     size_t unused __attribute__ ((unused));
 
     for (;;) {
         wait_for_trace_records_available();
 
-        num_available = trace_idx - writeout_idx;
-        if (num_available > TRACE_BUF_LEN) {
-            record = (TraceRecord){
-                .event = DROPPED_EVENT_ID,
-                .x1 = num_available,
-            };
-            unused = fwrite(&record, sizeof(record), 1, trace_fp);
-            writeout_idx += num_available;
+        if (dropped_events) {
+            recordptr = g_malloc(ST_REC_HDR_LEN + sizeof(dropped_events));
+            recordptr->event = DROPPED_EVENT_ID,
+            recordptr->timestamp_ns = get_clock();
+            recordptr->length = ST_REC_HDR_LEN + sizeof(dropped_events),
+            recordptr->reserved = 0;
+            *(uint64_t *) &(recordptr->arguments[0]) = dropped_events;
+            dropped_events = 0;
+            unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
         }
 
-        idx = writeout_idx % TRACE_BUF_LEN;
-        while (get_trace_record(idx, &record)) {
-            trace_buf[idx].event = 0; /* clear valid bit */
-            unused = fwrite(&record, sizeof(record), 1, trace_fp);
-            idx = ++writeout_idx % TRACE_BUF_LEN;
+        while (get_trace_record(idx, &recordptr)) {
+            unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
+            writeout_idx += recordptr->length;
+            g_free(recordptr);
+            recordptr = (TraceRecord *) &trace_buf[idx];
+            recordptr->event = 0;
+            idx = writeout_idx % TRACE_BUF_LEN;
         }
 
         fflush(trace_fp);
@@ -150,72 +179,101 @@ static gpointer writeout_thread(gpointer opaque)
     return NULL;
 }
 
-static void trace(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3,
-                  uint64_t x4, uint64_t x5, uint64_t x6)
+int trace_record_start(TraceBufferRecord *rec, TraceEventID id, size_t arglen)
 {
-    unsigned int idx;
-    uint64_t timestamp;
-
-    if (!trace_list[event].state) {
-        return;
-    }
-
-    timestamp = get_clock();
-
-    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, 1) % TRACE_BUF_LEN;
-    trace_buf[idx] = (TraceRecord){
-        .event = event,
-        .timestamp_ns = timestamp,
-        .x1 = x1,
-        .x2 = x2,
-        .x3 = x3,
-        .x4 = x4,
-        .x5 = x5,
-        .x6 = x6,
-    };
-    __sync_synchronize(); /* write barrier before marking as valid */
-    trace_buf[idx].event |= TRACE_RECORD_VALID;
-
-    if ((idx + 1) % TRACE_BUF_FLUSH_THRESHOLD == 0) {
-        flush_trace_file(false);
-    }
+    return trace_alloc_record(rec, id, arglen); /* return 0 on success */
 }
 
-void trace0(TraceEventID event)
+void trace_record_write_u64(TraceBufferRecord *rec, uint64_t val)
 {
-    trace(event, 0, 0, 0, 0, 0, 0);
+    write_to_buffer(rec->rec_off, (uint8_t *)&val, sizeof(uint64_t));
+    rec->rec_off += sizeof(uint64_t);
 }
 
-void trace1(TraceEventID event, uint64_t x1)
+void trace_record_write_str(TraceBufferRecord *rec, const char *s)
 {
-    trace(event, x1, 0, 0, 0, 0, 0);
+    /* Write string length first */
+    uint32_t slen = (s == NULL ? 0 : strlen(s));
+    write_to_buffer(rec->rec_off, (uint8_t *)&slen, sizeof(slen));
+    rec->rec_off += sizeof(slen);
+    /* Write actual string now */
+    write_to_buffer(rec->rec_off, (uint8_t *)s, slen);
+    rec->rec_off += slen;
 }
 
-void trace2(TraceEventID event, uint64_t x1, uint64_t x2)
+void trace_record_finish(TraceBufferRecord *rec)
 {
-    trace(event, x1, x2, 0, 0, 0, 0);
+    trace_mark_record_complete(rec);
 }
 
-void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3)
+int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event, uint32_t datasize)
 {
-    trace(event, x1, x2, x3, 0, 0, 0);
+    unsigned int idx, rec_off;
+    uint32_t rec_len = ST_REC_HDR_LEN + datasize;
+    uint64_t timestamp_ns = get_clock();
+
+    if ((rec_len + trace_idx - writeout_idx) > TRACE_BUF_LEN) {
+        /* Trace Buffer Full, Event dropped ! */
+        dropped_events++;
+        return 1;
+    }
+    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, rec_len) % TRACE_BUF_LEN;
+
+    /*  To check later if threshold crossed */
+    rec->next_tbuf_idx = trace_idx % TRACE_BUF_LEN;
+
+    rec_off = idx;
+    write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));
+    rec_off += sizeof(event);
+    write_to_buffer(rec_off, (uint8_t*)&timestamp_ns, sizeof(timestamp_ns));
+    rec_off += sizeof(timestamp_ns);
+    write_to_buffer(rec_off, (uint8_t*)&rec_len, sizeof(rec_len));
+    rec_off += sizeof(rec_len);
+
+    rec->tbuf_idx = idx;
+    rec->rec_off  = (idx + ST_REC_HDR_LEN) % TRACE_BUF_LEN;
+    return 0;
 }
 
-void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4)
+static void read_from_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size)
 {
-    trace(event, x1, x2, x3, x4, 0, 0);
+    uint32_t x = 0;
+    while (x < size) {
+        if (idx >= TRACE_BUF_LEN) {
+            idx = idx % TRACE_BUF_LEN;
+        }
+        dataptr[x++] = trace_buf[idx++];
+    }
 }
 
-void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5)
+static void write_to_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size)
 {
-    trace(event, x1, x2, x3, x4, x5, 0);
+    uint32_t x = 0;
+    while (x < size) {
+        if (idx >= TRACE_BUF_LEN) {
+            idx = idx % TRACE_BUF_LEN;
+        }
+        trace_buf[idx++] = dataptr[x++];
+    }
 }
 
-void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5, uint64_t x6)
+void trace_mark_record_complete(TraceBufferRecord *rec)
 {
-    trace(event, x1, x2, x3, x4, x5, x6);
+    uint8_t temp_rec[ST_REC_HDR_LEN];
+    TraceRecord *record = (TraceRecord *) temp_rec;
+    read_from_buffer(rec->tbuf_idx, temp_rec, ST_REC_HDR_LEN);
+    __sync_synchronize(); /* write barrier before marking as valid */
+    record->event |= TRACE_RECORD_VALID;
+    write_to_buffer(rec->tbuf_idx, temp_rec, ST_REC_HDR_LEN);
+
+    if (rec->next_tbuf_idx > TRACE_BUF_FLUSH_THRESHOLD &&
+        rec->tbuf_idx <= TRACE_BUF_FLUSH_THRESHOLD) {
+        flush_trace_file(false);
+    }
 }
 
+
+
 void st_set_trace_file_enabled(bool enable)
 {
     if (enable == !!trace_fp) {
@@ -228,10 +286,11 @@ void st_set_trace_file_enabled(bool enable)
     flush_trace_file(true);
 
     if (enable) {
-        static const TraceRecord header = {
-            .event = HEADER_EVENT_ID,
-            .timestamp_ns = HEADER_MAGIC,
-            .x1 = HEADER_VERSION,
+        static const TraceRecordHeader header = {
+            .header_event_id = HEADER_EVENT_ID,
+            .header_magic = HEADER_MAGIC,
+            /* Older log readers will check for version at next location */
+            .header_version = HEADER_VERSION,
         };
 
         trace_fp = fopen(trace_file_name, "wb");
diff --git a/trace/simple.h b/trace/simple.h
index 6b5358c..06ea778 100644
--- a/trace/simple.h
+++ b/trace/simple.h
@@ -22,16 +22,40 @@ typedef struct {
     bool state;
 } TraceEvent;
 
-void trace0(TraceEventID event);
-void trace1(TraceEventID event, uint64_t x1);
-void trace2(TraceEventID event, uint64_t x1, uint64_t x2);
-void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3);
-void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4);
-void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5);
-void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5, uint64_t x6);
 void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
 void st_set_trace_file_enabled(bool enable);
 bool st_set_trace_file(const char *file);
 void st_flush_trace_buffer(void);
 
+typedef struct {
+    unsigned int tbuf_idx;
+    unsigned int next_tbuf_idx;
+    unsigned int rec_off;
+} TraceBufferRecord;
+
+/* *
+ * Initialize a trace record and claim space for it in the buffer
+ *
+ * @arglen  number of bytes required for arguments
+ */
+int trace_record_start(TraceBufferRecord *rec, TraceEventID id, size_t arglen);
+
+/* *
+ * Append a 64-bit argument to a trace record
+ */
+void trace_record_write_u64(TraceBufferRecord *rec, uint64_t val);
+
+/* *
+ * Append a string argument to a trace record
+ */
+void trace_record_write_str(TraceBufferRecord *rec, const char *s);
+
+/* *
+ * Mark a trace record completed
+ *
+ * Don't append any more arguments to the trace record after calling this.
+ */
+void trace_record_finish(TraceBufferRecord *rec);
+uint32_t safe_strlen(const char* str);
+
 #endif /* TRACE_SIMPLE_H */
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format
  2012-05-24  9:50 [Qemu-devel] [PATCH v2 0/3] Simpletrace v2: Support multiple args, strings Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 1/3] monitor: remove unused do_info_trace Harsh Prateek Bora
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings Harsh Prateek Bora
@ 2012-05-24  9:50 ` Harsh Prateek Bora
  2012-06-07 14:49   ` Stefan Hajnoczi
  2 siblings, 1 reply; 9+ messages in thread
From: Harsh Prateek Bora @ 2012-05-24  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Harsh Prateek Bora, stefanha, aneesh.kumar

With this patch, simpletrace.py can parse the older as well as the newer log
format which supports variable arguments and strings.

Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
---
 scripts/simpletrace.py |  138 +++++++++++++++++++++++++++++++++---------------
 1 files changed, 95 insertions(+), 43 deletions(-)

diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py
index f55e5e6..0f6d9d6 100755
--- a/scripts/simpletrace.py
+++ b/scripts/simpletrace.py
@@ -12,53 +12,79 @@
 import struct
 import re
 import inspect
+from tracetool import _read_events
+from tracetool.backend.simple import is_string
 
 header_event_id = 0xffffffffffffffff
 header_magic    = 0xf2b177cb0aa429b4
-header_version  = 0
+log_version = 0
 dropped_event_id = 0xfffffffffffffffe
 
-trace_fmt = '=QQQQQQQQ'
-trace_len = struct.calcsize(trace_fmt)
-event_re  = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\).*')
+log_header_fmt = '=QQQ'
+rec_header_fmt = '=QQII'
+trace_v1_fmt = '=QQQQQQQQ'
+trace_v1_len = struct.calcsize(trace_v1_fmt)
 
-def parse_events(fobj):
-    """Parse a trace-events file into {event_num: (name, arg1, ...)}."""
-
-    def get_argnames(args):
-        """Extract argument names from a parameter list."""
-        return tuple(arg.split()[-1].lstrip('*') for arg in args.split(','))
-
-    events = {dropped_event_id: ('dropped', 'count')}
-    event_num = 0
-    for line in fobj:
-        m = event_re.match(line.strip())
-        if m is None:
-            continue
-
-        disable, name, args = m.groups()
-        events[event_num] = (name,) + get_argnames(args)
-        event_num += 1
-    return events
-
-def read_record(fobj):
-    """Deserialize a trace record from a file into a tuple (event_num, timestamp, arg1, ..., arg6)."""
-    s = fobj.read(trace_len)
-    if len(s) != trace_len:
+def read_header(fobj, hfmt):
+    '''Read a trace record header'''
+    hlen = struct.calcsize(hfmt)
+    hdr = fobj.read(hlen)
+    if len(hdr) != hlen:
         return None
-    return struct.unpack(trace_fmt, s)
+    return struct.unpack(hfmt, hdr)
 
-def read_trace_file(fobj):
+def get_record(edict, rechdr, fobj):
+    """Deserialize a simpletrace v2 record from a file into a tuple (event_num, timestamp, arg1, ..., arg6)."""
+    if rechdr is None:
+        return None
+    rec = (rechdr[0], rechdr[1])
+    if rechdr[0] != dropped_event_id:
+        event_id = rechdr[0]
+        event = edict[event_id]
+        for type, name in event.args:
+            if is_string(type):
+                l = fobj.read(4)
+                (len,) = struct.unpack('=L', l)
+                s = fobj.read(len)
+                rec = rec + (s,)
+            else:
+                (value,) = struct.unpack('=Q', fobj.read(8))
+                rec = rec + (value,)
+    else:
+        (value,) = struct.unpack('=Q', fobj.read(8))
+        rec = rec + (value,)
+    return rec
+
+
+def read_record(edict, fobj):
+    """Deserialize a trace record from a file into a tuple (event_num, timestamp, arg1, ..., arg6)."""
+    if log_version == 0:
+        s = fobj.read(trace_v1_len)
+        if len(s) != trace_v1_len:
+            return None
+        return struct.unpack(trace_v1_fmt, s)
+    if log_version == 2:
+        rechdr = read_header(fobj, rec_header_fmt)
+        return get_record(edict, rechdr, fobj) # return tuple of record elements
+
+def read_trace_file(edict, fobj):
     """Deserialize trace records from a file, yielding record tuples (event_num, timestamp, arg1, ..., arg6)."""
-    header = read_record(fobj)
+    global log_version
+    header = read_header(fobj, log_header_fmt)
     if header is None or \
        header[0] != header_event_id or \
-       header[1] != header_magic or \
-       header[2] != header_version:
-        raise ValueError('not a trace file or incompatible version')
+       header[1] != header_magic:
+        raise ValueError('not a valid trace file')
+    if header[2] != 0 and \
+       header[2] != 2:
+        raise ValueError('trace file version not supported')
+
+    log_version = header[2]
+    if log_version == 0:
+        temp = fobj.read(40) # read unused header bytes
 
     while True:
-        rec = read_record(fobj)
+        rec = read_record(edict, fobj)
         if rec is None:
             break
 
@@ -89,16 +115,28 @@ class Analyzer(object):
 def process(events, log, analyzer):
     """Invoke an analyzer on each event in a log."""
     if isinstance(events, str):
-        events = parse_events(open(events, 'r'))
+        events = _read_events(open(events, 'r'))
     if isinstance(log, str):
         log = open(log, 'rb')
 
+    enabled_events = []
+    edict = {dropped_event_id: 'dropped_count'}
+
+    for e in events:
+        if 'disable' not in e.properties:
+            enabled_events.append(e)
+    for num, event in enumerate(enabled_events):
+        edict[num] = event
+
     def build_fn(analyzer, event):
-        fn = getattr(analyzer, event[0], None)
+        if isinstance(event, str):
+            return analyzer.catchall
+
+        fn = getattr(analyzer, event.name, None)
         if fn is None:
             return analyzer.catchall
 
-        event_argcount = len(event) - 1
+        event_argcount = len(event.args)
         fn_argcount = len(inspect.getargspec(fn)[0]) - 1
         if fn_argcount == event_argcount + 1:
             # Include timestamp as first argument
@@ -109,9 +147,9 @@ def process(events, log, analyzer):
 
     analyzer.begin()
     fn_cache = {}
-    for rec in read_trace_file(log):
+    for rec in read_trace_file(edict, log):
         event_num = rec[0]
-        event = events[event_num]
+        event = edict[event_num]
         if event_num not in fn_cache:
             fn_cache[event_num] = build_fn(analyzer, event)
         fn_cache[event_num](event, rec)
@@ -128,7 +166,7 @@ def run(analyzer):
         sys.stderr.write('usage: %s <trace-events> <trace-file>\n' % sys.argv[0])
         sys.exit(1)
 
-    events = parse_events(open(sys.argv[1], 'r'))
+    events = _read_events(open(sys.argv[1], 'r'))
     process(events, sys.argv[2], analyzer)
 
 if __name__ == '__main__':
@@ -137,15 +175,29 @@ if __name__ == '__main__':
             self.last_timestamp = None
 
         def catchall(self, event, rec):
+            i = 1
             timestamp = rec[1]
             if self.last_timestamp is None:
                 self.last_timestamp = timestamp
             delta_ns = timestamp - self.last_timestamp
             self.last_timestamp = timestamp
 
-            fields = [event[0], '%0.3f' % (delta_ns / 1000.0)]
-            for i in xrange(1, len(event)):
-                fields.append('%s=0x%x' % (event[i], rec[i + 1]))
+            if rec[0] == dropped_event_id:
+                fields = ['Dropped_Event', '%0.3f' % (delta_ns / 1000.0)]
+                fields.append('%s=0x%x' % ("dropped_events", rec[2]))
+            else:
+                fields = [event.name, '%0.3f' % (delta_ns / 1000.0)]
+                if log_version == 0:
+                    for type, name in event.args:
+                        fields.append('%s=0x%x' % (name, rec[i + 1]))
+                        i += 1
+                elif log_version == 2:
+                    for type, name in event.args:
+                        if is_string(type):
+                             fields.append('%s=%s' % (name, rec[i + 1]))
+                        else:
+                             fields.append('%s=0x%x' % (name, rec[i + 1]))
+                        i += 1
             print ' '.join(fields)
 
     run(Formatter())
-- 
1.7.1.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings.
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings Harsh Prateek Bora
@ 2012-06-07 14:32   ` Stefan Hajnoczi
  2012-06-11 12:31     ` Harsh Bora
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2012-06-07 14:32 UTC (permalink / raw)
  To: Harsh Prateek Bora; +Cc: aneesh.kumar, qemu-devel, stefanha

On Thu, May 24, 2012 at 10:50 AM, Harsh Prateek Bora
<harsh@linux.vnet.ibm.com> wrote:
> A newer tracelog format which gets rid of fixed size trace records and
> therefore allows to trace multiple arguments as well as strings in trace
> events.
>
> Sample trace:
> v9fs_version 0.000 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
> v9fs_version_return 6.705 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
> v9fs_attach 174.467 tag=0x1 id=0x68 fid=0x0 afid=0xffffffffffffffff
> uname=nobody aname=
> v9fs_attach_return 4720.454 tag=0x1 id=0x68 type=0xffffffffffffff80
> version=0x4f2a4dd0  path=0x220ea6
>
> Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
> ---
>  scripts/tracetool/backend/simple.py |   84 ++++++++++---
>  trace/simple.c                      |  229 ++++++++++++++++++++++-------------
>  trace/simple.h                      |   38 +++++-
>  3 files changed, 240 insertions(+), 111 deletions(-)

Sorry for the delay.  Mostly easy to fix comments, some care still
needed to avoid race conditions and use memory barriers in the right
places.

> diff --git a/trace/simple.c b/trace/simple.c
> index a6583d3..4e3ae65 100644
> --- a/trace/simple.c
> +++ b/trace/simple.c
> @@ -27,7 +27,7 @@
>  #define HEADER_MAGIC 0xf2b177cb0aa429b4ULL
>
>  /** Trace file version number, bump if format changes */
> -#define HEADER_VERSION 0
> +#define HEADER_VERSION 2
>
>  /** Records were dropped event ID */
>  #define DROPPED_EVENT_ID (~(uint64_t)0 - 1)
> @@ -35,23 +35,6 @@
>  /** Trace record is valid */
>  #define TRACE_RECORD_VALID ((uint64_t)1 << 63)
>
> -/** Trace buffer entry */
> -typedef struct {
> -    uint64_t event;
> -    uint64_t timestamp_ns;
> -    uint64_t x1;
> -    uint64_t x2;
> -    uint64_t x3;
> -    uint64_t x4;
> -    uint64_t x5;
> -    uint64_t x6;
> -} TraceRecord;
> -
> -enum {
> -    TRACE_BUF_LEN = 4096,
> -    TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
> -};
> -
>  /*
>  * Trace records are written out by a dedicated thread.  The thread waits for
>  * records to become available, writes them out, and then waits again.
> @@ -62,11 +45,49 @@ static GCond *trace_empty_cond;
>  static bool trace_available;
>  static bool trace_writeout_enabled;
>
> -static TraceRecord trace_buf[TRACE_BUF_LEN];
> +enum {
> +    TRACE_BUF_LEN = 4096 * 64,
> +    TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
> +};
> +
> +uint8_t trace_buf[TRACE_BUF_LEN];
>  static unsigned int trace_idx;
> +static unsigned int writeout_idx;
> +static uint64_t dropped_events;
>  static FILE *trace_fp;
>  static char *trace_file_name = NULL;
>
> +/* * Trace buffer entry */
> +typedef struct {
> +    uint64_t event; /*   TraceEventID */
> +    uint64_t timestamp_ns;
> +    uint32_t length;   /*    in bytes */
> +    uint32_t reserved; /*    unused */
> +    uint8_t arguments[]; /*  arguments position affects ST_REC_HDR_LEN */
> +} TraceRecord;
> +
> +typedef struct {
> +    uint64_t header_event_id; /* HEADER_EVENT_ID */
> +    uint64_t header_magic;    /* HEADER_MAGIC    */
> +    uint64_t header_version;  /* HEADER_VERSION  */
> +} TraceRecordHeader;
> +
> +/* * Trace record header length */
> +#define ST_REC_HDR_LEN sizeof(TraceRecord)

Why #define this?  It's confusing because TraceRecordHeader (notice
the word "header") is what comes to mind since this is named
ST_REC_HDR_LEN.  I suggest using sizeof(TraceRecord) explicitly in the
code and dropping this macro.

> +
> +int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event, uint32_t datasize);
> +static void read_from_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size);
> +static void write_to_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size);
> +void trace_mark_record_complete(TraceBufferRecord *rec);
> +
> +uint32_t safe_strlen(const char* str)
> +{
> +    if (str == NULL) {
> +        return 0;
> +    }
> +    return strlen(str);
> +}
> +
>  /**
>  * Read a trace record from the trace buffer
>  *
> @@ -75,16 +96,22 @@ static char *trace_file_name = NULL;
>  *
>  * Returns false if the record is not valid.
>  */
> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>  {
> -    if (!(trace_buf[idx].event & TRACE_RECORD_VALID)) {
> +    uint8_t temp_rec[ST_REC_HDR_LEN];
> +    TraceRecord *record = (TraceRecord *) temp_rec;
> +    read_from_buffer(idx, temp_rec, ST_REC_HDR_LEN);
> +
> +    if (!(record->event & TRACE_RECORD_VALID)) {
>         return false;
>     }
>
>     __sync_synchronize(); /* read memory barrier before accessing record */

The need for the memory barrier is no longer clear.  Previously we
were directly accessing the trace ring buffer, and therefore needed to
ensure fields were settled before accessing them.  Now we use
read_from_buffer() which copies the data into our temporary struct on
the stack.

I think the best way of doing it is to read the event field first in a
separate step, then do the read memory barrier, and then read the rest
of the record.  This ensures that the event field is at least as "old"
as the other fields we read.

>
> -    *record = trace_buf[idx];
> -    record->event &= ~TRACE_RECORD_VALID;
> +    *recordptr = g_malloc(record->length);
> +    /* make a copy of record to avoid being overwritten */
> +    read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
> +    (*recordptr)->event &= ~TRACE_RECORD_VALID;
>     return true;
>  }
>
> @@ -120,29 +147,31 @@ static void wait_for_trace_records_available(void)
>
>  static gpointer writeout_thread(gpointer opaque)
>  {
> -    TraceRecord record;
> -    unsigned int writeout_idx = 0;
> -    unsigned int num_available, idx;
> +    TraceRecord *recordptr;
> +    unsigned int idx = 0;
>     size_t unused __attribute__ ((unused));
>
>     for (;;) {
>         wait_for_trace_records_available();
>
> -        num_available = trace_idx - writeout_idx;
> -        if (num_available > TRACE_BUF_LEN) {
> -            record = (TraceRecord){
> -                .event = DROPPED_EVENT_ID,
> -                .x1 = num_available,
> -            };
> -            unused = fwrite(&record, sizeof(record), 1, trace_fp);
> -            writeout_idx += num_available;
> +        if (dropped_events) {
> +            recordptr = g_malloc(ST_REC_HDR_LEN + sizeof(dropped_events));
> +            recordptr->event = DROPPED_EVENT_ID,
> +            recordptr->timestamp_ns = get_clock();
> +            recordptr->length = ST_REC_HDR_LEN + sizeof(dropped_events),
> +            recordptr->reserved = 0;
> +            *(uint64_t *) &(recordptr->arguments[0]) = dropped_events;

memcpy(recordptr->arguments, &dropped_events, sizeof(dropped_events));

About the same ugliness but avoids the unaligned store.

> +            dropped_events = 0;
> +            unused = fwrite(recordptr, recordptr->length, 1, trace_fp);

recordptr is leaked.  It's easiest to allocate the dropped event
record on the stack instead.

>         }
>
> -        idx = writeout_idx % TRACE_BUF_LEN;
> -        while (get_trace_record(idx, &record)) {
> -            trace_buf[idx].event = 0; /* clear valid bit */
> -            unused = fwrite(&record, sizeof(record), 1, trace_fp);
> -            idx = ++writeout_idx % TRACE_BUF_LEN;
> +        while (get_trace_record(idx, &recordptr)) {
> +            unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
> +            writeout_idx += recordptr->length;
> +            g_free(recordptr);
> +            recordptr = (TraceRecord *) &trace_buf[idx];
> +            recordptr->event = 0;

It would be cleaner for get_trace_record() to clear the
TRACE_RECORD_VALID bit.  This way the caller doesn't have to access
the raw trace buffer - and this is critical because recordptr->event
will write outside the bounds of the buffer when idx >= TRACE_BUF_LEN
- 4!

> +            idx = writeout_idx % TRACE_BUF_LEN;
>         }
>
>         fflush(trace_fp);
> @@ -150,72 +179,101 @@ static gpointer writeout_thread(gpointer opaque)
>     return NULL;
>  }
>
> -static void trace(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3,
> -                  uint64_t x4, uint64_t x5, uint64_t x6)
> +int trace_record_start(TraceBufferRecord *rec, TraceEventID id, size_t arglen)
>  {
> -    unsigned int idx;
> -    uint64_t timestamp;
> -
> -    if (!trace_list[event].state) {
> -        return;
> -    }
> -
> -    timestamp = get_clock();
> -
> -    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, 1) % TRACE_BUF_LEN;
> -    trace_buf[idx] = (TraceRecord){
> -        .event = event,
> -        .timestamp_ns = timestamp,
> -        .x1 = x1,
> -        .x2 = x2,
> -        .x3 = x3,
> -        .x4 = x4,
> -        .x5 = x5,
> -        .x6 = x6,
> -    };
> -    __sync_synchronize(); /* write barrier before marking as valid */
> -    trace_buf[idx].event |= TRACE_RECORD_VALID;
> -
> -    if ((idx + 1) % TRACE_BUF_FLUSH_THRESHOLD == 0) {
> -        flush_trace_file(false);
> -    }
> +    return trace_alloc_record(rec, id, arglen); /* return 0 on success */
>  }
>
> -void trace0(TraceEventID event)
> +void trace_record_write_u64(TraceBufferRecord *rec, uint64_t val)
>  {
> -    trace(event, 0, 0, 0, 0, 0, 0);
> +    write_to_buffer(rec->rec_off, (uint8_t *)&val, sizeof(uint64_t));

It's fine to make the second argument void* if you get tired of
casting to uint8_t.

> +    rec->rec_off += sizeof(uint64_t);
>  }
>
> -void trace1(TraceEventID event, uint64_t x1)
> +void trace_record_write_str(TraceBufferRecord *rec, const char *s)
>  {
> -    trace(event, x1, 0, 0, 0, 0, 0);
> +    /* Write string length first */
> +    uint32_t slen = (s == NULL ? 0 : strlen(s));
> +    write_to_buffer(rec->rec_off, (uint8_t *)&slen, sizeof(slen));
> +    rec->rec_off += sizeof(slen);
> +    /* Write actual string now */
> +    write_to_buffer(rec->rec_off, (uint8_t *)s, slen);
> +    rec->rec_off += slen;
>  }
>
> -void trace2(TraceEventID event, uint64_t x1, uint64_t x2)
> +void trace_record_finish(TraceBufferRecord *rec)
>  {
> -    trace(event, x1, x2, 0, 0, 0, 0);
> +    trace_mark_record_complete(rec);
>  }
>
> -void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3)
> +int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event, uint32_t datasize)
>  {
> -    trace(event, x1, x2, x3, 0, 0, 0);
> +    unsigned int idx, rec_off;
> +    uint32_t rec_len = ST_REC_HDR_LEN + datasize;
> +    uint64_t timestamp_ns = get_clock();
> +
> +    if ((rec_len + trace_idx - writeout_idx) > TRACE_BUF_LEN) {
> +        /* Trace Buffer Full, Event dropped ! */
> +        dropped_events++;
> +        return 1;

-ENOSPC?  A negative errno is clearer than 0 - success, 1 - failure.

> +    }
> +    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, rec_len) % TRACE_BUF_LEN;

How does this deal with the race condition of two or more threads
running trace_alloc_record() concurrently when the buffer reaches max
size?  I think two or more threads could think they have enough space
because trace_idx hasn't been updated yet between the buffer length
check and the atomic update.

> +
> +    /*  To check later if threshold crossed */
> +    rec->next_tbuf_idx = trace_idx % TRACE_BUF_LEN;
> +
> +    rec_off = idx;
> +    write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));
> +    rec_off += sizeof(event);
> +    write_to_buffer(rec_off, (uint8_t*)&timestamp_ns, sizeof(timestamp_ns));
> +    rec_off += sizeof(timestamp_ns);
> +    write_to_buffer(rec_off, (uint8_t*)&rec_len, sizeof(rec_len));
> +    rec_off += sizeof(rec_len);

This suggests a pattern where write_to_buffer() returns the new index:

rec_off = write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));

That way you don't need to do the separate sizeof() additions.

> +
> +    rec->tbuf_idx = idx;
> +    rec->rec_off  = (idx + ST_REC_HDR_LEN) % TRACE_BUF_LEN;
> +    return 0;
>  }
>
> -void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4)
> +static void read_from_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size)

size_t is the type to use for memory size units.

>  {
> -    trace(event, x1, x2, x3, x4, 0, 0);
> +    uint32_t x = 0;
> +    while (x < size) {
> +        if (idx >= TRACE_BUF_LEN) {
> +            idx = idx % TRACE_BUF_LEN;
> +        }
> +        dataptr[x++] = trace_buf[idx++];
> +    }
>  }
>
> -void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5)
> +static void write_to_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size)
>  {
> -    trace(event, x1, x2, x3, x4, x5, 0);
> +    uint32_t x = 0;
> +    while (x < size) {
> +        if (idx >= TRACE_BUF_LEN) {
> +            idx = idx % TRACE_BUF_LEN;
> +        }
> +        trace_buf[idx++] = dataptr[x++];
> +    }
>  }
>
> -void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5, uint64_t x6)
> +void trace_mark_record_complete(TraceBufferRecord *rec)
>  {
> -    trace(event, x1, x2, x3, x4, x5, x6);
> +    uint8_t temp_rec[ST_REC_HDR_LEN];
> +    TraceRecord *record = (TraceRecord *) temp_rec;
> +    read_from_buffer(rec->tbuf_idx, temp_rec, ST_REC_HDR_LEN);
> +    __sync_synchronize(); /* write barrier before marking as valid */
> +    record->event |= TRACE_RECORD_VALID;
> +    write_to_buffer(rec->tbuf_idx, temp_rec, ST_REC_HDR_LEN);
> +
> +    if (rec->next_tbuf_idx > TRACE_BUF_FLUSH_THRESHOLD &&
> +        rec->tbuf_idx <= TRACE_BUF_FLUSH_THRESHOLD) {
> +        flush_trace_file(false);
> +    }
>  }
>
> +
> +
>  void st_set_trace_file_enabled(bool enable)
>  {
>     if (enable == !!trace_fp) {
> @@ -228,10 +286,11 @@ void st_set_trace_file_enabled(bool enable)
>     flush_trace_file(true);
>
>     if (enable) {
> -        static const TraceRecord header = {
> -            .event = HEADER_EVENT_ID,
> -            .timestamp_ns = HEADER_MAGIC,
> -            .x1 = HEADER_VERSION,
> +        static const TraceRecordHeader header = {
> +            .header_event_id = HEADER_EVENT_ID,
> +            .header_magic = HEADER_MAGIC,
> +            /* Older log readers will check for version at next location */
> +            .header_version = HEADER_VERSION,
>         };
>
>         trace_fp = fopen(trace_file_name, "wb");
> diff --git a/trace/simple.h b/trace/simple.h
> index 6b5358c..06ea778 100644
> --- a/trace/simple.h
> +++ b/trace/simple.h
> @@ -22,16 +22,40 @@ typedef struct {
>     bool state;
>  } TraceEvent;
>
> -void trace0(TraceEventID event);
> -void trace1(TraceEventID event, uint64_t x1);
> -void trace2(TraceEventID event, uint64_t x1, uint64_t x2);
> -void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3);
> -void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4);
> -void trace5(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5);
> -void trace6(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4, uint64_t x5, uint64_t x6);
>  void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
>  void st_set_trace_file_enabled(bool enable);
>  bool st_set_trace_file(const char *file);
>  void st_flush_trace_buffer(void);
>
> +typedef struct {
> +    unsigned int tbuf_idx;
> +    unsigned int next_tbuf_idx;
> +    unsigned int rec_off;
> +} TraceBufferRecord;
> +
> +/* *

Any reason why you use "/* *", usually it's "/**".

> + * Initialize a trace record and claim space for it in the buffer
> + *
> + * @arglen  number of bytes required for arguments
> + */
> +int trace_record_start(TraceBufferRecord *rec, TraceEventID id, size_t arglen);
> +
> +/* *
> + * Append a 64-bit argument to a trace record
> + */
> +void trace_record_write_u64(TraceBufferRecord *rec, uint64_t val);
> +
> +/* *
> + * Append a string argument to a trace record
> + */
> +void trace_record_write_str(TraceBufferRecord *rec, const char *s);
> +
> +/* *
> + * Mark a trace record completed
> + *
> + * Don't append any more arguments to the trace record after calling this.
> + */
> +void trace_record_finish(TraceBufferRecord *rec);
> +uint32_t safe_strlen(const char* str);

Given that safe_strlen() is only used once and a generic utility (not
specific to the simple trace backend), I suggest dropping it an just
open coding the tristate operator: s ? strlen(s) : 0.

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format
  2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format Harsh Prateek Bora
@ 2012-06-07 14:49   ` Stefan Hajnoczi
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2012-06-07 14:49 UTC (permalink / raw)
  To: Harsh Prateek Bora; +Cc: aneesh.kumar, qemu-devel, stefanha

On Thu, May 24, 2012 at 10:50 AM, Harsh Prateek Bora
<harsh@linux.vnet.ibm.com> wrote:
> -            fields = [event[0], '%0.3f' % (delta_ns / 1000.0)]
> -            for i in xrange(1, len(event)):
> -                fields.append('%s=0x%x' % (event[i], rec[i + 1]))
> +            if rec[0] == dropped_event_id:
> +                fields = ['Dropped_Event', '%0.3f' % (delta_ns / 1000.0)]
> +                fields.append('%s=0x%x' % ("dropped_events", rec[2]))
> +            else:

Why the special case?  Can the dropped event be an event so the normal
code path pretty-prints it?

> +                fields = [event.name, '%0.3f' % (delta_ns / 1000.0)]
> +                if log_version == 0:

The global log_version variable splits the code.  A nice solution is
to encapsulate the format-specific behavior in a SimpleTraceV1 and
SimpleTraceV2 class - both provide the same interface and the code
that calls it doesn't care which exact version is used.

That said I'm happy for us to drop simpletrace v1 pretty-printing
entirely.  Users can only generate v2 files with this QEMU binary.  If
they want to view old files they also need the old trace-events file -
they might as well use the old simpletrace.py too!  Feel free to focus
on v2 only and drop the v1 support, you can add a nice error message
explaining that this file uses the v1 format and should be used with
the script that came with that QEMU binary.

> +                    for type, name in event.args:
> +                        fields.append('%s=0x%x' % (name, rec[i + 1]))
> +                        i += 1
> +                elif log_version == 2:
> +                    for type, name in event.args:
> +                        if is_string(type):
> +                             fields.append('%s=%s' % (name, rec[i + 1]))
> +                        else:
> +                             fields.append('%s=0x%x' % (name, rec[i + 1]))
> +                        i += 1
>             print ' '.join(fields)
>
>     run(Formatter())
> --
> 1.7.1.1
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings.
  2012-06-07 14:32   ` Stefan Hajnoczi
@ 2012-06-11 12:31     ` Harsh Bora
  2012-06-11 14:55       ` Stefan Hajnoczi
  0 siblings, 1 reply; 9+ messages in thread
From: Harsh Bora @ 2012-06-11 12:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: aneesh.kumar, qemu-devel, stefanha

On 06/07/2012 08:02 PM, Stefan Hajnoczi wrote:
> On Thu, May 24, 2012 at 10:50 AM, Harsh Prateek Bora
> <harsh@linux.vnet.ibm.com>  wrote:
>> A newer tracelog format which gets rid of fixed size trace records and
>> therefore allows to trace multiple arguments as well as strings in trace
>> events.
>>
>> Sample trace:
>> v9fs_version 0.000 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
>> v9fs_version_return 6.705 tag=0xffff id=0x64 msize=0x2000 version=9P2000.L
>> v9fs_attach 174.467 tag=0x1 id=0x68 fid=0x0 afid=0xffffffffffffffff
>> uname=nobody aname=
>> v9fs_attach_return 4720.454 tag=0x1 id=0x68 type=0xffffffffffffff80
>> version=0x4f2a4dd0  path=0x220ea6
>>
>> Signed-off-by: Harsh Prateek Bora<harsh@linux.vnet.ibm.com>
>> ---
>>   scripts/tracetool/backend/simple.py |   84 ++++++++++---
>>   trace/simple.c                      |  229 ++++++++++++++++++++++-------------
>>   trace/simple.h                      |   38 +++++-
>>   3 files changed, 240 insertions(+), 111 deletions(-)
>
> Sorry for the delay.  Mostly easy to fix comments, some care still
> needed to avoid race conditions and use memory barriers in the right
> places.
>

Hi Stefan,
Thanks for reviewing, I have some questions below:

>> @@ -75,16 +96,22 @@ static char *trace_file_name = NULL;
>>   *
>>   * Returns false if the record is not valid.
>>   */
>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>   {
>> -    if (!(trace_buf[idx].event&  TRACE_RECORD_VALID)) {
>> +    uint8_t temp_rec[ST_REC_HDR_LEN];
>> +    TraceRecord *record = (TraceRecord *) temp_rec;
>> +    read_from_buffer(idx, temp_rec, ST_REC_HDR_LEN);
>> +
>> +    if (!(record->event&  TRACE_RECORD_VALID)) {
>>          return false;
>>      }
>>
>>      __sync_synchronize(); /* read memory barrier before accessing record */
>
> The need for the memory barrier is no longer clear.  Previously we
> were directly accessing the trace ring buffer, and therefore needed to
> ensure fields were settled before accessing them.  Now we use
> read_from_buffer() which copies the data into our temporary struct on
> the stack.
>
> I think the best way of doing it is to read the event field first in a
> separate step, then do the read memory barrier, and then read the rest
> of the record.  This ensures that the event field is at least as "old"
> as the other fields we read.

Currently, it does a two step read:

read header (which includes event field and length of record as well)
sync
read rest of record (using length from header)

Are you suggesting this:

read event field only
sync (if event valid)
read header (for length)
sync
read rest of record (using length)

or

read event field only
sync (if event valid)
read header (for length)
read rest of record

As we cant read the whole record in one step without knowing its length.



>
>>
>> -    *record = trace_buf[idx];
>> -    record->event&= ~TRACE_RECORD_VALID;
>> +    *recordptr = g_malloc(record->length);
>> +    /* make a copy of record to avoid being overwritten */
>> +    read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
>> +    (*recordptr)->event&= ~TRACE_RECORD_VALID;
>>      return true;
>>   }
>>

[....]

>>
>> -void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3)
>> +int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event, uint32_t datasize)
>>   {
>> -    trace(event, x1, x2, x3, 0, 0, 0);
>> +    unsigned int idx, rec_off;
>> +    uint32_t rec_len = ST_REC_HDR_LEN + datasize;
>> +    uint64_t timestamp_ns = get_clock();
>> +
>> +    if ((rec_len + trace_idx - writeout_idx)>  TRACE_BUF_LEN) {
>> +        /* Trace Buffer Full, Event dropped ! */
>> +        dropped_events++;
>> +        return 1;
>
> -ENOSPC?  A negative errno is clearer than 0 - success, 1 - failure.
>
>> +    }
>> +    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, rec_len) % TRACE_BUF_LEN;
>
> How does this deal with the race condition of two or more threads
> running trace_alloc_record() concurrently when the buffer reaches max
> size?  I think two or more threads could think they have enough space
> because trace_idx hasn't been updated yet between the buffer length
> check and the atomic update.

Are you suggesting to use locking here ? I couldnt find a test-and-set 
alternative in glib2. Does qemu have access to any such interface ?

>
>> +
>> +    /*  To check later if threshold crossed */
>> +    rec->next_tbuf_idx = trace_idx % TRACE_BUF_LEN;
>> +
>> +    rec_off = idx;
>> +    write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));
>> +    rec_off += sizeof(event);
>> +    write_to_buffer(rec_off, (uint8_t*)&timestamp_ns, sizeof(timestamp_ns));
>> +    rec_off += sizeof(timestamp_ns);
>> +    write_to_buffer(rec_off, (uint8_t*)&rec_len, sizeof(rec_len));
>> +    rec_off += sizeof(rec_len);
>
> This suggests a pattern where write_to_buffer() returns the new index:
>
> rec_off = write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event));
>
> That way you don't need to do the separate sizeof() additions.
>
>> +
>> +    rec->tbuf_idx = idx;
>> +    rec->rec_off  = (idx + ST_REC_HDR_LEN) % TRACE_BUF_LEN;
>> +    return 0;
>>   }
>>
>> -void trace4(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3, uint64_t x4)
>> +static void read_from_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size)
>
> size_t is the type to use for memory size units.
>

[...]

>> +void trace_record_finish(TraceBufferRecord *rec);
>> +uint32_t safe_strlen(const char* str);
>
> Given that safe_strlen() is only used once and a generic utility (not
> specific to the simple trace backend), I suggest dropping it an just
> open coding the tristate operator: s ? strlen(s) : 0.
>

safe_strlen is used multiple times in auto-generated code by tracetool 
in expression for calculating the sum of size of trace arguments which 
as of now looks like:

8 + 8 + 4 + safe_strlen(str1) + 8 + 4 + safe_strlen(str2) ...

for tracing events with string arguments. For trace events with multiple 
strings, this expression is more readable as compared to having an 
expression with tristate operator like this:

8 + 8 + 4 + (str1 ? strlen(str1) : 0) + 8  + 4 + (str2 ? strlen(str2) : 
0) ...


I agree that its a generic function and need not be placed in tracing 
code, let me know the right place where to put it if you think its worth 
keeping it.

regards,
Harsh


> Stefan
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings.
  2012-06-11 12:31     ` Harsh Bora
@ 2012-06-11 14:55       ` Stefan Hajnoczi
  2012-06-12 11:31         ` Harsh Bora
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2012-06-11 14:55 UTC (permalink / raw)
  To: Harsh Bora; +Cc: aneesh.kumar, qemu-devel, stefanha

On Mon, Jun 11, 2012 at 1:31 PM, Harsh Bora <harsh@linux.vnet.ibm.com> wrote:
> On 06/07/2012 08:02 PM, Stefan Hajnoczi wrote:
>>
>> On Thu, May 24, 2012 at 10:50 AM, Harsh Prateek Bora
>> <harsh@linux.vnet.ibm.com>  wrote:
>>> @@ -75,16 +96,22 @@ static char *trace_file_name = NULL;
>>>  *
>>>  * Returns false if the record is not valid.
>>>  */
>>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>>  {
>>> -    if (!(trace_buf[idx].event&  TRACE_RECORD_VALID)) {
>>>
>>> +    uint8_t temp_rec[ST_REC_HDR_LEN];
>>> +    TraceRecord *record = (TraceRecord *) temp_rec;
>>> +    read_from_buffer(idx, temp_rec, ST_REC_HDR_LEN);
>>> +
>>> +    if (!(record->event&  TRACE_RECORD_VALID)) {
>>>
>>>         return false;
>>>     }
>>>
>>>     __sync_synchronize(); /* read memory barrier before accessing record
>>> */
>>
>>
>> The need for the memory barrier is no longer clear.  Previously we
>> were directly accessing the trace ring buffer, and therefore needed to
>> ensure fields were settled before accessing them.  Now we use
>> read_from_buffer() which copies the data into our temporary struct on
>> the stack.
>>
>> I think the best way of doing it is to read the event field first in a
>> separate step, then do the read memory barrier, and then read the rest
>> of the record.  This ensures that the event field is at least as "old"
>> as the other fields we read.
>
>
> Currently, it does a two step read:
>
> read header (which includes event field and length of record as well)
> sync
> read rest of record (using length from header)
>
> Are you suggesting this:
>
> read event field only
> sync (if event valid)
> read header (for length)
> sync
> read rest of record (using length)
>
> or
>
> read event field only
> sync (if event valid)
> read header (for length)
> read rest of record

header, sync, rest of header + payload

The reason for the read barrier is to ensure that we don't read stale
header fields (e.g. length) together with an up-to-date "valid" event
field.

>>> -    *record = trace_buf[idx];
>>> -    record->event&= ~TRACE_RECORD_VALID;
>>>
>>> +    *recordptr = g_malloc(record->length);
>>> +    /* make a copy of record to avoid being overwritten */
>>> +    read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
>>> +    (*recordptr)->event&= ~TRACE_RECORD_VALID;
>>>     return true;
>>>  }
>>>
>
> [....]
>
>
>>>
>>> -void trace3(TraceEventID event, uint64_t x1, uint64_t x2, uint64_t x3)
>>> +int trace_alloc_record(TraceBufferRecord *rec, TraceEventID event,
>>> uint32_t datasize)
>>>  {
>>> -    trace(event, x1, x2, x3, 0, 0, 0);
>>> +    unsigned int idx, rec_off;
>>> +    uint32_t rec_len = ST_REC_HDR_LEN + datasize;
>>> +    uint64_t timestamp_ns = get_clock();
>>> +
>>> +    if ((rec_len + trace_idx - writeout_idx)>  TRACE_BUF_LEN) {
>>> +        /* Trace Buffer Full, Event dropped ! */
>>> +        dropped_events++;
>>> +        return 1;
>>
>>
>> -ENOSPC?  A negative errno is clearer than 0 - success, 1 - failure.
>>
>>> +    }
>>> +    idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, rec_len) %
>>> TRACE_BUF_LEN;
>>
>>
>> How does this deal with the race condition of two or more threads
>> running trace_alloc_record() concurrently when the buffer reaches max
>> size?  I think two or more threads could think they have enough space
>> because trace_idx hasn't been updated yet between the buffer length
>> check and the atomic update.
>
>
> Are you suggesting to use locking here ? I couldnt find a test-and-set
> alternative in glib2. Does qemu have access to any such interface ?

How about:
http://developer.gnome.org/glib/2.32/glib-Atomic-Operations.html#g-atomic-int-compare-and-exchange

I think this becomes:

while True:
    old_idx = trace_idx
    rmb()
    new_idx = old_idx + rec_len

    if new_idx - write_idx > TRACE_BUF_LEN:
        dropped_events++
        return

    if g_atomic_int_compare_and_exchange((gint *)&trace_idx, old_idx, new_idx):
        break

Now we've reserved [new_idx, new_idx + rec_len).

>>> +void trace_record_finish(TraceBufferRecord *rec);
>>> +uint32_t safe_strlen(const char* str);
>>
>>
>> Given that safe_strlen() is only used once and a generic utility (not
>> specific to the simple trace backend), I suggest dropping it an just
>> open coding the tristate operator: s ? strlen(s) : 0.
>>
>
> safe_strlen is used multiple times in auto-generated code by tracetool in
> expression for calculating the sum of size of trace arguments which as of
> now looks like:
>
> 8 + 8 + 4 + safe_strlen(str1) + 8 + 4 + safe_strlen(str2) ...
>
> for tracing events with string arguments. For trace events with multiple
> strings, this expression is more readable as compared to having an
> expression with tristate operator like this:
>
> 8 + 8 + 4 + (str1 ? strlen(str1) : 0) + 8  + 4 + (str2 ? strlen(str2) : 0)
> ...
>
>
> I agree that its a generic function and need not be placed in tracing code,
> let me know the right place where to put it if you think its worth keeping
> it.

The generic place to put it is cutils.c.  It's up to you if you want to keep it.

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings.
  2012-06-11 14:55       ` Stefan Hajnoczi
@ 2012-06-12 11:31         ` Harsh Bora
  0 siblings, 0 replies; 9+ messages in thread
From: Harsh Bora @ 2012-06-12 11:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: aneesh.kumar, stefanha, qemu-devel

On 06/11/2012 08:25 PM, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 1:31 PM, Harsh Bora<harsh@linux.vnet.ibm.com>  wrote:
>> On 06/07/2012 08:02 PM, Stefan Hajnoczi wrote:
>>>
>>> On Thu, May 24, 2012 at 10:50 AM, Harsh Prateek Bora
>>> <harsh@linux.vnet.ibm.com>    wrote:
>>>> @@ -75,16 +96,22 @@ static char *trace_file_name = NULL;
>>>>   *
>>>>   * Returns false if the record is not valid.
>>>>   */
>>>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>>>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>>>   {
>>>> -    if (!(trace_buf[idx].event&    TRACE_RECORD_VALID)) {
>>>>
>>>> +    uint8_t temp_rec[ST_REC_HDR_LEN];
>>>> +    TraceRecord *record = (TraceRecord *) temp_rec;
>>>> +    read_from_buffer(idx, temp_rec, ST_REC_HDR_LEN);
>>>> +
>>>> +    if (!(record->event&    TRACE_RECORD_VALID)) {
>>>>
>>>>          return false;
>>>>      }
>>>>
>>>>      __sync_synchronize(); /* read memory barrier before accessing record
>>>> */
>>>
>>>
>>> The need for the memory barrier is no longer clear.  Previously we
>>> were directly accessing the trace ring buffer, and therefore needed to
>>> ensure fields were settled before accessing them.  Now we use
>>> read_from_buffer() which copies the data into our temporary struct on
>>> the stack.
>>>
>>> I think the best way of doing it is to read the event field first in a
>>> separate step, then do the read memory barrier, and then read the rest
>>> of the record.  This ensures that the event field is at least as "old"
>>> as the other fields we read.
>>
>>
>> Currently, it does a two step read:
>>
>> read header (which includes event field and length of record as well)
>> sync
>> read rest of record (using length from header)
>>
>> Are you suggesting this:
>>
>> read event field only
>> sync (if event valid)
>> read header (for length)
>> sync
>> read rest of record (using length)
>>
>> or
>>
>> read event field only
>> sync (if event valid)
>> read header (for length)
>> read rest of record
>
> header, sync, rest of header + payload

I guess, we should read complete header after sync as well.

>
> The reason for the read barrier is to ensure that we don't read stale
> header fields (e.g. length) together with an up-to-date "valid" event
> field.
>
>>>> -    *record = trace_buf[idx];
>>>> -    record->event&= ~TRACE_RECORD_VALID;
>>>>
>>>> +    *recordptr = g_malloc(record->length);
>>>> +    /* make a copy of record to avoid being overwritten */
>>>> +    read_from_buffer(idx, (uint8_t *)*recordptr, record->length);
>>>> +    (*recordptr)->event&= ~TRACE_RECORD_VALID;
>>>>      return true;
>>>>   }
>>>>
>>
>> [....]

[...]

>>
>>
>> Are you suggesting to use locking here ? I couldnt find a test-and-set
>> alternative in glib2. Does qemu have access to any such interface ?
>
> How about:
> http://developer.gnome.org/glib/2.32/glib-Atomic-Operations.html#g-atomic-int-compare-and-exchange
>
> I think this becomes:
>
> while True:
>      old_idx = trace_idx
>      rmb()
>      new_idx = old_idx + rec_len
>
>      if new_idx - write_idx>  TRACE_BUF_LEN:
>          dropped_events++
>          return
>
>      if g_atomic_int_compare_and_exchange((gint *)&trace_idx, old_idx, new_idx):
>          break
>
> Now we've reserved [new_idx, new_idx + rec_len).

Hmm :) I realized that after sending mail, however thanks for the algo.

>
>>>> +void trace_record_finish(TraceBufferRecord *rec);
>>>> +uint32_t safe_strlen(const char* str);
>>>
>>>
>>> Given that safe_strlen() is only used once and a generic utility (not
>>> specific to the simple trace backend), I suggest dropping it an just
>>> open coding the tristate operator: s ? strlen(s) : 0.
>>>
>>
>> safe_strlen is used multiple times in auto-generated code by tracetool in
>> expression for calculating the sum of size of trace arguments which as of
>> now looks like:
>>
>> 8 + 8 + 4 + safe_strlen(str1) + 8 + 4 + safe_strlen(str2) ...
>>
>> for tracing events with string arguments. For trace events with multiple
>> strings, this expression is more readable as compared to having an
>> expression with tristate operator like this:
>>
>> 8 + 8 + 4 + (str1 ? strlen(str1) : 0) + 8  + 4 + (str2 ? strlen(str2) : 0)
>> ...
>>
>>
>> I agree that its a generic function and need not be placed in tracing code,
>> let me know the right place where to put it if you think its worth keeping
>> it.
>
> The generic place to put it is cutils.c.  It's up to you if you want to keep it.

Hmm, I gave it a second thought and realized that having a generic 
qemu_strlen would require to return a negative value when string is null 
as non-null strings can return 0 as well, however tracing code required 
value 0 when string is NULL, so I dropped the idea of qemu_strlen 
(safe_strlen) and using tristate operators only.

Sending the updated patch series in a while.

regards,
Harsh

>
> Stefan
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-06-12 11:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-24  9:50 [Qemu-devel] [PATCH v2 0/3] Simpletrace v2: Support multiple args, strings Harsh Prateek Bora
2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 1/3] monitor: remove unused do_info_trace Harsh Prateek Bora
2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 2/3] Simpletrace v2: Add support for multiple args, strings Harsh Prateek Bora
2012-06-07 14:32   ` Stefan Hajnoczi
2012-06-11 12:31     ` Harsh Bora
2012-06-11 14:55       ` Stefan Hajnoczi
2012-06-12 11:31         ` Harsh Bora
2012-05-24  9:50 ` [Qemu-devel] [PATCH v2 3/3] Update simpletrace.py to support new v2 log format Harsh Prateek Bora
2012-06-07 14:49   ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).