linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 0/5] perf, tool: Fix endian issues
@ 2012-05-07 11:29 Jiri Olsa
  2012-05-07 11:29 ` [PATCH 1/5] perf, tool: Handle different endians properly during symbol load Jiri Olsa
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Caspar Zhang

hi,
sending fixies to properly handle perf.data endians.

v3 changes:
  - added patch 5 to fix addons bitmask handling

v2 changes:
  - added patches 3 and 4 to handle sample_id_all header endianity


Attached patches:
  1/5 perf, tool: Handle different endians properly during symbol load
  2/5 perf, tool: Carry perf_event_attr bitfield throught different endians
  3/5 perf, tool: Handle endianity swap on sample_id_all header data
  4/5 perf, tool: Fix 32 bit values endianity swap for sample_id_all header
  5/5 perf, tool: Fix endianity trick for adds_features bitmask


Tested by running following usecase:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data
    
  - copy the perf.data, report.origin and perf.data.tar.bz2
    to a target system and run:
    # tar xjvf perf.data.tar.bz2 -C ~/.debug
    # perf report > report.target
    # diff -u report.origin report.target
    
  - the diff should produce no output
    (besides some white space stuff and possibly different
     date/TZ output)

Tested by above usecase cross following architectures:
  i386, x86_64, s390x, ppc64

Big thank to Caspar Zhang who verified this within RH QE testsuites.

thanks,
jirka

CC: Caspar Zhang <czhang@redhat.com>
---
 tools/perf/util/evsel.c                |   32 ++++++++--
 tools/perf/util/header.c               |   20 +++++--
 tools/perf/util/include/linux/bitops.h |    1 +
 tools/perf/util/session.c              |  101 ++++++++++++++++++++++++++++----
 tools/perf/util/symbol.c               |   33 ++++++++++-
 tools/perf/util/symbol.h               |   30 +++++++++
 6 files changed, 192 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] perf, tool: Handle different endians properly during symbol load
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
@ 2012-05-07 11:29 ` Jiri Olsa
  2012-05-07 11:29 ` [PATCH 2/5] perf, tool: Carry perf_event_attr bitfield throught different endians Jiri Olsa
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Jiri Olsa

Currently we dont care about the file object's endianness. It's possible
we read buildid file object from different architecture than we are
currentlly running on. So we need to care about properly reading such
object's data - handle different endianness properly.

Adding:
	needs_swap DSO field
	dso__swap_init function to initialize DSO's needs_swap
	DSO__READ to read the data with proper swaps

Note, running following to test perf endianity handling:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data

  - copy the perf.data, report.origin and perf.data.tar.bz2
    to a target system and run:
    # tar xjvf perf.data.tar.bz2 -C ~/.debug
    # perf report > report.target
    # diff -u report.origin report.target

  - the diff should produce no output
    (besides some white space stuff and possibly different
     date/TZ output)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |   33 ++++++++++++++++++++++++++++++++-
 tools/perf/util/symbol.h |   30 ++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ab9867b..a63f15e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -323,6 +323,7 @@ struct dso *dso__new(const char *name)
 		dso->sorted_by_name = 0;
 		dso->has_build_id = 0;
 		dso->kernel = DSO_TYPE_USER;
+		dso->needs_swap = DSO_SWAP__UNSET;
 		INIT_LIST_HEAD(&dso->node);
 	}
 
@@ -1156,6 +1157,33 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
 	return -1;
 }
 
+static int dso__swap_init(struct dso *dso, unsigned char eidata)
+{
+	static unsigned int const endian = 1;
+
+	dso->needs_swap = DSO_SWAP__NO;
+
+	switch (eidata) {
+	case ELFDATA2LSB:
+		/* We are big endian, DSO is little endian. */
+		if (*(unsigned char const *)&endian != 1)
+			dso->needs_swap = DSO_SWAP__YES;
+		break;
+
+	case ELFDATA2MSB:
+		/* We are little endian, DSO is big endian. */
+		if (*(unsigned char const *)&endian != 0)
+			dso->needs_swap = DSO_SWAP__YES;
+		break;
+
+	default:
+		pr_err("unrecognized DSO data encoding %d\n", eidata);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
 			 int fd, symbol_filter_t filter, int kmodule,
 			 int want_symtab)
@@ -1187,6 +1215,9 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
 		goto out_elf_end;
 	}
 
+	if (dso__swap_init(dso, ehdr.e_ident[EI_DATA]))
+		goto out_elf_end;
+
 	/* Always reject images with a mismatched build-id: */
 	if (dso->has_build_id) {
 		u8 build_id[BUILD_ID_SIZE];
@@ -1272,7 +1303,7 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
 		if (opdsec && sym.st_shndx == opdidx) {
 			u32 offset = sym.st_value - opdshdr.sh_addr;
 			u64 *opd = opddata->d_buf + offset;
-			sym.st_value = *opd;
+			sym.st_value = DSO__READ(dso, u64, *opd);
 			sym.st_shndx = elf_addr_to_index(elf, sym.st_value);
 		}
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 1f00388..40a3254 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -9,6 +9,7 @@
 #include <linux/list.h>
 #include <linux/rbtree.h>
 #include <stdio.h>
+#include <byteswap.h>
 
 #ifdef HAVE_CPLUS_DEMANGLE
 extern char *cplus_demangle(const char *, int);
@@ -160,11 +161,18 @@ enum dso_kernel_type {
 	DSO_TYPE_GUEST_KERNEL
 };
 
+enum dso_swap_type {
+	DSO_SWAP__UNSET,
+	DSO_SWAP__NO,
+	DSO_SWAP__YES,
+};
+
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	enum dso_kernel_type	kernel;
+	enum dso_swap_type	needs_swap;
 	u8		 adjust_symbols:1;
 	u8		 has_build_id:1;
 	u8		 hit:1;
@@ -182,6 +190,28 @@ struct dso {
 	char		 name[0];
 };
 
+#define DSO__READ(dso, type, val)			\
+({							\
+	type ____r = val;				\
+	BUG_ON(dso->needs_swap == DSO_SWAP__UNSET);	\
+	if (dso->needs_swap == DSO_SWAP__YES) {		\
+		switch (sizeof(____r)) {		\
+		case 2:					\
+			____r = bswap_16(val);		\
+			break;				\
+		case 4:					\
+			____r = bswap_32(val);		\
+			break;				\
+		case 8:					\
+			____r = bswap_64(val);		\
+			break;				\
+		default:				\
+			BUG_ON(1);			\
+		}					\
+	}						\
+	____r;						\
+})
+
 struct dso *dso__new(const char *name);
 void dso__delete(struct dso *dso);
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] perf, tool: Carry perf_event_attr bitfield throught different endians
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
  2012-05-07 11:29 ` [PATCH 1/5] perf, tool: Handle different endians properly during symbol load Jiri Olsa
@ 2012-05-07 11:29 ` Jiri Olsa
  2012-05-07 11:29 ` [PATCH 3/5] perf, tool: Handle endianity swap on sample_id_all header data Jiri Olsa
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Jiri Olsa

When the perf data file is read cross architectures, the perf_event__attr_swap
function takes care about endianness of all the struct fields except the
bitfield flags.

The bitfield flags need to be transformed as well, since the bitfield
binary storage differs for both endians.

ABI says:
  Bit-fields are allocated from right to left (least to most significant)
  on little-endian implementations and from left to right (most to least
  significant) on big-endian implementations.

The above seems to be byte specific, so we need to reverse each
byte of the bitfield. 'Internet' also says this might be implementation
specific and we probably need proper fix and carry perf_event_attr
bitfield flags in separate data file FEAT_ section. Thought this seems
to work for now.

Note, running following to test perf endianity handling:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data

 - copy the perf.data, report.origin and perf.data.tar.bz2
   to a target system and run:
   # tar xjvf perf.data.tar.bz2 -C ~/.debug
   # perf report > report.target
   # diff -u report.origin report.target

 - the diff should produce no output
   (besides some white space stuff and possibly different
    date/TZ output)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/session.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1efd3be..07fda7c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -481,6 +481,38 @@ static void perf_event__read_swap(union perf_event *event)
 	event->read.id		 = bswap_64(event->read.id);
 }
 
+static u8 revbyte(u8 b)
+{
+	int rev = (b >> 4) | ((b & 0xf) << 4);
+	rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
+	rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);
+	return (u8) rev;
+}
+
+/*
+ * XXX this is hack in attempt to carry flags bitfield
+ * throught endian village. ABI says:
+ *
+ * Bit-fields are allocated from right to left (least to most significant)
+ * on little-endian implementations and from left to right (most to least
+ * significant) on big-endian implementations.
+ *
+ * The above seems to be byte specific, so we need to reverse each
+ * byte of the bitfield. 'Internet' also says this might be implementation
+ * specific and we probably need proper fix and carry perf_event_attr
+ * bitfield flags in separate data file FEAT_ section. Thought this seems
+ * to work for now.
+ */
+static void swap_bitfield(u8 *p, unsigned len)
+{
+	unsigned i;
+
+	for (i = 0; i < len; i++) {
+		*p = revbyte(*p);
+		p++;
+	}
+}
+
 /* exported for swapping attributes in file header */
 void perf_event__attr_swap(struct perf_event_attr *attr)
 {
@@ -494,6 +526,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 	attr->bp_type		= bswap_32(attr->bp_type);
 	attr->bp_addr		= bswap_64(attr->bp_addr);
 	attr->bp_len		= bswap_64(attr->bp_len);
+
+	swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
 }
 
 static void perf_event__hdr_attr_swap(union perf_event *event)
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] perf, tool: Handle endianity swap on sample_id_all header data
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
  2012-05-07 11:29 ` [PATCH 1/5] perf, tool: Handle different endians properly during symbol load Jiri Olsa
  2012-05-07 11:29 ` [PATCH 2/5] perf, tool: Carry perf_event_attr bitfield throught different endians Jiri Olsa
@ 2012-05-07 11:29 ` Jiri Olsa
  2012-05-07 11:29 ` [PATCH 4/5] perf, tool: Fix 32 bit values endianity swap for sample_id_all header Jiri Olsa
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Jiri Olsa

Adding endianity swapping for event header attached via sample_id_all.

Currently we dont do that and it's causing wrong data to be read when
running report on architecture with different endianity than the record.

Note, running following to test perf endianity handling:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data

  - copy the perf.data, report.origin and perf.data.tar.bz2
    to a target system and run:
    # tar xjvf perf.data.tar.bz2 -C ~/.debug
    # perf report > report.target
    # diff -u report.origin report.target

  - the diff should produce no output
    (besides some white space stuff and possibly different
     date/TZ output)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/session.c |   67 +++++++++++++++++++++++++++++++++++++--------
 1 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 07fda7c..72ce86d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -441,37 +441,65 @@ void mem_bswap_64(void *src, int byte_size)
 	}
 }
 
-static void perf_event__all64_swap(union perf_event *event)
+static void swap_sample_id_all(union perf_event *event, void *data)
+{
+	void *end = (void *) event + event->header.size;
+	int size = end - data;
+
+	BUG_ON(size % sizeof(u64));
+	mem_bswap_64(data, size);
+}
+
+static void perf_event__all64_swap(union perf_event *event,
+				   bool sample_id_all __used)
 {
 	struct perf_event_header *hdr = &event->header;
 	mem_bswap_64(hdr + 1, event->header.size - sizeof(*hdr));
 }
 
-static void perf_event__comm_swap(union perf_event *event)
+static void perf_event__comm_swap(union perf_event *event, bool sample_id_all)
 {
 	event->comm.pid = bswap_32(event->comm.pid);
 	event->comm.tid = bswap_32(event->comm.tid);
+
+	if (sample_id_all) {
+		void *data = &event->comm.comm;
+
+		data += ALIGN(strlen(data) + 1, sizeof(u64));
+		swap_sample_id_all(event, data);
+	}
 }
 
-static void perf_event__mmap_swap(union perf_event *event)
+static void perf_event__mmap_swap(union perf_event *event,
+				  bool sample_id_all)
 {
 	event->mmap.pid	  = bswap_32(event->mmap.pid);
 	event->mmap.tid	  = bswap_32(event->mmap.tid);
 	event->mmap.start = bswap_64(event->mmap.start);
 	event->mmap.len	  = bswap_64(event->mmap.len);
 	event->mmap.pgoff = bswap_64(event->mmap.pgoff);
+
+	if (sample_id_all) {
+		void *data = &event->mmap.filename;
+
+		data += ALIGN(strlen(data) + 1, sizeof(u64));
+		swap_sample_id_all(event, data);
+	}
 }
 
-static void perf_event__task_swap(union perf_event *event)
+static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
 {
 	event->fork.pid	 = bswap_32(event->fork.pid);
 	event->fork.tid	 = bswap_32(event->fork.tid);
 	event->fork.ppid = bswap_32(event->fork.ppid);
 	event->fork.ptid = bswap_32(event->fork.ptid);
 	event->fork.time = bswap_64(event->fork.time);
+
+	if (sample_id_all)
+		swap_sample_id_all(event, &event->fork + 1);
 }
 
-static void perf_event__read_swap(union perf_event *event)
+static void perf_event__read_swap(union perf_event *event, bool sample_id_all)
 {
 	event->read.pid		 = bswap_32(event->read.pid);
 	event->read.tid		 = bswap_32(event->read.tid);
@@ -479,6 +507,9 @@ static void perf_event__read_swap(union perf_event *event)
 	event->read.time_enabled = bswap_64(event->read.time_enabled);
 	event->read.time_running = bswap_64(event->read.time_running);
 	event->read.id		 = bswap_64(event->read.id);
+
+	if (sample_id_all)
+		swap_sample_id_all(event, &event->read + 1);
 }
 
 static u8 revbyte(u8 b)
@@ -530,7 +561,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 	swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
 }
 
-static void perf_event__hdr_attr_swap(union perf_event *event)
+static void perf_event__hdr_attr_swap(union perf_event *event,
+				      bool sample_id_all __used)
 {
 	size_t size;
 
@@ -541,18 +573,21 @@ static void perf_event__hdr_attr_swap(union perf_event *event)
 	mem_bswap_64(event->attr.id, size);
 }
 
-static void perf_event__event_type_swap(union perf_event *event)
+static void perf_event__event_type_swap(union perf_event *event,
+					bool sample_id_all __used)
 {
 	event->event_type.event_type.event_id =
 		bswap_64(event->event_type.event_type.event_id);
 }
 
-static void perf_event__tracing_data_swap(union perf_event *event)
+static void perf_event__tracing_data_swap(union perf_event *event,
+					  bool sample_id_all __used)
 {
 	event->tracing_data.size = bswap_32(event->tracing_data.size);
 }
 
-typedef void (*perf_event__swap_op)(union perf_event *event);
+typedef void (*perf_event__swap_op)(union perf_event *event,
+				    bool sample_id_all);
 
 static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_MMAP]		  = perf_event__mmap_swap,
@@ -986,6 +1021,15 @@ static int perf_session__process_user_event(struct perf_session *session, union
 	}
 }
 
+static void event_swap(union perf_event *event, bool sample_id_all)
+{
+	perf_event__swap_op swap;
+
+	swap = perf_event__swap_ops[event->header.type];
+	if (swap)
+		swap(event, sample_id_all);
+}
+
 static int perf_session__process_event(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_tool *tool,
@@ -994,9 +1038,8 @@ static int perf_session__process_event(struct perf_session *session,
 	struct perf_sample sample;
 	int ret;
 
-	if (session->header.needs_swap &&
-	    perf_event__swap_ops[event->header.type])
-		perf_event__swap_ops[event->header.type](event);
+	if (session->header.needs_swap)
+		event_swap(event, session->sample_id_all);
 
 	if (event->header.type >= PERF_RECORD_HEADER_MAX)
 		return -EINVAL;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] perf, tool: Fix 32 bit values endianity swap for sample_id_all header
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
                   ` (2 preceding siblings ...)
  2012-05-07 11:29 ` [PATCH 3/5] perf, tool: Handle endianity swap on sample_id_all header data Jiri Olsa
@ 2012-05-07 11:29 ` Jiri Olsa
  2012-05-07 11:29 ` [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask Jiri Olsa
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Jiri Olsa

We swap the sample_id_all header by u64 pointers. Some members
of the header happen to be 32 bit values. We need to handle them
separatelly.

Note, running following to test perf endianity handling:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data

  - copy the perf.data, report.origin and perf.data.tar.bz2
    to a target system and run:
    # tar xjvf perf.data.tar.bz2 -C ~/.debug
    # perf report > report.target
    # diff -u report.origin report.target

  - the diff should produce no output
    (besides some white space stuff and possibly different
     date/TZ output)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/evsel.c |   32 +++++++++++++++++++++++++-------
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8c13dbc..ea053b7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -403,16 +403,27 @@ int perf_evsel__open_per_thread(struct perf_evsel *evsel,
 }
 
 static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
-				       struct perf_sample *sample)
+				       struct perf_sample *sample,
+				       bool swapped)
 {
 	const u64 *array = event->sample.array;
+	union {
+		u64 val64;
+		u32 val32[2];
+	} u;
 
 	array += ((event->header.size -
 		   sizeof(event->header)) / sizeof(u64)) - 1;
 
 	if (type & PERF_SAMPLE_CPU) {
-		u32 *p = (u32 *)array;
-		sample->cpu = *p;
+		u.val64 = *array;
+		if (swapped) {
+			/* undo swap of u64, then swap on individual u32s */
+			u.val64 = bswap_64(u.val64);
+			u.val32[0] = bswap_32(u.val32[0]);
+		}
+
+		sample->cpu = u.val32[0];
 		array--;
 	}
 
@@ -432,9 +443,16 @@ static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
 	}
 
 	if (type & PERF_SAMPLE_TID) {
-		u32 *p = (u32 *)array;
-		sample->pid = p[0];
-		sample->tid = p[1];
+		u.val64 = *array;
+		if (swapped) {
+			/* undo swap of u64, then swap on individual u32s */
+			u.val64 = bswap_64(u.val64);
+			u.val32[0] = bswap_32(u.val32[0]);
+			u.val32[1] = bswap_32(u.val32[1]);
+		}
+
+		sample->pid = u.val32[0];
+		sample->tid = u.val32[1];
 	}
 
 	return 0;
@@ -474,7 +492,7 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
 	if (event->header.type != PERF_RECORD_SAMPLE) {
 		if (!sample_id_all)
 			return 0;
-		return perf_event__parse_id_sample(event, type, data);
+		return perf_event__parse_id_sample(event, type, data, swapped);
 	}
 
 	array = event->sample.array;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
                   ` (3 preceding siblings ...)
  2012-05-07 11:29 ` [PATCH 4/5] perf, tool: Fix 32 bit values endianity swap for sample_id_all header Jiri Olsa
@ 2012-05-07 11:29 ` Jiri Olsa
  2012-05-07 16:15   ` David Ahern
  2012-05-08  2:12 ` [PATCHv3 0/5] perf, tool: Fix endian issues David Ahern
  2012-05-08  3:49 ` David Ahern
  6 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2012-05-07 11:29 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: linux-kernel, dsahern, Jiri Olsa

Addons bitmask is stored as array of unsigned long values. The size
of the unsigned long is same as pointer size for architecture, so it
could differ for each architecture.

To handle the endianity for adds_features bitmask, we first swap the
bitmaks as u64 values and check for HEADER_HOSTNAME bit. If not set we
want to unswap the u64 values and swap the adds_features as u32 values.

This is currently buggy, since we swap just first 32bits of each u64
value. Adding swap of the next 32 bits as well.  Also adding & using
BITS_TO_U64 instead of BITS_TO_LONGS as counter max due to the different
size of unsigned longs per architecture.

Note, running following to test perf endianity handling:
  - origin system:
    # perf record -a -- sleep 10 (any perf record will do)
    # perf report > report.origin
    # perf archive perf.data

  - copy the perf.data, report.origin and perf.data.tar.bz2
    to a target system and run:
    # tar xjvf perf.data.tar.bz2 -C ~/.debug
    # perf report > report.target
    # diff -u report.origin report.target

  - the diff should produce no output
    (besides some white space stuff and possibly different
     date/TZ output)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/header.c               |   20 +++++++++++++++-----
 tools/perf/util/include/linux/bitops.h |    1 +
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c0b70c6..a56db7e 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1963,13 +1963,23 @@ int perf_file_header__read(struct perf_file_header *header,
 		 * file), punt and fallback to the original behavior --
 		 * clearing all feature bits and setting buildid.
 		 */
-		for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i)
-			header->adds_features[i] = bswap_64(header->adds_features[i]);
+
+		mem_bswap_64(&header->adds_features,
+			     BITS_TO_U64(HEADER_FEAT_BITS));
 
 		if (!test_bit(HEADER_HOSTNAME, header->adds_features)) {
-			for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i) {
-				header->adds_features[i] = bswap_64(header->adds_features[i]);
-				header->adds_features[i] = bswap_32(header->adds_features[i]);
+			u64 *v = (u64 *) &header->adds_features;
+
+			for (i = 0; i < BITS_TO_U64(HEADER_FEAT_BITS); ++i) {
+				union {
+					u64 val64;
+					u32 val32[2];
+				} u;
+
+				u.val64 = *v++;
+				u.val64 = bswap_64(u.val64);
+				u.val32[0] = bswap_32(u.val32[0]);
+				u.val32[1] = bswap_32(u.val32[1]);
 			}
 		}
 
diff --git a/tools/perf/util/include/linux/bitops.h b/tools/perf/util/include/linux/bitops.h
index f1584833..10096cb 100644
--- a/tools/perf/util/include/linux/bitops.h
+++ b/tools/perf/util/include/linux/bitops.h
@@ -8,6 +8,7 @@
 #define BITS_PER_LONG __WORDSIZE
 #define BITS_PER_BYTE           8
 #define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+#define BITS_TO_U64(nr)         DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u64))
 
 #define for_each_set_bit(bit, addr, size) \
 	for ((bit) = find_first_bit((addr), (size));		\
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask
  2012-05-07 11:29 ` [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask Jiri Olsa
@ 2012-05-07 16:15   ` David Ahern
  0 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2012-05-07 16:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel

On 5/7/12 5:29 AM, Jiri Olsa wrote:
> Addons bitmask is stored as array of unsigned long values. The size
> of the unsigned long is same as pointer size for architecture, so it
> could differ for each architecture.
>
> To handle the endianity for adds_features bitmask, we first swap the
> bitmaks as u64 values and check for HEADER_HOSTNAME bit. If not set we
> want to unswap the u64 values and swap the adds_features as u32 values.
>
> This is currently buggy, since we swap just first 32bits of each u64
> value. Adding swap of the next 32 bits as well.  Also adding&  using
> BITS_TO_U64 instead of BITS_TO_LONGS as counter max due to the different
> size of unsigned longs per architecture.
>
> Note, running following to test perf endianity handling:
>    - origin system:
>      # perf record -a -- sleep 10 (any perf record will do)
>      # perf report>  report.origin
>      # perf archive perf.data
>
>    - copy the perf.data, report.origin and perf.data.tar.bz2
>      to a target system and run:
>      # tar xjvf perf.data.tar.bz2 -C ~/.debug
>      # perf report>  report.target
>      # diff -u report.origin report.target
>
>    - the diff should produce no output
>      (besides some white space stuff and possibly different
>       date/TZ output)

Did you cross bitness and architectures in your test? When I added the 
code you are modifying I verified it worked by analyzing 32-bit PPC 
files on 64-bit x86 and 32-bit x86.

Did you also test old versus new perf commands? perf data collected with 
older command and analyzed with a newer version with this patch? Be sure 
to add the -I switch to get all the processing of the feature mask.

David

>
> Signed-off-by: Jiri Olsa<jolsa@redhat.com>
> ---
>   tools/perf/util/header.c               |   20 +++++++++++++++-----
>   tools/perf/util/include/linux/bitops.h |    1 +
>   2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index c0b70c6..a56db7e 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -1963,13 +1963,23 @@ int perf_file_header__read(struct perf_file_header *header,
>   		 * file), punt and fallback to the original behavior --
>   		 * clearing all feature bits and setting buildid.
>   		 */
> -		for (i = 0; i<  BITS_TO_LONGS(HEADER_FEAT_BITS); ++i)
> -			header->adds_features[i] = bswap_64(header->adds_features[i]);
> +
> +		mem_bswap_64(&header->adds_features,
> +			     BITS_TO_U64(HEADER_FEAT_BITS));
>
>   		if (!test_bit(HEADER_HOSTNAME, header->adds_features)) {
> -			for (i = 0; i<  BITS_TO_LONGS(HEADER_FEAT_BITS); ++i) {
> -				header->adds_features[i] = bswap_64(header->adds_features[i]);
> -				header->adds_features[i] = bswap_32(header->adds_features[i]);
> +			u64 *v = (u64 *)&header->adds_features;
> +
> +			for (i = 0; i<  BITS_TO_U64(HEADER_FEAT_BITS); ++i) {
> +				union {
> +					u64 val64;
> +					u32 val32[2];
> +				} u;
> +
> +				u.val64 = *v++;
> +				u.val64 = bswap_64(u.val64);
> +				u.val32[0] = bswap_32(u.val32[0]);
> +				u.val32[1] = bswap_32(u.val32[1]);
>   			}
>   		}
>
> diff --git a/tools/perf/util/include/linux/bitops.h b/tools/perf/util/include/linux/bitops.h
> index f1584833..10096cb 100644
> --- a/tools/perf/util/include/linux/bitops.h
> +++ b/tools/perf/util/include/linux/bitops.h
> @@ -8,6 +8,7 @@
>   #define BITS_PER_LONG __WORDSIZE
>   #define BITS_PER_BYTE           8
>   #define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
> +#define BITS_TO_U64(nr)         DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u64))
>
>   #define for_each_set_bit(bit, addr, size) \
>   	for ((bit) = find_first_bit((addr), (size));		\


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 0/5] perf, tool: Fix endian issues
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
                   ` (4 preceding siblings ...)
  2012-05-07 11:29 ` [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask Jiri Olsa
@ 2012-05-08  2:12 ` David Ahern
  2012-05-08  3:37   ` David Ahern
  2012-05-08  3:49 ` David Ahern
  6 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2012-05-08  2:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel, Caspar Zhang

On 5/7/12 5:29 AM, Jiri Olsa wrote:
> Tested by running following usecase:
>    - origin system:
>      # perf record -a -- sleep 10 (any perf record will do)
>      # perf report>  report.origin
>      # perf archive perf.data
>
>    - copy the perf.data, report.origin and perf.data.tar.bz2
>      to a target system and run:
>      # tar xjvf perf.data.tar.bz2 -C ~/.debug
>      # perf report>  report.target
>      # diff -u report.origin report.target
>
>    - the diff should produce no output
>      (besides some white space stuff and possibly different
>       date/TZ output)
>
> Tested by above usecase cross following architectures:
>    i386, x86_64, s390x, ppc64

What version of ppc64? I'm getting 2 compile failures using Fedora 12 
PPC, 32-bit. I sent a patch for the easy one with perf-report. I'm a bit 
stumped on this one:

     LINK /tmp/pbuild/perf
/tmp/pbuild/libperf.a(pmu.o): In function `pmu_format_parse':
/mnt/src/tools/perf/util/pmu.c:47: undefined reference to `perf_pmu_in'
/mnt/src/tools/perf/util/pmu.c:47: undefined reference to `perf_pmu_in'
/tmp/pbuild/libperf.a(pmu-bison.o): In function `perf_pmu_parse':
/tmp/pbuild/util/pmu-bison.c:1287: undefined reference to `perf_pmu_lex'
collect2: ld returned 1 exit status
make: *** [/tmp/pbuild/perf] Error 1

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 0/5] perf, tool: Fix endian issues
  2012-05-08  2:12 ` [PATCHv3 0/5] perf, tool: Fix endian issues David Ahern
@ 2012-05-08  3:37   ` David Ahern
  0 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2012-05-08  3:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel, Caspar Zhang

On 5/7/12 8:12 PM, David Ahern wrote:
>
> LINK /tmp/pbuild/perf
> /tmp/pbuild/libperf.a(pmu.o): In function `pmu_format_parse':
> /mnt/src/tools/perf/util/pmu.c:47: undefined reference to `perf_pmu_in'
> /mnt/src/tools/perf/util/pmu.c:47: undefined reference to `perf_pmu_in'
> /tmp/pbuild/libperf.a(pmu-bison.o): In function `perf_pmu_parse':
> /tmp/pbuild/util/pmu-bison.c:1287: undefined reference to `perf_pmu_lex'
> collect2: ld returned 1 exit status
> make: *** [/tmp/pbuild/perf] Error 1

nevermind. This one is a weird side effect of not having flex and bison 
installed the first time I tried compiling it. After nuking the build 
directory, installing flex and bison, perf compiled fine.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 0/5] perf, tool: Fix endian issues
  2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
                   ` (5 preceding siblings ...)
  2012-05-08  2:12 ` [PATCHv3 0/5] perf, tool: Fix endian issues David Ahern
@ 2012-05-08  3:49 ` David Ahern
  2012-05-10 10:38   ` Jiri Olsa
  6 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2012-05-08  3:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel, Caspar Zhang

On 5/7/12 5:29 AM, Jiri Olsa wrote:
> Tested by running following usecase:
>    - origin system:
>      # perf record -a -- sleep 10 (any perf record will do)
>      # perf report>  report.origin
>      # perf archive perf.data
>
>    - copy the perf.data, report.origin and perf.data.tar.bz2
>      to a target system and run:
>      # tar xjvf perf.data.tar.bz2 -C ~/.debug
>      # perf report>  report.target
>      # diff -u report.origin report.target
>
>    - the diff should produce no output
>      (besides some white space stuff and possibly different
>       date/TZ output)
>
> Tested by above usecase cross following architectures:
>    i386, x86_64, s390x, ppc64

Short answer: Now that I have perf compiling in a 32-bit PPC VM, applied 
the patchset and rebuilding -- it does not work for me.

Long answer to reproduce:

In the PPC VM:
1. build perf without your patches

2. perf record -ag -fo /tmp/perf.data -- sleep 1


In the host (x86_64) using the same code base:
1. build perf without your patches

2. mount ppc filesystem (e.g.,
sshfs root@172.16.128.67:/ /tmp/f12-ppc/ -o direct_io)

3. analyze ppc file. Verify life is good:
/tmp/jiri/perf script --symfs /tmp/f12-ppc \
     -I -i /tmp/f12-ppc/tmp/perf.data \
      --kallsyms /tmp/f12-ppc/proc/kallsyms


# ========
# captured on: Mon May  7 21:32:15 2012
# hostname : f12-ppc
# os release : 2.6.32.26-175.fc12.ppc
# perf version : 3.4.0-rc2
# arch : ppc
# nrcpus online : 1
# nrcpus avail : 1
# cpudesc : 740/750
# cpuid : 8,769
# total memory : 1024572 kB
# cmdline : /tmp/pbuild/perf record -ag -fo /tmp/perf.data -v -e 
cpu-clock -- sleep 5
# event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0, 
config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 6 }
# ========
#
...

Apply your patches and rebuild the host (x86_64 version). Life is not so 
good:

/tmp/jiri/perf script --symfs /tmp/f12-ppc \
     -I -i /tmp/f12-ppc/tmp/perf.data \
      --kallsyms /tmp/f12-ppc/proc/kallsyms

# ========
# captured on: Mon May  7 21:32:15 2012
# ========
#


i.e, all the feature data has gone missing.

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 0/5] perf, tool: Fix endian issues
  2012-05-08  3:49 ` David Ahern
@ 2012-05-10 10:38   ` Jiri Olsa
  2012-05-10 16:30     ` Jiri Olsa
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2012-05-10 10:38 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel, Caspar Zhang

On Mon, May 07, 2012 at 09:49:17PM -0600, David Ahern wrote:
> On 5/7/12 5:29 AM, Jiri Olsa wrote:
> >Tested by running following usecase:
> >   - origin system:
> >     # perf record -a -- sleep 10 (any perf record will do)
> >     # perf report>  report.origin
> >     # perf archive perf.data
> >
> >   - copy the perf.data, report.origin and perf.data.tar.bz2
> >     to a target system and run:
> >     # tar xjvf perf.data.tar.bz2 -C ~/.debug
> >     # perf report>  report.target
> >     # diff -u report.origin report.target
> >
> >   - the diff should produce no output
> >     (besides some white space stuff and possibly different
> >      date/TZ output)
> >
> >Tested by above usecase cross following architectures:
> >   i386, x86_64, s390x, ppc64
> 
> Short answer: Now that I have perf compiling in a 32-bit PPC VM,
> applied the patchset and rebuilding -- it does not work for me.

hi,
Could I ask what VM did you use? My qemu setup does not seem
to offer this one.

so it's [32 bits big endian] into [64 bits small endian]

I tested i386 to ppc64. The endianity is the other way round,
I wonder whats wrong with your tested direction.

hunting testing machine ;)

Also I did not test older versions.

thanks,
jirka

> 
> Long answer to reproduce:
> 
> In the PPC VM:
> 1. build perf without your patches
> 
> 2. perf record -ag -fo /tmp/perf.data -- sleep 1
> 
> 
> In the host (x86_64) using the same code base:
> 1. build perf without your patches
> 
> 2. mount ppc filesystem (e.g.,
> sshfs root@172.16.128.67:/ /tmp/f12-ppc/ -o direct_io)
> 
> 3. analyze ppc file. Verify life is good:
> /tmp/jiri/perf script --symfs /tmp/f12-ppc \
>     -I -i /tmp/f12-ppc/tmp/perf.data \
>      --kallsyms /tmp/f12-ppc/proc/kallsyms
> 
> 
> # ========
> # captured on: Mon May  7 21:32:15 2012
> # hostname : f12-ppc
> # os release : 2.6.32.26-175.fc12.ppc
> # perf version : 3.4.0-rc2
> # arch : ppc
> # nrcpus online : 1
> # nrcpus avail : 1
> # cpudesc : 740/750
> # cpuid : 8,769
> # total memory : 1024572 kB
> # cmdline : /tmp/pbuild/perf record -ag -fo /tmp/perf.data -v -e
> cpu-clock -- sleep 5
> # event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0,
> config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 6 }
> # ========
> #
> ...
> 
> Apply your patches and rebuild the host (x86_64 version). Life is
> not so good:
> 
> /tmp/jiri/perf script --symfs /tmp/f12-ppc \
>     -I -i /tmp/f12-ppc/tmp/perf.data \
>      --kallsyms /tmp/f12-ppc/proc/kallsyms
> 
> # ========
> # captured on: Mon May  7 21:32:15 2012
> # ========
> #
> 
> 
> i.e, all the feature data has gone missing.
> 
> David
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv3 0/5] perf, tool: Fix endian issues
  2012-05-10 10:38   ` Jiri Olsa
@ 2012-05-10 16:30     ` Jiri Olsa
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2012-05-10 16:30 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec,
	linux-kernel, Caspar Zhang

On Thu, May 10, 2012 at 12:38:25PM +0200, Jiri Olsa wrote:
> On Mon, May 07, 2012 at 09:49:17PM -0600, David Ahern wrote:
> > On 5/7/12 5:29 AM, Jiri Olsa wrote:
> > >Tested by running following usecase:
> > >   - origin system:
> > >     # perf record -a -- sleep 10 (any perf record will do)
> > >     # perf report>  report.origin
> > >     # perf archive perf.data
> > >
> > >   - copy the perf.data, report.origin and perf.data.tar.bz2
> > >     to a target system and run:
> > >     # tar xjvf perf.data.tar.bz2 -C ~/.debug
> > >     # perf report>  report.target
> > >     # diff -u report.origin report.target
> > >
> > >   - the diff should produce no output
> > >     (besides some white space stuff and possibly different
> > >      date/TZ output)
> > >
> > >Tested by above usecase cross following architectures:
> > >   i386, x86_64, s390x, ppc64
> > 
> > Short answer: Now that I have perf compiling in a 32-bit PPC VM,
> > applied the patchset and rebuilding -- it does not work for me.
> 
> hi,
> Could I ask what VM did you use? My qemu setup does not seem
> to offer this one.
> 
> so it's [32 bits big endian] into [64 bits small endian]
> 
> I tested i386 to ppc64. The endianity is the other way round,
> I wonder whats wrong with your tested direction.
> 
> hunting testing machine ;)

reproduced ;) working on fix

jirka

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-05-10 16:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-07 11:29 [PATCHv3 0/5] perf, tool: Fix endian issues Jiri Olsa
2012-05-07 11:29 ` [PATCH 1/5] perf, tool: Handle different endians properly during symbol load Jiri Olsa
2012-05-07 11:29 ` [PATCH 2/5] perf, tool: Carry perf_event_attr bitfield throught different endians Jiri Olsa
2012-05-07 11:29 ` [PATCH 3/5] perf, tool: Handle endianity swap on sample_id_all header data Jiri Olsa
2012-05-07 11:29 ` [PATCH 4/5] perf, tool: Fix 32 bit values endianity swap for sample_id_all header Jiri Olsa
2012-05-07 11:29 ` [PATCH 5/5] perf, tool: Fix endianity trick for adds_features bitmask Jiri Olsa
2012-05-07 16:15   ` David Ahern
2012-05-08  2:12 ` [PATCHv3 0/5] perf, tool: Fix endian issues David Ahern
2012-05-08  3:37   ` David Ahern
2012-05-08  3:49 ` David Ahern
2012-05-10 10:38   ` Jiri Olsa
2012-05-10 16:30     ` Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).