* [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds, Christoph Hellwig, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel, Greg KH

LTTng needs this symbol exported. It calls it to ensure its tracing
buffers and allocated data structures never trigger a page fault. This
is required to handle page fault handler tracing and NMI tracing
gracefully.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Christoph Hellwig <hch@infradead.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Tejun Heo <tj@kernel.org>
CC: David Howells <dhowells@redhat.com>
CC: David McCullough <davidm@snapgear.com>
CC: D Jeff Dionne <jeff@uClinux.org>
CC: Greg Ungerer <gerg@snapgear.com>
CC: Paul Mundt <lethal@linux-sh.org>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 mm/nommu.c   |    1 +
 mm/vmalloc.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index b982290..b22a0d9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -441,6 +441,7 @@ EXPORT_SYMBOL_GPL(vm_unmap_aliases);
 void __attribute__((weak)) vmalloc_sync_all(void)
 {
 }
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);

 /**
  * alloc_vm_area - allocate a range of kernel address space
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3231bf3..37ddce5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2137,6 +2137,7 @@ EXPORT_SYMBOL(remap_vmalloc_range);
 void __attribute__((weak)) vmalloc_sync_all(void)
 {
 }
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);


 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
--
1.7.5.4
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules 2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers @ 2011-12-01 21:57 ` Christoph Hellwig 2011-12-01 22:13 ` Greg KH 0 siblings, 1 reply; 51+ messages in thread From: Christoph Hellwig @ 2011-12-01 21:57 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Christoph Hellwig, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote: > LTTng needs this symbol exported. It calls it to ensure its tracing > buffers and allocated data structures never trigger a page fault. This > is required to handle page fault handler tracing and NMI tracing > gracefully. We: a) don't export symbols unless they have an intree-user b) especially don't export something as lowlevel as this one. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules 2011-12-01 21:57 ` Christoph Hellwig @ 2011-12-01 22:13 ` Greg KH 2011-12-01 22:19 ` Mathieu Desnoyers 2011-12-01 22:28 ` Christoph Hellwig 0 siblings, 2 replies; 51+ messages in thread From: Greg KH @ 2011-12-01 22:13 UTC (permalink / raw) To: Christoph Hellwig Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote: > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote: > > LTTng needs this symbol exported. It calls it to ensure its tracing > > buffers and allocated data structures never trigger a page fault. This > > is required to handle page fault handler tracing and NMI tracing > > gracefully. > > We: > > a) don't export symbols unless they have an intree-user lttng is now in-tree in the drivers/staging/ area. See linux-next for details if you are curious. > b) especially don't export something as lowlevel as this one. Mathieu, there's nothing else you can do to get this information? Or does lttng really want such lowlevel data? thanks, greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules 2011-12-01 22:13 ` Greg KH @ 2011-12-01 22:19 ` Mathieu Desnoyers 2011-12-01 22:41 ` Greg KH 2011-12-01 22:28 ` Christoph Hellwig 1 sibling, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 22:19 UTC (permalink / raw) To: Greg KH Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel * Greg KH (greg@kroah.com) wrote: > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote: > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote: > > > LTTng needs this symbol exported. It calls it to ensure its tracing > > > buffers and allocated data structures never trigger a page fault. This > > > is required to handle page fault handler tracing and NMI tracing > > > gracefully. > > > > We: > > > > a) don't export symbols unless they have an intree-user > > lttng is now in-tree in the drivers/staging/ area. See linux-next for > details if you are curious. > > > b) especially don't export something as lowlevel as this one. > > Mathieu, there's nothing else you can do to get this information? Or > does lttng really want such lowlevel data? LTTng calls vmalloc_sync_all() to make sure it won't crash the system (due to recursive page fault) when hooking on the page fault handler and on any hook that would happen to sit in a function hit by NMI context. So it really goes beyond just extracting information for this one I'm afraid: it's a matter of execution correctness. This is a point I'm really anal about: the tracer should _never_ crash the traced system, _ever_, in any foreseeable condition. Thanks, Mathieu > > thanks, > > greg k-h -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
From: Greg KH @ 2011-12-01 22:41 UTC
To: Mathieu Desnoyers
Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 05:19:40PM -0500, Mathieu Desnoyers wrote:
> * Greg KH (greg@kroah.com) wrote:
> > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > > buffers and allocated data structures never trigger a page fault. This
> > > > is required to handle page fault handler tracing and NMI tracing
> > > > gracefully.
> > >
> > > We:
> > >
> > > a) don't export symbols unless they have an intree-user
> >
> > lttng is now in-tree in the drivers/staging/ area. See linux-next for
> > details if you are curious.
> >
> > > b) especially don't export something as lowlevel as this one.
> >
> > Mathieu, there's nothing else you can do to get this information? Or
> > does lttng really want such lowlevel data?
>
> LTTng calls vmalloc_sync_all() to make sure it won't crash the system
> (due to recursive page fault) when hooking on the page fault handler and
> on any hook that would happen to sit in a function hit by NMI context.
> So it really goes beyond just extracting information for this one I'm
> afraid: it's a matter of execution correctness.

Ok, fair enough.

Christoph, is there any other way to achieve something like this without
this symbol being exported that you know of?

thanks,

greg k-h
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
From: Christoph Hellwig @ 2011-12-01 22:28 UTC
To: Greg KH
Cc: Christoph Hellwig, Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote:
> On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > buffers and allocated data structures never trigger a page fault. This
> > > is required to handle page fault handler tracing and NMI tracing
> > > gracefully.
> >
> > We:
> >
> > a) don't export symbols unless they have an intree-user
>
> lttng is now in-tree in the drivers/staging/ area. See linux-next for
> details if you are curious.

Eww - merging stuff without discussion on lkml is more than evil.

Either way, it was guaranteed that drivers/staging is considered out of
tree for core code. I'm definitely dead set against exporting anything
for staging and opening that slippery slope.
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules 2011-12-01 22:28 ` Christoph Hellwig @ 2011-12-01 23:00 ` Greg KH 0 siblings, 0 replies; 51+ messages in thread From: Greg KH @ 2011-12-01 23:00 UTC (permalink / raw) To: Christoph Hellwig Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells, David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel On Thu, Dec 01, 2011 at 05:28:03PM -0500, Christoph Hellwig wrote: > On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote: > > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote: > > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote: > > > > LTTng needs this symbol exported. It calls it to ensure its tracing > > > > buffers and allocated data structures never trigger a page fault. This > > > > is required to handle page fault handler tracing and NMI tracing > > > > gracefully. > > > > > > We: > > > > > > a) don't export symbols unless they have an intree-user > > > > lttng is now in-tree in the drivers/staging/ area. See linux-next for > > details if you are curious. > > Eww - merging stuff without discussion on lkml is more than evil. Do you really want discussing all staging driver crap on lkml? Core changes, like this one, for stuff in staging should be done on lkml, which is what this conversation is :) > Either way, it was guaranteed that drivers/staging is considered out of > tree for core code. The zram and zcache code would tend to disagree with you there :) > I'm defintively dead set against exporting anything for staging and > opening that slippery slope. How else should we handle something like this then? Some code, this one specifically, is trying to get merged, so taking it slowly, through staging, and getting it reviewed and cleaned up better before it can go into the "real" part of the kernel, is the whole goal here. 
Here's a real need for a symbol that an existing, shipping, useful kernel module is wanting to use. If you can provide a way that this can be handled without such an export, that does not require digging through the symbol table (which is what it was doing and I rightfully objected to that), then please let us know. Otherwise, what are our alternatives here, to just forbid this code from ever being merged? thanks, greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds, Ingo Molnar, Jens Axboe, linux-kernel, Greg KH

The LTTng driver needs this symbol exported because it implements its
own splice actor.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 fs/splice.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index fa2defa..9eb15b5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,

 	return ret;
 }
+EXPORT_SYMBOL_GPL(splice_to_pipe);

 void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
 {
--
1.7.5.4
* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules 2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers @ 2011-12-02 7:19 ` Jens Axboe 2011-12-02 12:32 ` Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Jens Axboe @ 2011-12-02 7:19 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar, Jens Axboe, linux-kernel On 2011-12-01 22:41, Mathieu Desnoyers wrote: > The LTTng driver needs this symbol exported because it implements its > own splice actor. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Linus Torvalds <torvalds@linux-foundation.org> > CC: Ingo Molnar <mingo@elte.hu> > CC: Jens Axboe <axboe@kernel.dk> > CC: linux-kernel@vger.kernel.org > CC: Greg KH <greg@kroah.com> > --- > fs/splice.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index fa2defa..9eb15b5 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > > return ret; > } > +EXPORT_SYMBOL_GPL(splice_to_pipe); The rest of the splice symbols are regular exports, please do the same for this one. Thanks. -- Jens Axboe ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules 2011-12-02 7:19 ` Jens Axboe @ 2011-12-02 12:32 ` Mathieu Desnoyers 0 siblings, 0 replies; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-02 12:32 UTC (permalink / raw) To: Jens Axboe Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar, Jens Axboe, linux-kernel * Jens Axboe (jens@axboe.dk) wrote: > On 2011-12-01 22:41, Mathieu Desnoyers wrote: > > The LTTng driver needs this symbol exported because it implements its > > own splice actor. > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > CC: Linus Torvalds <torvalds@linux-foundation.org> > > CC: Ingo Molnar <mingo@elte.hu> > > CC: Jens Axboe <axboe@kernel.dk> > > CC: linux-kernel@vger.kernel.org > > CC: Greg KH <greg@kroah.com> > > --- > > fs/splice.c | 1 + > > 1 files changed, 1 insertions(+), 0 deletions(-) > > > > diff --git a/fs/splice.c b/fs/splice.c > > index fa2defa..9eb15b5 100644 > > --- a/fs/splice.c > > +++ b/fs/splice.c > > @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe, > > > > return ret; > > } > > +EXPORT_SYMBOL_GPL(splice_to_pipe); > > The rest of the splice symbols are regular exports, please do the same > for this one. Thanks. I've been wondering about this one, but thought it would be better to let you decide on opening up the symbol more than with _GPL. Will do! Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 09/11] sched: export task_prio to GPL modules
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra, linux-kernel, Greg KH

LTTng needs this symbol to prepend the current task dynamic priority
value to events (optional context information).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 kernel/sched.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0e9344a..80dbb09 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5142,6 +5142,7 @@ int task_prio(const struct task_struct *p)
 {
 	return p->prio - MAX_RT_PRIO;
 }
+EXPORT_SYMBOL_GPL(task_prio);

 /**
  * task_nice - return the nice value of a given task.
--
1.7.5.4
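For readers following the thread, the arithmetic inside task_prio() can be tried out in user space. This is only a sketch: it assumes MAX_RT_PRIO is 100 (its value in kernels of this era), and `task_prio_from_prio` is a hypothetical stand-in that takes the raw prio value instead of a `struct task_struct`.

```c
/* Assumption: MAX_RT_PRIO is 100, as in include/linux/sched.h of this era. */
#define MAX_RT_PRIO 100

/*
 * Hypothetical user-space stand-in for task_prio(): same subtraction as
 * the function in the patch, applied to a raw prio value.
 */
static int task_prio_from_prio(int prio)
{
	return prio - MAX_RT_PRIO;
}
```

Under that assumption, a default SCHED_OTHER task with a raw prio of 120 maps to 20, and a task at the highest real-time priority (raw prio 0) maps to -100.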
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers @ 2011-12-01 21:56 ` Peter Zijlstra 2011-12-01 22:04 ` Mathieu Desnoyers 2011-12-01 22:14 ` Greg KH 0 siblings, 2 replies; 51+ messages in thread From: Peter Zijlstra @ 2011-12-01 21:56 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > LTTng needs this symbol to prepend the current task dynamic priority > value to events (optional context information). I absolutely detest exporting such stuff. It propagates the idea that task prio actually means something. Also, modules really shouldn't care. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 21:56 ` Peter Zijlstra @ 2011-12-01 22:04 ` Mathieu Desnoyers 2011-12-01 22:10 ` Peter Zijlstra 2011-12-01 22:14 ` Greg KH 1 sibling, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 22:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart * Peter Zijlstra (peterz@infradead.org) wrote: > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > LTTng needs this symbol to prepend the current task dynamic priority > > value to events (optional context information). > > I absolutely detest exporting such stuff. It propagates the idea that > task prio actually means something. Also, modules really shouldn't care. People debugging their SCHED_FIFO/SCHED_RR applications, as well as users of priority-inheritance futexes, may happen to find this information extremely useful. Just saying... Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:04 ` Mathieu Desnoyers @ 2011-12-01 22:10 ` Peter Zijlstra 2011-12-01 22:15 ` Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Peter Zijlstra @ 2011-12-01 22:10 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote: > * Peter Zijlstra (peterz@infradead.org) wrote: > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > > LTTng needs this symbol to prepend the current task dynamic priority > > > value to events (optional context information). > > > > I absolutely detest exporting such stuff. It propagates the idea that > > task prio actually means something. Also, modules really shouldn't care. > > People debugging their SCHED_FIFO/SCHED_RR applications, as well as > users of priority-inheritance futexes, may happen to find this > information extremely useful. > > Just saying... Right until the moment we go do deadlines.. Anyway, it still doesn't make sense, your sched_switch() tracepoint handler gets this information, why do you need this export at all? ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:10 ` Peter Zijlstra @ 2011-12-01 22:15 ` Mathieu Desnoyers 2011-12-01 22:36 ` Mathieu Desnoyers 2011-12-01 23:06 ` Peter Zijlstra 0 siblings, 2 replies; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 22:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart * Peter Zijlstra (peterz@infradead.org) wrote: > On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote: > > * Peter Zijlstra (peterz@infradead.org) wrote: > > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > > > LTTng needs this symbol to prepend the current task dynamic priority > > > > value to events (optional context information). > > > > > > I absolutely detest exporting such stuff. It propagates the idea that > > > task prio actually means something. Also, modules really shouldn't care. > > > > People debugging their SCHED_FIFO/SCHED_RR applications, as well as > > users of priority-inheritance futexes, may happen to find this > > information extremely useful. > > > > Just saying... > > Right until the moment we go do deadlines.. Anyway, it still doesn't > make sense, your sched_switch() tracepoint handler gets this > information, why do you need this export at all? If you don't want to trace sched_switch, but just conveniently prepend this information to all your events, then lttng lets you dynamically target this extra bit of information. Note that it's not a mandatory event field: I call those "context" fields that the tracer prepends to events, as requested by the user. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:15 ` Mathieu Desnoyers @ 2011-12-01 22:36 ` Mathieu Desnoyers 2011-12-01 23:05 ` Peter Zijlstra 2011-12-01 23:06 ` Peter Zijlstra 1 sibling, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 22:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote: > * Peter Zijlstra (peterz@infradead.org) wrote: > > On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote: > > > * Peter Zijlstra (peterz@infradead.org) wrote: > > > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > > > > LTTng needs this symbol to prepend the current task dynamic priority > > > > > value to events (optional context information). > > > > > > > > I absolutely detest exporting such stuff. It propagates the idea that > > > > task prio actually means something. Also, modules really shouldn't care. > > > > > > People debugging their SCHED_FIFO/SCHED_RR applications, as well as > > > users of priority-inheritance futexes, may happen to find this > > > information extremely useful. > > > > > > Just saying... > > > > Right until the moment we go do deadlines.. Anyway, it still doesn't > > make sense, your sched_switch() tracepoint handler gets this > > information, why do you need this export at all? > > If you don't want to trace sched_switch, but just conveniently prepend > this information to all your events, then lttng lets you dynamically > target this extra bit of information. Note that it's not a mandatory > event field: I call those "context" fields that the tracer prepends to > events, as requested by the user. 
One more point:

compudj@thinkos:/proc/204$ cat sched
khubd (204, #threads: 1)
---------------------------------------------------------
se.exec_start             :  3355267.749529
se.vruntime               :   113843.899081
se.sum_exec_runtime       :       12.820702
nr_switches               :             386
nr_voluntary_switches     :             385
nr_involuntary_switches   :               1
se.load.weight            :            1024
policy                    :               0
prio                      :             120
clock-delta               :             130

So what you are saying is that it is fine to export task_prio to
_userspace_, thus making it part of the ABI, but it's not OK to export
it to GPL modules ?

Weird huh ?

Mathieu

>
> Thanks,
>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:36 ` Mathieu Desnoyers @ 2011-12-01 23:05 ` Peter Zijlstra 2011-12-02 13:51 ` Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Peter Zijlstra @ 2011-12-01 23:05 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote: > So what you are saying is that it is fine to export task_prio to > _userspace_, thus making it part of the ABI, but it's not OK to export > it to GPL modules ? that's a SCHED_DEBUG proc file. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
From: Mathieu Desnoyers @ 2011-12-02 13:51 UTC
To: Peter Zijlstra
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote:
> > So what you are saying is that it is fine to export task_prio to
> > _userspace_, thus making it part of the ABI, but it's not OK to export
> > it to GPL modules ?
>
> that's a SCHED_DEBUG proc file.

Fair point. You'll then notice that /proc/<pid>/stat (18th field)
exports it too, and it's not under SCHED_DEBUG:

ok:/proc/20# cat stat
20 (migration/5) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 -100 0 1 0 70 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744071579371389 0 0 17 5 99 1 0 0 0

(see -100 above)

as defined in Documentation/filesystems/proc.txt:

"Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
[...]
  priority      priority level"

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
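The 18th-field claim is easy to check from user space. The sketch below pulls that field out of one line of /proc/<pid>/stat; `stat_priority` is a hypothetical helper name, not an existing API, and it assumes the line has already been read into a NUL-terminated string. Since the comm field (field 2) may itself contain spaces or parentheses, parsing resumes after the last ')'.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the 18th field ("priority") from one
 * /proc/<pid>/stat line. Returns 0 on success, -1 if the line is too
 * short or the field is not a number.
 */
static int stat_priority(const char *line, long *prio)
{
	const char *p = strrchr(line, ')');	/* end of field 2 (comm) */
	char *end;
	int field;

	if (!p)
		return -1;
	p++;

	/* Fields 3..17 are skipped; the loop stops at the start of field 18. */
	for (field = 3; field <= 18; field++) {
		while (*p == ' ')
			p++;
		if (*p == '\0')
			return -1;
		if (field == 18)
			break;
		while (*p != '\0' && *p != ' ')
			p++;
	}

	*prio = strtol(p, &end, 10);
	return end == p ? -1 : 0;
}
```

Fed the migration/5 line quoted in the message above, this should yield -100, the value Mathieu points at.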
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:15 ` Mathieu Desnoyers 2011-12-01 22:36 ` Mathieu Desnoyers @ 2011-12-01 23:06 ` Peter Zijlstra 2011-12-01 23:18 ` Greg KH 1 sibling, 1 reply; 51+ messages in thread From: Peter Zijlstra @ 2011-12-01 23:06 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote: > > If you don't want to trace sched_switch, but just conveniently prepend > this information to all your events Oh so you want to debug a scheduler issue but don't want to use the scheduler tracepoint, I guess that makes perfect sense for clueless people. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
From: Greg KH @ 2011-12-01 23:18 UTC
To: Peter Zijlstra
Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> >
> > If you don't want to trace sched_switch, but just conveniently prepend
> > this information to all your events
>
> Oh so you want to debug a scheduler issue but don't want to use the
> scheduler tracepoint, I guess that makes perfect sense for clueless
> people.

Mathieu, can't lttng use the scheduler tracepoint for this information?
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 23:18 ` Greg KH @ 2011-12-01 23:47 ` Mathieu Desnoyers 0 siblings, 0 replies; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 23:47 UTC (permalink / raw) To: Greg KH Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart * Greg KH (greg@kroah.com) wrote: > On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote: > > On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote: > > > > > > If you don't want to trace sched_switch, but just conveniently prepend > > > this information to all your events > > > > Oh so you want to debug a scheduler issue but don't want to use the > > scheduler tracepoint, I guess that makes perfect sense for clueless > > people. > > Matheiu, can't lttng use the scheduler tracepoint for this information? LTTng allows user to choose between both methods, each one being suited to a particular use of the tracer: A) Extraction through the scheduler tracepoint: LTTng viewers have a full-fledged current state reconstruction of the traced OS (for any point in time during the trace) performed as one of the bottom layers of our trace analysis tools. This makes sense for use-cases where the data needs to be transported, and/or stored, and where the amount of data throughput needs to be minimized. We use this technique a lot, of course. This state-tracking requires CPU/memory resource usage by the viewer. B) Extraction through "optional" event context information: We have, in development, a new "enhanced top" called lttngtop that uses tracing information, directly read from mmap'd buffers, to provide second-by-second profile information of the system. It is not as sensitive to data compactness as the transport/disk storage use-case, mainly because no data copy is ever required -- the buffers simply get overwritten after lttngtop has finished aggregating the information. 
This has less performance overhead than the big hammer "top" that
periodically reads all files in /proc, and can provide much more
detailed profiles. This use-case favors sending additional data from
kernel to user-space rather than recomputing the OS state within
lttngtop, due to the very low overhead of direct mmap data transport,
over recomputing state needlessly.

We could very well "cheat" and use a scheduler tracepoint to keep a
duplicate of the current priority value for each CPU within the tracer
kernel module. Let me know if you want me to do this.

Also, as a matter of fact, the "prio" information exported from the
sched_switch event in mainline trace events does not match the prio
shown in /proc stat files. The "MAX_RT_PRIO" offset is missing.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
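The "cheat" described above, keeping a per-CPU duplicate of the current priority fed by the scheduler tracepoint, could look roughly like the user-space model below. Everything here is an assumption made for illustration: `NR_CPUS_MODEL`, `on_sched_switch` and `context_prio` are hypothetical names, a plain array stands in for real per-CPU variables, and MAX_RT_PRIO is taken to be 100. It is not LTTng's code.

```c
/* Assumptions for this model only; not taken from the LTTng sources. */
#define NR_CPUS_MODEL 8
#define MAX_RT_PRIO   100

/* One cached raw prio per CPU, updated on every context switch. */
static int cached_prio[NR_CPUS_MODEL];

/*
 * Stand-in for a sched_switch probe: remember the raw prio of the task
 * being switched in on this CPU. Out-of-range CPUs are ignored.
 */
static void on_sched_switch(unsigned int cpu, int next_prio)
{
	if (cpu < NR_CPUS_MODEL)
		cached_prio[cpu] = next_prio;
}

/*
 * Stand-in for the context-field reader: return the cached value with
 * the MAX_RT_PRIO offset applied, so it lines up with the priority
 * shown in /proc/<pid>/stat. The caller must pass a valid CPU number.
 */
static int context_prio(unsigned int cpu)
{
	return cached_prio[cpu] - MAX_RT_PRIO;
}
```

One limitation of such a cache is worth keeping in mind: a task's priority can change while it stays on the CPU (priority inheritance, for instance), so a value recorded only at switch time may be stale until the next switch.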
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 21:56 ` Peter Zijlstra 2011-12-01 22:04 ` Mathieu Desnoyers @ 2011-12-01 22:14 ` Greg KH 2011-12-01 22:20 ` Mathieu Desnoyers 2011-12-01 23:07 ` Peter Zijlstra 1 sibling, 2 replies; 51+ messages in thread From: Greg KH @ 2011-12-01 22:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote: > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > LTTng needs this symbol to prepend the current task dynamic priority > > value to events (optional context information). > > I absolutely detest exporting such stuff. It propagates the idea that > task prio actually means something. Also, modules really shouldn't care. Mathieu, if you don't have this information, does anything really care? thanks, greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:14 ` Greg KH @ 2011-12-01 22:20 ` Mathieu Desnoyers 2011-12-01 23:07 ` Peter Zijlstra 1 sibling, 0 replies; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-01 22:20 UTC (permalink / raw) To: Greg KH; +Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel * Greg KH (greg@kroah.com) wrote: > On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote: > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote: > > > LTTng needs this symbol to prepend the current task dynamic priority > > > value to events (optional context information). > > > > I absolutely detest exporting such stuff. It propagates the idea that > > task prio actually means something. Also, modules really shouldn't care. > > Mathieu, if you don't have this information, does anything really care? I can just remove this specific context module, nothing else will care except the end users, but it's a shame to lose this option. Thanks, Mathieu > > thanks, > > greg k-h -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 22:14 ` Greg KH 2011-12-01 22:20 ` Mathieu Desnoyers @ 2011-12-01 23:07 ` Peter Zijlstra 2011-12-01 23:17 ` Greg KH 1 sibling, 1 reply; 51+ messages in thread From: Peter Zijlstra @ 2011-12-01 23:07 UTC (permalink / raw) To: Greg KH; +Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote: > greg k-h Greg, why are you merging this crap anyway? Aren't there enough tracer thingies around already? ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 23:07 ` Peter Zijlstra @ 2011-12-01 23:17 ` Greg KH 2011-12-05 14:17 ` Ingo Molnar 0 siblings, 1 reply; 51+ messages in thread From: Greg KH @ 2011-12-01 23:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote: > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote: > > greg k-h > > Greg, why are you merging this crap anyway? Aren't there enough tracer > thingies around already? I don't know, are there? There's some reason the distros, and users, still use lttng, so I'm guessing that it fits the needs of quite a few people. That's why I'm merging it. If the in-kernel stuff obsoletes lttng, great, let me and the distros know. thanks, greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-01 23:17 ` Greg KH @ 2011-12-05 14:17 ` Ingo Molnar 2011-12-06 21:44 ` Greg KH 2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers 0 siblings, 2 replies; 51+ messages in thread From: Ingo Molnar @ 2011-12-05 14:17 UTC (permalink / raw) To: Greg KH Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev, linux-kernel, Linus Torvalds, Andrew Morton * Greg KH <greg@kroah.com> wrote: > On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote: > > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote: > > > greg k-h > > > > Greg, why are you merging this crap anyway? Aren't there enough tracer > > thingies around already? > > I don't know, is there? > > There's some reason the distros, and users, still use lttng, > so I'm guessing that it fits the needs of quite a few people. Same goes for a whole lot of other crap that distros are carrying. Would we want to merge a different CPU scheduler or the 4g:4g patch or a completely new networking stack into drivers/staging/? I don't think so. I.e. putting LTTNG into drivers/staging/ will not really solve anything - and in may in fact delay any sane technical resolution: There's a difference between a driver that has to go into drivers/staging/ because nobody cares enough [and the driver isnt high quality enough yet], and a core kernel feature that we DO care about and which HAS BEEN REJECTED IN ITS FORM. > That's why I'm merging it, if that the in-kernel stuff > obsoletes lttng, great, let me, and the distros know. I'm NAK-ing the LTTNG driver really, as it's a workaround for a core kernel NAK. Mathieu, please work with the tracing folks who DO care about this stuff. It's not like there's a lack of interest in this area, nor is there a lack of willingness to take patches. What there is a lack of is your willingness to actually work on getting something unified, integrated to users... 
LTTNG has been going on for how many years? I havent seen many steps towards actually *merging* its functionality - you insist on doing your own random thing, which is different in random ways. Yes, some of those random ways may in fact be better than what we have upstream - would you be interested in filtering those out and pushing them upstream? I certainly would like to see that happen. We want to pick the best features, and throw away current upstream code in favor of superior out of tree code - this concept of letting crap sit alongside each other when people do care i cannot agree with. Thanks, Ingo ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-05 14:17 ` Ingo Molnar @ 2011-12-06 21:44 ` Greg KH 2011-12-08 5:23 ` Ingo Molnar 2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers 1 sibling, 1 reply; 51+ messages in thread From: Greg KH @ 2011-12-06 21:44 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev, linux-kernel, Linus Torvalds, Andrew Morton On Mon, Dec 05, 2011 at 03:17:49PM +0100, Ingo Molnar wrote: > > * Greg KH <greg@kroah.com> wrote: > > > On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote: > > > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote: > > > > greg k-h > > > > > > Greg, why are you merging this crap anyway? Aren't there enough tracer > > > thingies around already? > > > > I don't know, is there? > > > > There's some reason the distros, and users, still use lttng, > > so I'm guessing that it fits the needs of quite a few people. > > Same goes for a whole lot of other crap that distros are > carrying. Would we want to merge a different CPU scheduler or > the 4g:4g patch or a completely new networking stack into > drivers/staging/? I don't think so. Distros have new CPU schedulers and are still dragging the 4g split around? A whole new networking stack would be interesting, and if self-contained, possible :) > I.e. putting LTTNG into drivers/staging/ will not really solve > anything - and in may in fact delay any sane technical > resolution: > > There's a difference between a driver that has to go into > drivers/staging/ because nobody cares enough [and the driver > isnt high quality enough yet], and a core kernel feature that we > DO care about and which HAS BEEN REJECTED IN ITS FORM. I didn't realize that lttng was rejected, when was that done? I couldn't find it in the archives anywhere. That's why I took this. 
It's a way for the code to get cleaned up and into a "mergeable" state much more easily, and with more help, than if it were out-of-tree. The fact that distros have been shipping and relying on it for years shows that it is something that is needed, and its being self-contained makes it eligible for the staging tree. > > That's why I'm merging it, if that the in-kernel stuff > > obsoletes lttng, great, let me, and the distros know. > > I'm NAK-ing the LTTNG driver really, as it's a workaround for a > core kernel NAK. Huh? > Mathieu, please work with the tracing folks who DO care about > this stuff. It's not like there's a lack of interest in this > area, nor is there a lack of willingness to take patches. What > there is a lack of is your willingness to actually work on > getting something unified, integrated to users... > > LTTNG has been going on for how many years? I havent seen many > steps towards actually *merging* its functionality - you insist > on doing your own random thing, which is different in random > ways. Yes, some of those random ways may in fact be better than > what we have upstream - would you be interested in filtering > those out and pushing them upstream? I certainly would like to > see that happen. > > We want to pick the best features, and throw away current > upstream code in favor of superior out of tree code - this > concept of letting crap sit alongside each other when people do > care i cannot agree with. Mathieu, a good explanation of what lttng has that the in-kernel tracing and perf don't have would be a good place to start. thanks, greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-06 21:44 ` Greg KH @ 2011-12-08 5:23 ` Ingo Molnar 2011-12-08 23:27 ` Greg KH 0 siblings, 1 reply; 51+ messages in thread From: Ingo Molnar @ 2011-12-08 5:23 UTC (permalink / raw) To: Greg KH Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev, linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner * Greg KH <greg@kroah.com> wrote: > > Same goes for a whole lot of other crap that distros are > > carrying. Would we want to merge a different CPU scheduler > > or the 4g:4g patch or a completely new networking stack into > > drivers/staging/? I don't think so. > > Distros have new CPU schedulers and are still dragging the 4g > split around? A whole new networking stack would be > interesting, and if self-contained, possible :) The point being, there's legitimate reasons to refuse crap to an area that *people care about* in a constructive manner. There's no rejection of LTTNG in the "hey, go away, you are doing it wrong" fashion - we are not holding a monopoly on how instrumentation is supposed to be done and we've been wrong before. There's a highly constructive, open attitude towards LTTNG and has been for years: " Mathieu, please split it up and integrate/unify it with the existing instrumentation features of Linux - and if it replaces existing stuff because an LTTNG component is superior then so be it. " Let me repeat it: there's no lack of willingness of cooperation from the kernel instrumentation subsystem side. There's a lack of movement from Mathieu - *he* is keeping LTTNG fragmented for barely justifyable technological reasons. Thus there's absolutely no forward movement from having this in drivers/staging/ - in fact there's backwards movement: yet another instrumentation gadget with its own separate ABI and highly overlapping functionality, plus even less incentive for it to cooperate... 
It is not the typical drivers/staging/ situation where there's either lack of work on a piece of code or some fundamental disagreement about the right model. LTTNG has been *intentionally* kept a separate entity, a separate brand, for whatever non-technical reasons. How will drivers/staging/ change that? It won't. It's a bit like VirtualBox really. In short: this move only *increases* the incentive for LTTNG to stay fragmented and/or force modularization crap like the highly unfortunate situation of security modules ... > > I.e. putting LTTNG into drivers/staging/ will not really > > solve anything - and in may in fact delay any sane technical > > resolution: > > > > There's a difference between a driver that has to go into > > drivers/staging/ because nobody cares enough [and the driver > > isnt high quality enough yet], and a core kernel feature > > that we DO care about and which HAS BEEN REJECTED IN ITS > > FORM. > > I didn't realize that lttng was rejected, when was that done? > I couldn't find it in the archives anywhere. It wasnt resubmitted for years - see the pattern and see the problem? :-) Merging it will cause even *less* cooperation, because of the reasons above and because LTTNG adds a parallel ABI. > The fact that distros have been shipping and relying on it for > years shows that it is something that is needed, and it being > self-contained, makes it eligible for the staging tree. LTT(NG) was simply the historically first tracing toolkit that embedded people got used to and there's still some inertia - and distros add a lot of crap that people find marginally useful which perpetuates the fork if there's at least one active developer behind it. Most of its functionality is available via existing upstream functionality - and where not we are more than willing to accomodate patches! drivers/staging/ is a tool that i support in many (in fact most) cases - but i don't support it if it does harm. 
I'm supposed to say 'no' to extra complexity more often, and this is definitely one of those cases: Nacked-by: Ingo Molnar <mingo@elte.hu> Also obviously NAK to the scheduler symbol export - that alone should tell you that it's not just a "driver" - it deeply hooks into the core kernel... Please respect the NAK. Thanks, Ingo ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-08 5:23 ` Ingo Molnar @ 2011-12-08 23:27 ` Greg KH 2011-12-19 10:49 ` Ingo Molnar 0 siblings, 1 reply; 51+ messages in thread From: Greg KH @ 2011-12-08 23:27 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev, linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote: > > * Greg KH <greg@kroah.com> wrote: > > > > Same goes for a whole lot of other crap that distros are > > > carrying. Would we want to merge a different CPU scheduler > > > or the 4g:4g patch or a completely new networking stack into > > > drivers/staging/? I don't think so. > > > > Distros have new CPU schedulers and are still dragging the 4g > > split around? A whole new networking stack would be > > interesting, and if self-contained, possible :) > > The point being, there's legitimate reasons to refuse crap to an > area that *people care about* in a constructive manner. > > There's no rejection of LTTNG in the "hey, go away, you are > doing it wrong" fashion - we are not holding a monopoly on how > instrumentation is supposed to be done and we've been wrong > before. > > There's a highly constructive, open attitude towards LTTNG and > has been for years: > > " Mathieu, please split it up and integrate/unify it with the > existing instrumentation features of Linux - and if it > replaces existing stuff because an LTTNG component is > superior then so be it. " Ok, that's fair enough. Mathieu, will you please work on this? Or is there some reason you don't feel this is possible? > drivers/staging/ is a tool that i support in many (in fact most) > cases - but i don't support it if it does harm. 
> > I'm supposed to say 'no' to extra complexity more often, and > this is definitely one of those cases: > > Nacked-by: Ingo Molnar <mingo@elte.hu> > > Also obviously NAK to the scheduler symbol export - that alone > should tell you that it's not just a "driver" - it deeply hooks > into the core kernel... > > Please respect the NAK. Will do, I'll go delete it from the staging-next tree now. greg k-h ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-08 23:27 ` Greg KH @ 2011-12-19 10:49 ` Ingo Molnar 2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Ingo Molnar @ 2011-12-19 10:49 UTC (permalink / raw) To: Greg KH Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev, linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner * Greg KH <greg@kroah.com> wrote: > On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote: > > > > * Greg KH <greg@kroah.com> wrote: > > > > > > Same goes for a whole lot of other crap that distros are > > > > carrying. Would we want to merge a different CPU scheduler > > > > or the 4g:4g patch or a completely new networking stack into > > > > drivers/staging/? I don't think so. > > > > > > Distros have new CPU schedulers and are still dragging the 4g > > > split around? A whole new networking stack would be > > > interesting, and if self-contained, possible :) > > > > The point being, there's legitimate reasons to refuse crap to an > > area that *people care about* in a constructive manner. > > > > There's no rejection of LTTNG in the "hey, go away, you are > > doing it wrong" fashion - we are not holding a monopoly on how > > instrumentation is supposed to be done and we've been wrong > > before. > > > > There's a highly constructive, open attitude towards LTTNG and > > has been for years: > > > > " Mathieu, please split it up and integrate/unify it with the > > existing instrumentation features of Linux - and if it > > replaces existing stuff because an LTTNG component is > > superior then so be it. " > > Ok, that's fair enough. > > Mathieu, will you please work on this? Or is there some > reason you don't feel this is possible? Mathieu, any update on this? I don't want the LTTNG goodies to drop on the floor - we just have to integrate them properly. 
If you 100% disagree with how specific things are done upstream right now then don't hold back: just replace existing mechanisms - that gives a starting point to discuss what the best way is forward. > > drivers/staging/ is a tool that i support in many (in fact most) > > cases - but i don't support it if it does harm. > > > > I'm supposed to say 'no' to extra complexity more often, and > > this is definitely one of those cases: > > > > Nacked-by: Ingo Molnar <mingo@elte.hu> > > > > Also obviously NAK to the scheduler symbol export - that alone > > should tell you that it's not just a "driver" - it deeply hooks > > into the core kernel... > > > > Please respect the NAK. > > Will do, I'll go delete it from the staging-next tree now. Thanks Greg! Ingo ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-19 10:49 ` Ingo Molnar @ 2011-12-19 15:30 ` Mathieu Desnoyers 2011-12-20 11:08 ` Ingo Molnar 0 siblings, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-19 15:30 UTC (permalink / raw) To: Ingo Molnar Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt * Ingo Molnar (mingo@elte.hu) wrote: > > * Greg KH <greg@kroah.com> wrote: > > > On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote: [...] > > > There's a highly constructive, open attitude towards LTTNG and > > > has been for years: > > > > > > " Mathieu, please split it up and integrate/unify it with the > > > existing instrumentation features of Linux - and if it > > > replaces existing stuff because an LTTNG component is > > > superior then so be it. " > > > > Ok, that's fair enough. > > > > Mathieu, will you please work on this? Or is there some > > reason you don't feel this is possible? > > Mathieu, any update on this? I don't want the LTTNG goodies to > drop on the floor - we just have to integrate them properly. > > If you 100% disagree with how specific things are done upstream > right now then don't hold back: just replace existing mechanisms > - that gives a starting point to discuss what the best way is > forward. I'm bringing a tough question then: what should we do if I strongly think that the current ABIs should be replaced? To support this, let's note that the current perf ABI: - lacks versioning information to handle change. 
I think shipping the tracer tools within the Linux tools/ directory made sense for an initial phase that made tracer solutions more popular for kernel developers (and it did a great job at that), but if we want to move on to build tools that target a wider audience, we should leave the tools/ sandbox and create separate projects, with clearly defined ABIs, using ABI versioning to manage changes. At this point, I think that the perf tool shipped within tools/ is more than anything a pain for non-kernel-developer users, and favors design of sloppy ABIs. - makes it impossible to move to CTF (Common Trace Format) and benefit from the added features it allows, - makes it needlessly hard, if not impossible, for perf to move to something that would have the benefits brought by the fast unified ring buffer code I created 2 years ago, - makes it impossible to benefit from the LTTng fast trace clocks. Also, it should be noted that I am finding that the way perf evolved into a large monolithic binary blob that needs to be all enabled or all disabled makes it quite hard to extend and re-use. As a matter of fact, there are various cases where Steven and I tried to create performance tests for the perf ring buffer and just could not do it without hacking the perf code. I would definitely prefer to go for a modular approach for the in-kernel code, and an approach based on user-level libraries for low-level tracer interaction, with applications depending on those libraries, again all handled with ABI versioning and library versioning. I have to give recognition to perf: it's a fantastic performance counter management/sampling tool, but it has clearly never been geared towards low-overhead tracing, and this shows. One possible way for moving things forward is to leave the current perf/ftrace implementation and ABIs in place along with the existing tools. We could create a new ABI merging perf, ftrace and LTTng best features into one (e.g. 
kstrace for Kernel System Trace -- just made it up, better ideas are welcome), and gradually move the user-space part of the 3 tools to the new ABI. It is worth noting that the need for a new ABI is something many people involved in tracing -- by that I mean those doing most of the actual upstream tracer implementation work -- agreed upon in the last 2 years when meeting at conferences. This would allow a deprecation phase to take place, and would allow removal of the maintenance burden of the duplicated Perf/Ftrace ABIs, all that while also bringing in an ABI that allows handling of change and innovation, the lack of which is, IMHO, the key limiting factor of the current ABIs. By doing so, perf could become the set of tools targeting what it does best: performance counters management and sampling, ftrace could keep on targeting function tracing, and lttng could be used for all-system tracing, everyone sharing the same kernel-level implementation and ABIs (kstrace ABI). Thoughts? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers @ 2011-12-20 11:08 ` Ingo Molnar 2011-12-20 21:46 ` Frank Rowand ` (2 more replies) 0 siblings, 3 replies; 51+ messages in thread From: Ingo Molnar @ 2011-12-20 11:08 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo (Cc:-ing Arnaldo on this as well.) * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > > Mathieu, any update on this? I don't want the LTTNG goodies > > to drop on the floor - we just have to integrate them > > properly. > > > > If you 100% disagree with how specific things are done > > upstream right now then don't hold back: just replace > > existing mechanisms - that gives a starting point to discuss > > what the best way is forward. > > I'm bringing a though question then: what should we do if I > strongly think that the current ABIs should be replaced ? To > support this, let's note that the current perf ABI: > > - lacks versioning information to handle change. [...] That's not actually true on *any* level: we are changing, evolving and extending the perf ABIs all the time. There's two main API/ABI components: 1) the perf syscall which is part of the Linux syscall ABI. Individual versions of the ABI have (monotonically increasing) sizes for "struct perf_event_attr" - you can consider these natural ABI versioning. So the 'versioning' is not done via some inflexible and ugly, Windows-alike 'explicit ABI version' field, but done via structure sizes and -ENOSYS. We've iterated and versioned it numerous times in the past 10 kernel releases, in a backwards compatible manner. 2) the perf.data file The versioning there is capability bitmask based - modelled after ext2/ext3/ext4 capability bitmasks. It's extensible as well. 
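The size-based versioning described for the syscall side can be sketched as follows. This is a self-contained user-space illustration under stated assumptions, not the kernel's actual code: struct demo_attr, the DEMO_ATTR_SIZE_* constants and demo_copy_attr() are hypothetical names, and the error codes are only chosen to resemble the usual convention (reject an unknown non-zero tail, zero-fill fields an older caller does not know about).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute struct; NOT the real struct perf_event_attr. */
struct demo_attr {
    uint32_t type;
    uint32_t size;          /* sizeof() of the struct the caller was built against */
    uint64_t config;        /* last field of "version 0" */
    uint64_t sample_period; /* field added in "version 1" */
};

#define DEMO_ATTR_SIZE_VER0 offsetof(struct demo_attr, sample_period)
#define DEMO_ATTR_SIZE_VER1 sizeof(struct demo_attr)

/*
 * Copy a caller-supplied attribute of src_size bytes into the layout
 * this build knows about. The structure size acts as the version.
 */
int demo_copy_attr(struct demo_attr *dst, const void *src, uint32_t src_size)
{
    const unsigned char *bytes = src;
    uint32_t i;

    /* Smaller than the first published layout: not a valid caller. */
    if (src_size < DEMO_ATTR_SIZE_VER0)
        return -EINVAL;

    /*
     * Caller is newer than us: accept it only if every byte we do not
     * understand is zero, i.e. no unknown feature was requested.
     */
    for (i = sizeof(*dst); i < src_size; i++)
        if (bytes[i] != 0)
            return -E2BIG;

    /* Caller is older than us: fields it lacks default to zero. */
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, src_size < sizeof(*dst) ? src_size : sizeof(*dst));
    return 0;
}
```

The design choice this illustrates is that new fields are only ever appended, so an old binary keeps working against a new kernel and a new binary works against an old kernel as long as it leaves the newer fields at zero.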
I think your concentration on ABIs is missing a very fundamental property of instrumentation: the life-time and persistence of instrumentation data is typically very short ('days' is already an exception - typical is minutes, at most hours), and for that reason we havent been getting much pressure from users to maintain a perf.data ABI - but we are doing it nevertheless. Instrumentation is fundamentally about the 'here and now' and so it fundamentally differs from things like backup formats and database formats. An ABI does not hurt and we are maintaining it, but you are overrating its importance significantly. > [...] I think shipping the tracer tools within the Linux > tools/ directory made sense for an initial phase that made > tracer solutions more popular for kernel developers (and it > did a great job a that), but if we want to move on to build > tools that target a wider audience, we should leave the > tools/ sandbox and create separate projects, with clearly > defined ABIs, using ABI versioning to manage changes. At > this point, I think that perf tool shipped within tools/ is > more than anything a pain for non-kernel-developer users, > and favors design of sloppy ABIs. I think you've thoroughly misunderstood the upstream ABI versioning status quo, which makes your argument out of this world. The perf ABIs are well-defined and well-maintained. See an ad-hoc ABI and tool compatibility experiment i made here: [F.A.Q.] perf ABI backwards and forwards compatibility https://lkml.org/lkml/2011/11/8/77 > - makes it impossible to move to CTF (Common Trace Format) > and benefit from the added features it allows, "CTF" was mainly written by yourself, right? If there's any tool worth caring about that wants to deal in CTF then it can be converted just fine. I don't think it matters nearly as much as you seem to imply, see my reply further below. 
> - makes it needlessly hard, if not impossible, for perf to > move to something that would have the benefits brought by > the fast unified ring buffer code I created 2 years ago, The current upstream code actually has a fast unified ring-buffer, mmap()-ed to user-space, so you'd have to be a bit more specific about that point. > - makes it impossible to benefit from the LTTng fast trace > clocks. We have various trace clocks upstream as well - so you'd have to outline it specifically why it's "impossible". > Also, it should be noted that I am finding that the way perf > evolved into a large monolithic binary blob that needs to be > all enabled or all disabled makes it quite hard to extend and > re-use. [...] There's a (very) healthy in-flux of features - it's one of the most active kernel and userpace projects we have. So *others* don't find it hard to work with. If you have specific observations i'm sure Arnaldo will appreciate them. [ I snipped the rest of your reply - you seem to have deep rooted misconceptions about what the current upstream principles and practices are in this area: you are banging on open doors! ] Anyway, my prior request+offer stands: please split LTTNG up into individual feature blocks done to extend or replace existing instrumentation features and offer them as changes to existing upstream instrumentation code. We want every conceivable useful feature, but we *really* don't want schizophrenic duplication in this area. Thanks, Ingo ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-20 11:08 ` Ingo Molnar @ 2011-12-20 21:46 ` Frank Rowand 2011-12-23 10:51 ` Ingo Molnar 2011-12-21 18:47 ` Aaron Spear 2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers 2 siblings, 1 reply; 51+ messages in thread From: Frank Rowand @ 2011-12-20 21:46 UTC (permalink / raw) To: Ingo Molnar Cc: Mathieu Desnoyers, Greg KH, devel@driverdev.osuosl.org, Peter Zijlstra, linux-kernel@vger.kernel.org, lttng-dev@lists.lttng.org, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo On 12/20/11 03:08, Ingo Molnar wrote: > > (Cc:-ing Arnaldo on this as well.) > > * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > < snip > > I think your concentration on ABIs is missing a very fundamental > property of instrumentation: > > the life-time and persistence of instrumentation data is > typically very short ('days' is already an exception - typical > is minutes, at most hours), and for that reason we havent been > getting much pressure from users to maintain a perf.data ABI - > but we are doing it nevertheless. > > Instrumentation is fundamentally about the 'here and now' and so > it fundamentally differs from things like backup formats and > database formats. An ABI does not hurt and we are maintaining > it, but you are overrating its importance significantly. Just to provide visibility to a different use case... The life time of my data is typically weeks, months, or years (though I am not likely to re-process year old raw data). < snip > -Frank ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-20 21:46 ` Frank Rowand @ 2011-12-23 10:51 ` Ingo Molnar 0 siblings, 0 replies; 51+ messages in thread From: Ingo Molnar @ 2011-12-23 10:51 UTC (permalink / raw) To: Frank Rowand Cc: Mathieu Desnoyers, Greg KH, devel@driverdev.osuosl.org, Peter Zijlstra, linux-kernel@vger.kernel.org, lttng-dev@lists.lttng.org, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo * Frank Rowand <frank.rowand@am.sony.com> wrote: > On 12/20/11 03:08, Ingo Molnar wrote: > > > > (Cc:-ing Arnaldo on this as well.) > > > > * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > > > > < snip > > > > I think your concentration on ABIs is missing a very fundamental > > property of instrumentation: > > > > the life-time and persistence of instrumentation data is > > typically very short ('days' is already an exception - typical > > is minutes, at most hours), and for that reason we havent been > > getting much pressure from users to maintain a perf.data ABI - > > but we are doing it nevertheless. > > > > Instrumentation is fundamentally about the 'here and now' and so > > it fundamentally differs from things like backup formats and > > database formats. An ABI does not hurt and we are maintaining > > it, but you are overrating its importance significantly. > > Just to provide visibility to a different use case... > > The life time of my data is typically weeks, months, or years > (though I am not likely to re-process year old raw data). I'm not saying that it's absolutely never done: for example monitoring/logging on a production box and evaluating events only once per month would certainly qualify. I just say that the overwhelming majority of usecases utilize traces on a short time-span and that we must keep the common usecase in mind when supporting not so common usecases. 
It's the same deal as with -rt: compared to the 'normal' usage of Linux -rt is somewhat of a special case - yet it's still something very much worth doing, as long as the main usecase is always kept in mind. Thanks, Ingo ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-20 11:08 ` Ingo Molnar 2011-12-20 21:46 ` Frank Rowand @ 2011-12-21 18:47 ` Aaron Spear 2011-12-21 18:58 ` Christoph Hellwig 2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers 2 siblings, 1 reply; 51+ messages in thread From: Aaron Spear @ 2011-12-21 18:47 UTC (permalink / raw) To: Ingo Molnar Cc: devel, Peter Zijlstra, Greg KH, linux-kernel, Steven Rostedt, Arnaldo Carvalho de Melo, lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Mathieu Desnoyers * Ingo Molnar <mingo@elte.hu> wrote: > "CTF" was mainly written by yourself, right? > > If there's any tool worth caring about that wants to deal in CTF > then it can be converted just fine. I don't think it matters > nearly as much as you seem to imply, see my reply further below. Hi Ingo, I thought it might be a useful point of reference to mention that there is a commitment to CTF for more than just LTTng. The Multicore Association and member companies including TI, Freescale, Samsung, Mentor Graphics, Wind River Systems, VMware and others intend to use CTF as a lingua franca for correlation of traces taken from different tracing technologies in heterogeneous multi-core systems. Linux is pivotal here of course, but we are also aggregating various types of hardware traces as well as instrumentation trace from bare metal, RTOS's, and other OS's. Many of the requirements that went into the draft CTF specification were driven by this working groups experience in the embedded industry and many different legacy tracing technologies. While Mathieu has been instrumental in creating CTF, he is certainly not the only one with a vested interest in its future. respectfully, Aaron Spear - VMware Chairman, Multicore Association Tools Infrastructure Working Group ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules 2011-12-21 18:47 ` Aaron Spear @ 2011-12-21 18:58 ` Christoph Hellwig 0 siblings, 0 replies; 51+ messages in thread From: Christoph Hellwig @ 2011-12-21 18:58 UTC (permalink / raw) To: Aaron Spear Cc: Ingo Molnar, devel, Peter Zijlstra, Greg KH, linux-kernel, Steven Rostedt, Arnaldo Carvalho de Melo, lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds, Thomas Gleixner, Mathieu Desnoyers Vmware using it is more a reason to avoid it than using it.. :) And most certainly not a reason to export internal kernel details. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2011-12-20 11:08 ` Ingo Molnar 2011-12-20 21:46 ` Frank Rowand 2011-12-21 18:47 ` Aaron Spear @ 2011-12-23 16:46 ` Mathieu Desnoyers 2011-12-23 17:21 ` Ted Ts'o 2 siblings, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-23 16:46 UTC (permalink / raw) To: Ingo Molnar Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo Hi Ingo, I'll break down my reply in various sub-topics, and address them separately in the following weeks. Let's start with the ABIs. * Ingo Molnar (mingo@elte.hu) wrote: > > (Cc:-ing Arnaldo on this as well.) > > * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > > > > Mathieu, any update on this? I don't want the LTTNG goodies > > > to drop on the floor - we just have to integrate them > > > properly. > > > > > > If you 100% disagree with how specific things are done > > > upstream right now then don't hold back: just replace > > > existing mechanisms - that gives a starting point to discuss > > > what the best way is forward. > > > > I'm bringing a though question then: what should we do if I > > strongly think that the current ABIs should be replaced ? To > > support this, let's note that the current perf ABI: > > > > - lacks versioning information to handle change. [...] > > That's not actually true on *any* level: we are changing, > evolving and extending the perf ABIs all the time. You may be able to evolve and extend the Perf ABI, but the way this ABI is designed does not allow you to change it in ways that would introduce ABI incompatibility between versions (the equivalent of a major version number change). 
You're therefore gradually painting yourself into a corner, with no ability to go back and revisit previous decisions. This is bad because some of those decisions were taken without LTTng features in mind, and bringing in those features will require revisiting them. Supporting a new feature is not always as easy as "extending a structure" as you seem to imply. > There's two main API/ABI components: > > 1) the perf syscall which is part of the Linux syscall ABI. > > Individual versions of the ABI have (monotonically increasing) > sizes for "struct perf_event_attr" - you can consider these > natural ABI versioning. > > So the 'versioning' is not done via some inflexible and ugly, > Windows-alike 'explicit ABI version' field, but done via > structure sizes and -ENOSYS. Judging versions as inflexible and ugly is merely a matter of taste. However, the inability to do any kind of major change due to the way the Perf ABI is made has a clear direct impact on the ability to innovate within this project. > We've iterated and versioned it numerous times in the past 10 > kernel releases, in a backwards compatible manner. > > 2) the perf.data file > > The versioning there is capability bitmask based - modelled > after ext2/ext3/ext4 capability bitmasks. It's extensible as > well. AFAIU, filesystems have very strict compatibility requirements because they sit on hard drives for years on live systems that cannot always easily permit migration between incompatible layouts. Traces don't have the same constraints (see below). > > I think your concentration on ABIs is missing a very fundamental > property of instrumentation: > > the life-time and persistence of instrumentation data is > typically very short ('days' is already an exception - typical > is minutes, at most hours), and for that reason we havent been > getting much pressure from users to maintain a perf.data ABI - > but we are doing it nevertheless. 
> > Instrumentation is fundamentally about the 'here and now' and so > it fundamentally differs from things like backup formats and > database formats. An ABI does not hurt and we are maintaining > it, but you are overrating its importance significantly. I think you are really focusing on a developer use-case, which might be why you are missing the big picture. How many Linux developers are out there? How many Linux system administrators are out there? Many, many more. With all due respect, I'm afraid your definition of "typically" is limited by your developer-centric vision. So far, I came up with the following breakdown of use-cases in terms of trace data life-span: - Long-persistence traces (old traces): for this use-case, a conversion phase is usually OK. These long-persistence traces are useful in production system monitoring scenarios, and for finding deltas in execution between different runs of a test suite (for instance). This use-case allows format breakage if the old format can be identified by a trace converter. - Short-lived traces (debugging use-case): pretty much anything would do, as long as the user-level tool can detect if it understands the layout. - Live traces: we want to minimize the overhead, both on the trace producer and on the machine performing the data analysis (which can be either the traced machine or a separate host), while still providing a live stream of data. This is useful for applications like lttngtop (showing a live report of the system) and for production system monitoring. In this case, we want the tools to be able to find out if they can read the trace format (or report an error, asking for an upgrade if they can't). Trace conversion is not appropriate in this scenario due to the added timing complexity and overhead. As you will notice, none of these use-cases requires a filesystem-alike bitmask-based compatibility ABI at the trace format level. 
Using explicit versioning allows drastic changes to be made when they are required: a trace converter can then deal with "old" legacy traces, and a live trace aggregator/analyzer can detect whether it supports the live trace stream. > > [...] I think shipping the tracer tools within the Linux > > tools/ directory made sense for an initial phase that made > > tracer solutions more popular for kernel developers (and it > > did a great job at that), but if we want to move on to build > > tools that target a wider audience, we should leave the > > tools/ sandbox and create separate projects, with clearly > > defined ABIs, using ABI versioning to manage changes. At > > this point, I think that perf tool shipped within tools/ is > > more than anything a pain for non-kernel-developer users, > > and favors design of sloppy ABIs. > > I think you've thoroughly misunderstood the upstream ABI > versioning status quo, which makes your argument out of this > world. > > The perf ABIs are well-defined and well-maintained. See an > ad-hoc ABI and tool compatibility experiment i made here: > > [F.A.Q.] perf ABI backwards and forwards compatibility > https://lkml.org/lkml/2011/11/8/77 I hope my answer above explains why I think the way perf handles ABI changes is a terrible choice. In summary: - Perf is painting itself into a corner, not allowing any ABI breakage, only "extensions", which limits integration of features that require core changes, - It's doing so without even needing it: Perf is using an ABI versioning scheme designed for filesystems, when it is not in fact driven by the same constraints. Best regards, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
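The explicit-versioning argument in the message above can be sketched in C. This is only an illustration under stated assumptions: the header layout, the magic value and the names (`trace_header`, `classify_trace`, `READER_MAJOR`) are hypothetical and do not correspond to the perf.data or CTF formats. It shows how a reader holding a major/minor pair could decide between reading a trace directly, handing it to a converter, or refusing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace header: NOT the perf.data or CTF layout. */
struct trace_header {
	uint32_t magic;   /* identifies the file as a trace at all */
	uint16_t major;   /* bumped on incompatible layout changes */
	uint16_t minor;   /* bumped on backward-compatible additions */
};

#define TRACE_MAGIC   0x54524143u  /* arbitrary value for this sketch */
#define READER_MAJOR  3            /* newest layout this reader parses */

enum trace_action {
	TRACE_READ_NATIVE,     /* same major: read directly */
	TRACE_NEEDS_CONVERTER, /* older major: run a converter first */
	TRACE_UNSUPPORTED      /* newer major or not a trace: ask for an upgrade */
};

/* Decide what a reader built for READER_MAJOR should do with a trace. */
static enum trace_action classify_trace(const struct trace_header *h)
{
	assert(h != NULL);
	if (h->magic != TRACE_MAGIC)
		return TRACE_UNSUPPORTED;
	if (h->major == READER_MAJOR)
		return TRACE_READ_NATIVE;  /* minor differences are compatible */
	if (h->major < READER_MAJOR)
		return TRACE_NEEDS_CONVERTER;
	return TRACE_UNSUPPORTED;
}
```

In the live-trace use-case described above, a consumer would presumably treat `TRACE_NEEDS_CONVERTER` as an error as well, since conversion is said to be inappropriate there.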
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers @ 2011-12-23 17:21 ` Ted Ts'o 2011-12-23 18:16 ` Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Ted Ts'o @ 2011-12-23 17:21 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote: > - It's doing so without even needing it: Perf is using an ABI versioning > scheme designed for filesystems, when it is not in fact driven by the > same constraints. Well, there are *some* constraints. I've been assured that despite the fact that the perf client is in the kernel sources (something which I still think is a bad idea, since it's leading to other bad choices like kvm-tool wanting to be bundled with kernel sources), that it is *not* a license to jerk the format around wildly --- that people will have installed userspace binaries that shouldn't randomly break when they boot a new kernel. So I'm *glad* that Perf is using an ABI versioning scheme that accepts the same restraints as file systems. It means we don't randomly break userspace tools. So Mathieu, if you think the current standards of backwards compatibility are too rigid, what level of tool breakage do you think is acceptable? It's not just about the backwards compatibility of the trace files, it's also about compatibility of userspace utilities. For example, systemtap, where you had to recompile from source at each kernel revision, and pray it would still build goes too far in the other direction, wouldn't you agree? What is the correct level of kernel developer annoyance you think is appropriate to inflict on ourselves? 
Regards, - Ted ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2011-12-23 17:21 ` Ted Ts'o @ 2011-12-23 18:16 ` Mathieu Desnoyers 2011-12-25 17:46 ` Ted Ts'o 0 siblings, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2011-12-23 18:16 UTC (permalink / raw) To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo Hi Ted, * Ted Ts'o (tytso@mit.edu) wrote: > On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote: > > - It's doing so without even needing it: Perf is using an ABI versioning > > scheme designed for filesystems, when it is not in fact driven by the > > same constraints. > > Well, there are *some* constraints. I've been assured that despite > the fact that the perf client is in the kernel sources (something > which I still think is a bad idea, since it's leading to other bad > choices like kvm-tool wanting to be bundled with kernel sources), that > it is *not* a license to jerk the format around wildly --- that people > will have installed userspace binaries that shouldn't randomly break > when they boot a new kernel. > > So I'm *glad* that Perf is using an ABI versioning scheme that accepts > the same restraints as file systems. It means we don't randomly break > userspace tools. > > So Mathieu, if you think the current standards of backwards > compatibility are too rigid, what level of tool breakage do you think > is acceptable? It's not just about the backwards compatibility of the > trace files, it's also about compatibility of userspace utilities. > > For example, systemtap, where you had to recompile from source at > each kernel revision, and pray it would still build goes too far in > the other direction, wouldn't you agree? What is the correct level of > kernel developer annoyance you think is appropriate to inflict on > ourselves? 
I completely agree that systemtap did not have the right level of compatibility towards changes. It clearly does not make sense to require the tools to be updated whenever the kernel version and instrumentation change. What makes sense to me, though, is to allow breakage when a newly introduced tracer feature requires the ABI to break. What I currently see as a tradeoff sweet-spot between compatibility burden and ability to innovate is to split the ABI and handle compatibility as follows: - ABIs to control the tracer - Versioned, ideally always incrementally adding features, but still keeping room for major changes if needed. We should expect breakages on this front to be very, very rare. This requires updating the tracer control tools when the ABI is broken. - ABIs to transport tracing data - Versioned, can and should change when a feature or transport performance enhancement requires breaking compatibility. This requires updating the trace data consumer tools when compatibility is broken. (note that ABI to control the tracer and ABI to transport data could share the same version numbering if the control tools and transport tools happen to reside in the same user-level packages) - The trace data format - Both versioned _and_ self-described. Self-description of the event/field layout allows the same tools to understand traces gathered on different kernel versions, on different architectures, with different tracer configurations. Versioning on top of the self-described trace format allows changes to what the trace self-description can express. So the breakages would happen only when required by tracer tool capability enhancements, not randomly when a kernel instrumentation source happens to change. Best regards, Mathieu P.S.: my next replies will be slightly delayed, due to Christmas holidays. > > Regards, > > > - Ted -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
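To make the "self-described" trace format point above concrete, here is a minimal C sketch. It assumes a hypothetical descriptor table (`field_desc`) shipped alongside the event payload; it is not the CTF metadata format. The reader looks a field up by name instead of hard-coding offsets, so the same tool can read events whose layout differs between kernel versions or tracer configurations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor carried in the trace itself. */
struct field_desc {
	const char *name;
	size_t offset;  /* byte offset inside the event payload */
	size_t size;    /* 1, 2, 4 or 8 bytes, native byte order assumed */
};

/*
 * Look up an unsigned integer field by name.
 * Returns 0 on success, -1 if the field is absent or does not fit.
 */
static int read_field(const struct field_desc *desc, size_t ndesc,
		      const uint8_t *payload, size_t payload_len,
		      const char *name, uint64_t *out)
{
	assert(desc != NULL && payload != NULL && name != NULL && out != NULL);
	for (size_t i = 0; i < ndesc; i++) {
		const struct field_desc *d = &desc[i];
		const uint8_t *p;

		if (strcmp(d->name, name) != 0)
			continue;
		/* reject descriptors that point outside the payload */
		if (d->offset > payload_len || d->size > payload_len - d->offset)
			return -1;
		p = payload + d->offset;
		switch (d->size) {
		case 1: { uint8_t v;  memcpy(&v, p, 1); *out = v; return 0; }
		case 2: { uint16_t v; memcpy(&v, p, 2); *out = v; return 0; }
		case 4: { uint32_t v; memcpy(&v, p, 4); *out = v; return 0; }
		case 8: { uint64_t v; memcpy(&v, p, 8); *out = v; return 0; }
		default: return -1;
		}
	}
	return -1;
}
```

A version number on top of this would only need to change when the descriptor language itself changes (for example, if a new field type were added), which matches the split proposed in the message.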
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2011-12-23 18:16 ` Mathieu Desnoyers @ 2011-12-25 17:46 ` Ted Ts'o 2012-01-12 14:09 ` Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Ted Ts'o @ 2011-12-25 17:46 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote: > > (note that ABI to control the tracer and ABI to transport data could > share the same version numbering if the control tools and transport > tools happen to reside in the same user-level packages) Being able to control the tracer but then not being able to look at the trace output is useless. So they might as well be the same thing.... > - The trace data format > - Both versioned _and_ self-described. > Self-description of the event/field layout allows the same tools to > understand traces gathered on different kernel versions, on different > architectures, with different tracer configurations. > Versioning on top of the self-described trace format allows changes > to what the trace self-description can express. So there are two ways to do this. One is to make changes be backwards compatible, so that the trace data format only breaks if you use the new feature; if it doesn't you encode things the old fashioned way. The other way of doing things is to randomly break users whenever the tracing developers decide to add some random new feature, regardless of whether or not a particular user finds that new feature to be useful. The first is acceptable. The second, IMHO, is not. Linus has said quite strongly that WE DO NOT BREAK USERSPACE. Period. Regards, - Ted ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2011-12-25 17:46 ` Ted Ts'o @ 2012-01-12 14:09 ` Mathieu Desnoyers 2012-01-12 14:54 ` Steven Rostedt 0 siblings, 1 reply; 51+ messages in thread From: Mathieu Desnoyers @ 2012-01-12 14:09 UTC (permalink / raw) To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo * Ted Ts'o (tytso@mit.edu) wrote: > On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote: [...] > > - The trace data format > > - Both versioned _and_ self-described. > > Self-description of the event/field layout allows the same tools to > > understand traces gathered on different kernel versions, on different > > architectures, with different tracer configurations. > > Versioning on top of the self-described trace format allows changes > > to what the trace self-description can express. > > So there are two ways to do this. One is to make changes be backwards > compatible, so that the trace data format only breaks if you use the > new feature; if it doesn't you encode things the old fashioned way. > The other way of doing things is to randomly break users whenever the > tracing developers decide to add some random new feature, regardless > of whether or not a particular user finds that new feature to be > useful. > > The first is acceptable. The second, IMHO, is not. Linus has said > quite strongly that WE DO NOT BREAK USERSPACE. Period. Please allow me to look into what needs to be kept compatible for a good user experience (for both Linux end users and kernel developers) in the case of tracing: Let's first describe what we really utterly don't want: random breakages between the kernel and user-level tracing control/transport/analysis tools. Consequently, I think we could say that it would be unacceptable for userspace tools to break for every slight change of kernel code. 
If that would be the case (as it was with the approach SystemTap was taking before they started hooking into the kernel with tracepoints), then we'd need to regenerate the tools for pretty much every -rc kernel, and for each local development tree, which would make those tools useless to kernel developers. It is important to clarify that tracing is, in my opinion, not part of the runtime support, which makes it very different by nature from filesystems and kernel runtime support. So I agree with Linus' argument about not breaking userspace when applied to runtime support, because being unable to even boot a system due to an ABI breakage is very much unwanted. However, I think it should not be applied as-is to tracing, because you cannot make a system unusable due to a tracer ABI breakage: if a tracer can be packaged in a set of standalone modules, that clearly shows it is not part of the system runtime support. That being said, ABI versioning could still handle ABI changes without significantly impacting the users: when an ABI breakage is needed, we can keep the old code around for a while and expose both the old and new ABIs. This would ensure that the user-level tools can query for the specific ABI major version(s) they support. That should improve the user experience by providing "deprecated" console warnings for a few kernel releases before the old code ends up being removed. So, in summary: * Old kernels vs new tools: New tools can query for the latest ABI they know, and fall-back on older ABIs, with limited features. * New kernels vs old tools: Keeping around the old ABI for a deprecation phase lets old tools work on a bleeding edge kernel while the ABI change is being introduced, which should satisfy the kernel developer use-case. Best regards, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
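The summary above (new tools fall back to older ABIs; new kernels keep the old ABI during a deprecation phase) amounts to picking the highest major version both sides support. A minimal C sketch follows, assuming that both kernel and tool can enumerate the non-negative major versions they support; the function name `negotiate_abi` and the way the lists are obtained are hypothetical, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the highest ABI major version present in both lists.
 * Versions are assumed to be non-negative.
 * Returns -1 when kernel and tool share no version.
 */
static int negotiate_abi(const int *kernel, size_t nkernel,
			 const int *tool, size_t ntool)
{
	int best = -1;

	assert(kernel != NULL && tool != NULL);
	for (size_t i = 0; i < nkernel; i++)
		for (size_t j = 0; j < ntool; j++)
			if (kernel[i] == tool[j] && kernel[i] > best)
				best = kernel[i];
	return best;
}
```

With a kernel exposing majors {2, 3} during a deprecation window, an old tool that knows {1, 2} would settle on 2, and a new tool that knows {1, 2, 3} running on an old kernel exposing only {1} would fall back to 1. A result of -1 corresponds to the case where the old ABI has already been removed and the tool must be upgraded.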
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) 2012-01-12 14:09 ` Mathieu Desnoyers @ 2012-01-12 14:54 ` Steven Rostedt 2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers 0 siblings, 1 reply; 51+ messages in thread From: Steven Rostedt @ 2012-01-12 14:54 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner, Arnaldo Carvalho de Melo On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote: > It is important to clarify that tracing is, in my opinion, not part of > the runtime support, which makes it very different by nature from > filesystems and kernel runtime support. So I agree with Linus' argument > about not breaking userspace when applied to runtime support, because > being unable to even boot a system due to an ABI breakage is very much > unwanted. However, I think it should not be applied as-is to tracing, > because you cannot make a system unusable due to a tracer ABI breakage: > if a tracer can be packaged in a set of standalone modules, that clearly > shows it is not part of the system runtime support. Correct that tracing is not something that needs to make the system run, but that's still no excuse to make ABI changes any different. Note, we don't change things within the /proc/stat or /proc/*/stat and that's not required to make the system run. We can add onto those files, but we can't change what the current numbers mean. > > That being said, ABI versioning could still handle ABI changes without > significantly impacting the users: when an ABI breakage is needed, we > can keep the old code around for a while and expose both the old and new > ABIs. This would ensure that the user-level tools can query for the > specific ABI major version(s) they support. 
That should improve the user > experience by providing "deprecated" console warnings for a few kernel > releases before the old code ends up being removed. ABI version numbers are meaningless, and prone to be broken. The version change would have to be added with the commit that changes the ABI, otherwise git bisecting can get screwed up too. The way ABI changes in the kernel have always been handled is to look at the file itself and have the tool determine what version of the ABI is there based on what files exist, or what exists in the file. I've done this with trace-cmd and ftrace. The debugfs system has changed a lot, and trace-cmd can handle each change. I never had a need for a version number to do this. I simply have trace-cmd look at what is available and what isn't. If you need to know if a syscall exists, you try it and if you get -ENOSYS, then you know it doesn't exist. We have no need for an arbitrary version number that is meaningless. The existence of (or lack of) tells us all we need to know. > > So, in summary: > > * Old kernels vs new tools: > > New tools can query for the latest ABI they know, and fall-back on older > ABIs, with limited features. > > * New kernels vs old tools: > > Keeping around the old ABI for a deprecation phase lets old tools work on > a bleeding edge kernel while the ABI change is being introduced, which > should satisfy the kernel developer use-case. We've done this without version numbers. Just look at all the udev changes. -- Steve ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules) 2012-01-12 14:54 ` Steven Rostedt @ 2012-01-12 15:39 ` Mathieu Desnoyers 2012-01-12 15:53 ` Steven Rostedt 2012-01-12 20:00 ` Greg KH 0 siblings, 2 replies; 51+ messages in thread From: Mathieu Desnoyers @ 2012-01-12 15:39 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel, Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton * Steven Rostedt (rostedt@goodmis.org) wrote: > On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote: > > > > It is important to clarify that tracing is, in my opinion, not part of > > the runtime support, which makes it very different by nature from > > filesystems and kernel runtime support. So I agree with Linus' argument > > about not breaking userspace when applied to runtime support, because > > being unable to even boot a system due to an ABI breakage is very much > > unwanted. However, I think it should not be applied as-is to tracing, > > because you cannot make a system unusable due to a tracer ABI breakage: > > if a tracer can be packaged in a set of standalone modules, that clearly > > shows it is not part of the system runtime support. > > Correct that tracing is not something that needs to make the system run, > but that's still no excuse to make ABI changes any different. Note, we > don't change things within the /proc/stat or /proc/*/stat and that's not > required to make the system run. We can add onto those files, but we > can't change what the current numbers mean. This is because this stat ABI is voluntarily exposed like this. It does not mean that this is the case everywhere else in the kernel. 
And it might not be the right way to expose it: I bet that PeterZ would really like to get the thread priority value removed from /proc/*/stat, because it exposes something "internal" to the scheduler from his point of view, but this particular ABI has chosen to evolve without ever retiring a value previously exported. > > > > > That being said, ABI versioning could still handle ABI changes without > > significantly impacting the users: when an ABI breakage is needed, we > > can keep the old code around for a while and expose both the old and new > > ABIs. This would ensure that the user-level tools can query for the > > specific ABI major version(s) they support. That should improve the user > > experience by providing "deprecated" console warnings for a few kernel > > releases before the old code ends up being removed. > > ABI version numbers are meaningless, and prone to be broken. The version > change would have to be added with the commit that changes the ABI, otherwise > git bisecting can get screwed up too. Of course, the commit that updates the code would "fork" to a new ABI if it ever needs to diverge from the old one. > The way ABI changes in the kernel have always been handled is to look at the > file itself and have the tool determine what version of the > ABI is there based on what files exist, or what exists in the file. > I've done this with trace-cmd and ftrace. The debugfs system has changed > a lot, and trace-cmd can handle each change. I never had a need for a > version number to do this. I simply have trace-cmd look at what is > available and what isn't. > > If you need to know if a syscall exists, you try it and if you get > -ENOSYS, then you know it doesn't exist. We have no need for an > arbitrary version number that is meaningless. The existence of (or lack > of) tells us all we need to know. 
pipe()/pipe2() dup()/dup2()/dup3() umount()/umount2() mmap()/mmap2() madvise()/madvise1() eventfd()/eventfd2() Those look very much like major version numbers to me. And these are entirely compatible with your statement above about using -ENOSYS to detect if the major version number is implemented or not. If your only concern is that the major version number should be part of the ABI name (as in the examples above), that can be arranged. > > > > > So, in summary: > > > > * Old kernels vs new tools: > > > > New tools can query for the latest ABI they know, and fall-back on older > > ABIs, with limited features. > > > > * New kernels vs old tools: > > > > Keeping around the old ABI for a deprecation phase lets old tools work on > > a bleeding edge kernel while the ABI change is being introduced, which > > should satisfy the kernel developer use-case. > > We've done this without version numbers. Just look at all the udev > changes. Are you seriously referring to udev as an example of how to handle changes, or as one of the worst ABI breakage messes that happened in Linux kernel history? My own experience as a Linux user (in the era around 2.6.12 kernels if my memory serves me right) leads me to think it's the latter. And because udev is part of the runtime support, that indeed led to non-bootable systems and lots of frustrated users. Thanks, Mathieu > > -- Steve > > > > _______________________________________________ > lttng-dev mailing list > lttng-dev@lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules) 2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers @ 2012-01-12 15:53 ` Steven Rostedt 2012-01-12 15:59 ` Steven Rostedt 2012-01-12 16:27 ` Mathieu Desnoyers 2012-01-12 20:00 ` Greg KH 1 sibling, 2 replies; 51+ messages in thread From: Steven Rostedt @ 2012-01-12 15:53 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel, Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote: > pipe()/pipe2() > dup()/dup2()/dup3() > umount()/umount2() > mmap()/mmap2() > madvise()/madvise1() > eventfd()/eventfd2() > > Those look very much like major version numbers to me. And these are > entirely compatible with your statement above about using -ENOSYS to > detect if the major version number is implemented or not. That's a stretch in calling version numbers. All but the madvise case above are how many parameters it takes, not really a "version" number. It's adding a new syscall, not updating a version and then deprecating the old one. As I believe all the above are still supported. > > If your only concern is that the major version number should be part of > the ABI name (as in the examples above), that can be arranged. > > > > We've done this without version numbers. Just look at all the udev > > changes. > > Are you seriously referring to udev as an example of how to handle > changes, or as one of the worst ABI breakage messes that happened in > Linux kernel history? My own experience as a Linux user (in the > era around 2.6.12 kernels if my memory serves me right) leads me to think > it's the latter. And because udev is part of the runtime support, that > indeed led to non-bootable systems and lots of frustrated users. Yeah, I know it sucked, as I got burned by it too. 
But having "version" numbers wouldn't have helped at all. In fact, it should have kept both ways working much longer, or at least had the new udev support both. What udev did is more like what you want to do than what I did with trace-cmd. -- Steve ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:53 ` Steven Rostedt
@ 2012-01-12 15:59 ` Steven Rostedt
  2012-01-12 16:27 ` Mathieu Desnoyers
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 15:59 UTC
To: Mathieu Desnoyers
Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
    linux-kernel, Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
    Ingo Molnar, Linus Torvalds, Andrew Morton

On Thu, 2012-01-12 at 10:53 -0500, Steven Rostedt wrote:

> That's a stretch in calling version numbers. All but the madvise case
> above are how many parameters it takes, not really a "version" number.
>
> It's adding a new syscall, not updating a version and then deprecating
> the old one. As I believe all the above are still supported.
>

Actually, the madvise1() isn't supported. But this just shows that it
has nothing to do with a version number. What version is madvise()?

-- Steve
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:53 ` Steven Rostedt
  2012-01-12 15:59 ` Steven Rostedt
@ 2012-01-12 16:27 ` Mathieu Desnoyers
  2012-01-12 16:34 ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 16:27 UTC
To: Steven Rostedt
Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
    Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner, Ingo Molnar,
    Linus Torvalds, Andrew Morton

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote:
>
> > pipe()/pipe2()
> > dup()/dup2()/dup3()
> > umount()/umount2()
> > mmap()/mmap2()
> > madvise()/madvise1()
> > eventfd()/eventfd2()
> >
> > Those look very much like major version numbers to me. And these are
> > entirely compatible with your statement above about using -ENOSYS to
> > detect if the major version number is implemented or not.
>
> That's a stretch in calling version numbers. All but the madvise case
> above are how many parameters it takes, not really a "version" number.
>
> It's adding a new syscall, not updating a version and then deprecating
> the old one. As I believe all the above are still supported.
>
> >
> > If your only concern is that the major version number should be part of
> > the ABI name (as in the examples above), that can be arranged.
> >
>
> > > We've done this without version numbers. Just look at all the udev
> > > changes.
> >
> > Are you seriously referring to udev as an example of how to handle
> > changes, or as one of the worst ABI breakage messes that happened in
> > Linux kernel history? My own experience as a Linux user (in the
> > era around 2.6.12 kernels, if my memory serves me right) led me to think
> > it's the latter. And because udev is part of the runtime support, that
> > indeed led to non-bootable systems and lots of frustrated users.
>
> Yeah, I know it sucked, as I got burned by it too. But having "version"
> numbers wouldn't have helped at all. In fact, it should have kept both
> ways working much longer, or at least had the new udev support both.
>
> What udev did is more like what you want to do than what I did with
> trace-cmd.

OK. Then how can trace-cmd support the LTTng features?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 16:27 ` Mathieu Desnoyers
@ 2012-01-12 16:34 ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 16:34 UTC
To: Mathieu Desnoyers
Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
    Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner, Ingo Molnar,
    Linus Torvalds, Andrew Morton

On Thu, 2012-01-12 at 11:27 -0500, Mathieu Desnoyers wrote:

> > What udev did is more like what you want to do than what I did with
> > trace-cmd.
>
> OK. Then how can trace-cmd support the LTTng features ?

New syscalls, or new files, and simply check if they exist. New features
should not break old ones.

-- Steve
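As an illustration of the "new syscalls, or new files, and simply check if they exist" approach, here is a minimal userspace sketch. It uses pipe()/pipe2() from the list discussed earlier in the thread as the example pair; the helper names `pipe_cloexec()` and `feature_file_present()` are made up for this sketch, and it is not taken from trace-cmd or any other tool mentioned here.

```c
#define _GNU_SOURCE		/* for pipe2() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a close-on-exec pipe. Prefer the newer pipe2() syscall and fall
 * back to pipe() + fcntl() only when the kernel reports ENOSYS.
 */
static int pipe_cloexec(int fds[2])
{
	if (pipe2(fds, O_CLOEXEC) == 0)
		return 0;
	if (errno != ENOSYS)
		return -1;	/* real failure, not a missing syscall */

	/* Old kernel without pipe2(): use the original interface. */
	if (pipe(fds) != 0)
		return -1;
	if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
	    fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
		int saved = errno;

		close(fds[0]);
		close(fds[1]);
		errno = saved;
		return -1;
	}
	return 0;
}

/* "New files": a feature exposed as a file is detected by its presence. */
static int feature_file_present(const char *path)
{
	return access(path, F_OK) == 0;
}
```

The old interface keeps working unchanged, so a tool built this way runs on both old and new kernels without any version number being exchanged.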
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
  2012-01-12 15:53 ` Steven Rostedt
@ 2012-01-12 20:00 ` Greg KH
  2012-01-16 8:55 ` Ingo Molnar
  1 sibling, 1 reply; 51+ messages in thread
From: Greg KH @ 2012-01-12 20:00 UTC
To: Mathieu Desnoyers
Cc: Steven Rostedt, Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra,
    linux-kernel, Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
    Ingo Molnar, Linus Torvalds, Andrew Morton

On Thu, Jan 12, 2012 at 10:39:57AM -0500, Mathieu Desnoyers wrote:
> > We've done this without version numbers. Just look at all the udev
> > changes.
>
> Are you seriously referring to udev as an example of how to handle
> changes, or as one of the worst ABI breakage messes that happened in
> Linux kernel history? My own experience as a Linux user (in the
> era around 2.6.12 kernels, if my memory serves me right) led me to think
> it's the latter. And because udev is part of the runtime support, that
> indeed led to non-bootable systems and lots of frustrated users.

Really? You fail to remember the fact that we _fixed_ those
non-bootable systems by putting the userspace bits back, and symlinks,
and all other sorts of gyrations in order to prevent userspace from
breaking again. And it worked, and people's machines worked again, and
no one since then has reported a problem.

So I think udev actually is a good example of how to do it right, we
provide proper backwards compatibility in the kernel to keep userspace
working.

thanks,

greg k-h
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 20:00 ` Greg KH
@ 2012-01-16 8:55 ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2012-01-16 8:55 UTC
To: Greg KH
Cc: Mathieu Desnoyers, Steven Rostedt, Mathieu Desnoyers, devel,
    Ted Ts'o, Peter Zijlstra, linux-kernel, Arnaldo Carvalho de Melo,
    lttng-dev, Thomas Gleixner, Linus Torvalds, Andrew Morton

* Greg KH <greg@kroah.com> wrote:

> So I think udev actually is a good example of how to do it
> right, we provide proper backwards compatibility in the kernel
> to keep userspace working.

I agree, i still have a udev system that i installed 5 years
ago, and it's working mostly fine with current kernels.

Compatibility is a desirable property, it is something that
preserves our users - and if done right it's almost never a big
issue technically. If it is hindering someone then there must be
other problems.

Of course to developers the simplest approach is always to just
develop without regard for compatibility.

The simplest form of that is that people write patches that work
fine on their own systems but crash the kernel on other systems.
We fix those bugs.

Another, subtler form is when the patches work fine on their
systems but break apps on other systems. We fix those bugs too.

That's why we have testing, regression tracking and maintainers,
to control that - compatibility is just another dimension to
'correctness', in the typical case with no inherent restrictions
on future features and possibilities.

Thanks,

	Ingo
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-05 14:17 ` Ingo Molnar
  2011-12-06 21:44 ` Greg KH
@ 2011-12-07 22:57 ` Mathieu Desnoyers
  2011-12-08 5:40 ` Ingo Molnar
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-07 22:57 UTC
To: Ingo Molnar
Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
    Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
    Frederic Weisbecker

Hi Ingo,

* Ingo Molnar (mingo@elte.hu) wrote:
[...]
> Mathieu, please work with the tracing folks who DO care about
> this stuff. It's not like there's a lack of interest in this
> area, nor is there a lack of willingness to take patches. What
> there is a lack of is your willingness to actually work on
> getting something unified, integrated to users...
>
> LTTNG has been going on for how many years? I haven't seen many
> steps towards actually *merging* its functionality - you insist
> on doing your own random thing, which is different in random
> ways. Yes, some of those random ways may in fact be better than
> what we have upstream - would you be interested in filtering
> those out and pushing them upstream? I certainly would like to
> see that happen.
>
> We want to pick the best features, and throw away current
> upstream code in favor of superior out of tree code - this
> concept of letting crap sit alongside each other when people do
> care i cannot agree with.

LTTng 2.0, today, offers a unified interface for kernel and userspace
tracing, in the form of libraries and a git-alike command line user
interface. It produces a trace format (CTF) that has been developed in
collaboration with hardware vendors and reviewed by tracing developers
of the Linux community, which allows analyzing correlated traces across
the software and hardware stacks, and supports being streamed over the
network with zero-copy in both TCP and UDP formats, with optional
encryption, checksum, and more. It supports multiple concurrent users,
and hooks with tracepoints, Perf PMU counters, kprobes, kretprobes, and
system calls, with the ability to attach "context" information
prepended before each event record as selected by the user when setting
up a tracing session.

It is currently self-contained: it's been designed to be shipped as a
stand-alone set of self-contained modules, but I recently received the
offer to get it pulled into staging, which I accepted.

In my opinion, tracers need to be split into three distinct parts:

1) core tracing infrastructure that _needs to_ be shared. This mainly
   targets instrumentation, and I've done my share of contribution to
   mainline on this front already. I think the infrastructure we have
   today is in pretty good shape.

2) tracing infrastructure that _could_ be shared. I'm mostly targeting
   ring buffers and trace clocks there. It could be a nice-to-have to
   share the implementation, as long as it does not get in the way of
   what each project is trying to achieve. So far, what I noticed is
   that each project is lacking understanding of the intent and
   constraints of the other projects, thus either considering what the
   others are doing as over- or under-engineering, depending on the
   context. Therefore, as long as there is no agreement on the right
   amount of care that needs to be put in the design of these
   components, it might be best to duplicate the implementation and
   slowly converge as each project gets to understand the other
   project's constraints. To make progress on this front, you need to
   have both code-bases in mainline.

3) interfaces to user-space: very much like filesystems, these ABIs
   don't need to be shared across projects that have different
   use-cases. Having multiple tracer ABIs, if self-contained, should
   not hurt anybody and just increase the rate of innovation. Sadly,
   the ABIs exposed by perf/ftrace do not seem to be a good fit for
   LTTng use-cases. Since the perf/ftrace ABIs, as well as the LTTng
   ABI, are all already used by many tools, it will likely be really
   difficult to change them overnight.

As an example of where we could benefit from working together, LTTng is
currently using a shadow copy of the TRACE_EVENT macros, because the
upstream version is quite limiting with respect to generating compact
probe code. It could be good to integrate those changes upstream, and I
think the best way to achieve this is if the perf and ftrace developers
can have a look at the approach taken by LTTng to achieve this -- which
is better done if LTTng is merged into staging.

Another example is how LTTng extracts system call argument types, which
is performed by generating a TRACE_EVENT description of the system call
table with a script. We could definitely help out each other in this
area.

There are certainly many other areas where we could eventually benefit
from working together, listed above as #2 "tracing infrastructure that
_could_ be shared", but I think it is better to first focus on the core
infrastructure that we need to share before getting into the territory
of the infrastructure we could share if we took the time to understand
each other's requirements fully first. Meanwhile, having a duplicated
implementation of these parts that "could" be shared should not hurt
anyone -- it would even help understanding each other --, as long as
they stay self-contained.

In summary, I'm really open to help out on working on common pieces of
infrastructure, but for that they need to take into account both the
current perf/ftrace use-cases and the LTTng use-cases.

Best regards,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
@ 2011-12-08 5:40 ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-08 5:40 UTC
To: Mathieu Desnoyers
Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
    Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
    Frederic Weisbecker

* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Hi Ingo,
>
> * Ingo Molnar (mingo@elte.hu) wrote:
> [...]
> > Mathieu, please work with the tracing folks who DO care about
> > this stuff. It's not like there's a lack of interest in this
> > area, nor is there a lack of willingness to take patches. What
> > there is a lack of is your willingness to actually work on
> > getting something unified, integrated to users...
> >
> > LTTNG has been going on for how many years? I haven't seen many
> > steps towards actually *merging* its functionality - you insist
> > on doing your own random thing, which is different in random
> > ways. Yes, some of those random ways may in fact be better than
> > what we have upstream - would you be interested in filtering
> > those out and pushing them upstream? I certainly would like to
> > see that happen.
> >
> > We want to pick the best features, and throw away current
> > upstream code in favor of superior out of tree code - this
> > concept of letting crap sit alongside each other when people do
> > care i cannot agree with.
>
> LTTng 2.0, today, offers a unified interface for kernel and
> userspace tracing, in the form of libraries and git-alike
> command line user interface. [...]

Note that Arnaldo is working on such a perf-alike tracing tool
workflow with the new 'trace' utility that we announced and
prototyped a couple of months ago.

The perf.data data format is now extensible as well and
tightened for transportability. Tools such as PowerTop or
sysprof have standardized around the perf ABI.

So there's a *lot* of overlap with existing upstream efforts and
the last thing we need is the parallel LTTNG ABI.

Are you willing to merge LTTNG into our existing kernel and
userspace infrastructure and ABIs, with the possible end result
that LTTNG ceases to be a separately named entity?

Mind hooking up with Arnaldo and with Steve regarding how we
could best split up the LTTNG bits and move them upstream?

Frankly, i've seen a *lot* of talk from you but unfortunately
*very* little action on that front, so i think my healthy
scepticism is justified.

Thanks,

	Ingo
Thread overview: 51+ messages
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
2011-12-01 21:57 ` Christoph Hellwig
2011-12-01 22:13 ` Greg KH
2011-12-01 22:19 ` Mathieu Desnoyers
2011-12-01 22:41 ` Greg KH
2011-12-01 22:28 ` Christoph Hellwig
2011-12-01 23:00 ` Greg KH
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
2011-12-02 7:19 ` Jens Axboe
2011-12-02 12:32 ` Mathieu Desnoyers
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
2011-12-01 21:56 ` Peter Zijlstra
2011-12-01 22:04 ` Mathieu Desnoyers
2011-12-01 22:10 ` Peter Zijlstra
2011-12-01 22:15 ` Mathieu Desnoyers
2011-12-01 22:36 ` Mathieu Desnoyers
2011-12-01 23:05 ` Peter Zijlstra
2011-12-02 13:51 ` Mathieu Desnoyers
2011-12-01 23:06 ` Peter Zijlstra
2011-12-01 23:18 ` Greg KH
2011-12-01 23:47 ` Mathieu Desnoyers
2011-12-01 22:14 ` Greg KH
2011-12-01 22:20 ` Mathieu Desnoyers
2011-12-01 23:07 ` Peter Zijlstra
2011-12-01 23:17 ` Greg KH
2011-12-05 14:17 ` Ingo Molnar
2011-12-06 21:44 ` Greg KH
2011-12-08 5:23 ` Ingo Molnar
2011-12-08 23:27 ` Greg KH
2011-12-19 10:49 ` Ingo Molnar
2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers
2011-12-20 11:08 ` Ingo Molnar
2011-12-20 21:46 ` Frank Rowand
2011-12-23 10:51 ` Ingo Molnar
2011-12-21 18:47 ` Aaron Spear
2011-12-21 18:58 ` Christoph Hellwig
2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
2011-12-23 17:21 ` Ted Ts'o
2011-12-23 18:16 ` Mathieu Desnoyers
2011-12-25 17:46 ` Ted Ts'o
2012-01-12 14:09 ` Mathieu Desnoyers
2012-01-12 14:54 ` Steven Rostedt
2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
2012-01-12 15:53 ` Steven Rostedt
2012-01-12 15:59 ` Steven Rostedt
2012-01-12 16:27 ` Mathieu Desnoyers
2012-01-12 16:34 ` Steven Rostedt
2012-01-12 20:00 ` Greg KH
2012-01-16 8:55 ` Ingo Molnar
2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
2011-12-08 5:40 ` Ingo Molnar