* [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
       [not found] <20180409001936.162706-1-alexander.levin@microsoft.com>
@ 2018-04-09  0:19 ` Sasha Levin
  2018-04-09  8:22   ` Petr Mladek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-09  0:19 UTC (permalink / raw)
  To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: Steven Rostedt (VMware), akpm@linux-foundation.org, linux-mm@kvack.org,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo,
	Pavel Machek, Petr Mladek, Sasha Levin

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

[ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]

This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!

By Petr Mladek about possible new deadlocks:

The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)

    call_console_drivers()
      spin_lockA()

This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.

But if the above is true then the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()

In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.

By Steven Rostedt:

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is also
able to lock up an 8 CPU system with only a fast (VGA) console, by passing
in "loops=100".
The changes in this commit prevent this module from locking up the system.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;
static unsigned int loops = 1;

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 " XXX NOPREEMPT");
			preempt_enable();
		}
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 512f7c2baedd..89c3496975cc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
 #endif
 
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore. The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock())
+		if (console_trylock()) {
 			console_unlock();
+		} else {
+			struct task_struct *owner = NULL;
+			bool waiter;
+			bool spin = false;
+
+			printk_safe_enter_irqsave(flags);
+
+			raw_spin_lock(&console_owner_lock);
+			owner = READ_ONCE(console_owner);
+			waiter = READ_ONCE(console_waiter);
+			if (!waiter && owner && owner != current) {
+				WRITE_ONCE(console_waiter, true);
+				spin = true;
+			}
+			raw_spin_unlock(&console_owner_lock);
+
+			/*
+			 * If there is an active printk() writing to the
+			 * consoles, instead of having it write our data too,
+			 * see if we can offload that load from the active
+			 * printer, and do some printing ourselves.
+			 * Go into a spin only if there isn't already a waiter
+			 * spinning, and there is an active printer, and
+			 * that active printer isn't us (recursive printk?).
+			 */
+			if (spin) {
+				/* We spin waiting for the owner to release us */
+				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+				/* Owner will clear console_waiter on hand off */
+				while (READ_ONCE(console_waiter))
+					cpu_relax();
+
+				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+				printk_safe_exit_irqrestore(flags);
+
+				/*
+				 * The owner passed the console lock to us.
+				 * Since we did not spin on console lock, annotate
+				 * this as a trylock. Otherwise lockdep will
+				 * complain.
+				 */
+				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+				console_unlock();
+				printk_safe_enter_irqsave(flags);
+			}
+			printk_safe_exit_irqrestore(flags);
+
+		}
 	}
 
 	return printed_len;
@@ -2141,6 +2196,7 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
+	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2229,14 +2285,64 @@ skip:
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
 
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 */
+		raw_spin_lock(&console_owner_lock);
+		console_owner = current;
+		raw_spin_unlock(&console_owner_lock);
+
+		/* The waiter may spin on us after setting console_owner */
+		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		raw_spin_lock(&console_owner_lock);
+		waiter = READ_ONCE(console_waiter);
+		console_owner = NULL;
+		raw_spin_unlock(&console_owner_lock);
+
+		/*
+		 * If there is a waiter waiting for us, then pass the
+		 * rest of the work load over to that waiter.
+		 */
+		if (waiter)
+			break;
+
+		/* There was no waiter, and nothing will spin on us here */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
 		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
 	}
+
+	/*
+	 * If there is an active waiter waiting on the console_lock.
+	 * Pass off the printing to the waiter, and the waiter
+	 * will continue printing on its CPU, and when all writing
+	 * has finished, the last printer will wake up klogd.
+	 */
+	if (waiter) {
+		WRITE_ONCE(console_waiter, false);
+		/* The waiter is now free to continue */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		/*
+		 * Hand off console_lock to waiter. The waiter will perform
+		 * the up(). After this, the waiter is the console_lock owner.
+		 */
+		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+		printk_safe_exit_irqrestore(flags);
+		/* Note, if waiter is set, logbuf_lock is not held */
+		return;
+	}
+
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.15.1

^ permalink raw reply related [flat|nested] 113+ messages in thread
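[ Editorial illustration, not part of the patch or of any message in this thread. ]

The owner/waiter hand-off described in the commit message above can be tried out in user space. What follows is only a rough sketch under stated assumptions, not the kernel implementation: the *_sim functions and run_handoff_demo() are hypothetical names, an atomic flag stands in for console_sem, a pthread mutex stands in for the raw spinlock console_owner_lock, sched_yield() stands in for cpu_relax(), the log buffer is reduced to two counters, and lockdep annotations as well as IRQ and preemption handling are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 16
#define NO_OWNER (-1)

/* Hypothetical stand-in for console_sem: a binary semaphore with no owner. */
static atomic_bool console_sem_held;
/* A mutex stands in for the raw spinlock console_owner_lock. */
static pthread_mutex_t console_owner_lock = PTHREAD_MUTEX_INITIALIZER;
static int console_owner = NO_OWNER;	/* protected by console_owner_lock */
static atomic_bool console_waiter;
/* Log buffer reduced to two counters: messages stored and messages written. */
static atomic_long log_next_seq;
static atomic_long console_seq;		/* advanced only by the semaphore holder */
static int msgs_per_thread;

static bool console_trylock_sim(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&console_sem_held, &expected, true);
}

static void console_unlock_sim(int self)
{
	for (;;) {
		while (atomic_load(&console_seq) < atomic_load(&log_next_seq)) {
			bool waiter;

			pthread_mutex_lock(&console_owner_lock);
			console_owner = self;
			pthread_mutex_unlock(&console_owner_lock);

			/* Stands in for call_console_drivers(): write one message. */
			atomic_fetch_add(&console_seq, 1);

			pthread_mutex_lock(&console_owner_lock);
			waiter = atomic_load(&console_waiter);
			console_owner = NO_OWNER;
			pthread_mutex_unlock(&console_owner_lock);

			if (waiter) {
				/* Hand off: release the spinner, keep the semaphore held. */
				atomic_store(&console_waiter, false);
				return;
			}
		}
		atomic_store(&console_sem_held, false);	/* up() */

		/* Retry if a message was stored while we were releasing. */
		if (atomic_load(&console_seq) >= atomic_load(&log_next_seq))
			return;
		if (!console_trylock_sim())
			return;
	}
}

static void printk_sim(int self)
{
	bool spin = false;

	atomic_fetch_add(&log_next_seq, 1);	/* store the message */

	if (console_trylock_sim()) {
		console_unlock_sim(self);
		return;
	}

	pthread_mutex_lock(&console_owner_lock);
	if (!atomic_load(&console_waiter) &&
	    console_owner != NO_OWNER && console_owner != self) {
		atomic_store(&console_waiter, true);
		spin = true;
	}
	pthread_mutex_unlock(&console_owner_lock);

	if (spin) {
		/* The owner clears console_waiter on hand off. */
		while (atomic_load(&console_waiter))
			sched_yield();
		/* We now hold the semaphore the previous owner never released. */
		console_unlock_sim(self);
	}
}

static void *worker(void *arg)
{
	int self = *(int *)arg;

	for (int i = 0; i < msgs_per_thread; i++)
		printk_sim(self);
	return NULL;
}

/* Runs nthreads printers and returns how many messages were written out. */
long run_handoff_demo(int nthreads, int msgs)
{
	pthread_t threads[MAX_THREADS];
	int ids[MAX_THREADS];

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&log_next_seq, 0);
	atomic_store(&console_seq, 0);
	msgs_per_thread = msgs;

	for (int i = 0; i < nthreads; i++) {
		ids[i] = i;
		int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&console_seq);
}
```

Calling run_handoff_demo(4, 2000) from a small main() should report every stored message as written. The property the sketch tries to mirror is the one the commit message stresses: the semaphore is never released during a hand-off, so the spinning waiter becomes its holder without racing anyone for it.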
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-09  0:19 ` [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes Sasha Levin
@ 2018-04-09  8:22   ` Petr Mladek
  2018-04-15 14:42     ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Petr Mladek @ 2018-04-09  8:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steven Rostedt (VMware), akpm@linux-foundation.org, linux-mm@kvack.org,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Mon 2018-04-09 00:19:53, Sasha Levin wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>
> [ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]
>
> This patch implements what I discussed in Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.

Same here. I do not think that this is material for a stable backport.
More details can be found in my reply to the patch for 4.15, see
https://lkml.kernel.org/r/20180409081535.dq7p5bfnpvd3xk3t@pathway.suse.cz

Best Regards,
Petr

PS: I wonder how much time you give people to react before releasing
this. The number of autosel mails is increasing and I am involved
only in very small amount of them. I wonder if some other people
gets overwhelmed by this.

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-09  8:22 ` Petr Mladek
@ 2018-04-15 14:42   ` Sasha Levin
  2018-04-16 13:30     ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-15 14:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steven Rostedt (VMware), akpm@linux-foundation.org, linux-mm@kvack.org,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Mon, Apr 09, 2018 at 10:22:46AM +0200, Petr Mladek wrote:
>PS: I wonder how much time you give people to react before releasing
>this. The number of autosel mails is increasing and I am involved
>only in very small amount of them. I wonder if some other people
>gets overwhelmed by this.

My review cycle gives at least a week, and there's usually another week
until Greg releases them.

I know it's a lot of mails, but in reality it's a lot of commits that
should go in -stable.

Would a different format for review make it easier?

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-15 14:42 ` Sasha Levin
@ 2018-04-16 13:30   ` Steven Rostedt
  2018-04-16 15:18     ` Linus Torvalds
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 13:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Sun, 15 Apr 2018 14:42:51 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> On Mon, Apr 09, 2018 at 10:22:46AM +0200, Petr Mladek wrote:
> >PS: I wonder how much time you give people to react before releasing
> >this. The number of autosel mails is increasing and I am involved
> >only in very small amount of them. I wonder if some other people
> >gets overwhelmed by this.
>
> My review cycle gives at least a week, and there's usually another week
> until Greg releases them.
>
> I know it's a lot of mails, but in reality it's a lot of commits that
> should go in -stable.
>
> Would a different format for review make it easier?

I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
someone before they are pulled in. Otherwise there may be some subtle
issues that can find their way into stable releases.

-- Steve

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 13:30 ` Steven Rostedt
@ 2018-04-16 15:18   ` Linus Torvalds
  2018-04-16 15:30     ` Pavel Machek
                       ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 15:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> someone before they are pulled in. Otherwise there may be some subtle
> issues that can find their way into stable releases.

I don't know about anybody else, but I get so many of the patch-bot
patches for stable etc that I will *not* reply to normal cases. Only
if there's some issue with a patch will I reply.

I probably do get more than most, but still - requiring active
participation for the steady flow of normal stable patches is almost
pointless.

Just look at the subject line of this thread. The numbers are so big
that you almost need exponential notation for them.

           Linus

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:18 ` Linus Torvalds
@ 2018-04-16 15:30   ` Pavel Machek
  2018-04-16 15:50     ` Sasha Levin
  2018-04-16 15:36   ` Steven Rostedt
  2018-04-16 15:39   ` Sasha Levin
  2 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 15:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Sasha Levin, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1168 bytes --]

On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> > someone before they are pulled in. Otherwise there may be some subtle
> > issues that can find their way into stable releases.
>
> I don't know about anybody else, but I get so many of the patch-bot
> patches for stable etc that I will *not* reply to normal cases. Only
> if there's some issue with a patch will I reply.
>
> I probably do get more than most, but still - requiring active
> participation for the steady flow of normal stable patches is almost
> pointless.
>
> Just look at the subject line of this thread. The numbers are so big
> that you almost need exponential notation for them.

Question is if we need that many stable patches? Autosel seems to be
picking up race conditions in LED state and W+X page fixes... I'd
really like to see less stable patches.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:30 ` Pavel Machek
@ 2018-04-16 15:50   ` Sasha Levin
  2018-04-16 16:06     ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 15:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
>> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> > someone before they are pulled in. Otherwise there may be some subtle
>> > issues that can find their way into stable releases.
>>
>> I don't know about anybody else, but I get so many of the patch-bot
>> patches for stable etc that I will *not* reply to normal cases. Only
>> if there's some issue with a patch will I reply.
>>
>> I probably do get more than most, but still - requiring active
>> participation for the steady flow of normal stable patches is almost
>> pointless.
>>
>> Just look at the subject line of this thread. The numbers are so big
>> that you almost need exponential notation for them.
>
>Question is if we need that many stable patches? Autosel seems to be
>picking up race conditions in LED state and W+X page fixes... I'd
>really like to see less stable patches.

Why? Given that the kernel keeps seeing more and more lines of code in
each new release, tools around the kernel keep evolving (new fuzzers,
testing suites, etc), and code gets more eyes, this guarantees that
you'll see more and more stable patches for each release as well.

Is there a reason not to take LED fixes if they fix a bug and don't
cause a regression? Sure, we can draw some arbitrary line, maybe
designate some subsystems that are more "important" than others, but
what's the point?

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:50 ` Sasha Levin
@ 2018-04-16 16:06   ` Pavel Machek
  2018-04-16 16:14     ` Sasha Levin
  2018-04-16 16:20     ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:06 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 2381 bytes --]

On Mon 2018-04-16 15:50:34, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
> >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> >
> >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> >> > someone before they are pulled in. Otherwise there may be some subtle
> >> > issues that can find their way into stable releases.
> >>
> >> I don't know about anybody else, but I get so many of the patch-bot
> >> patches for stable etc that I will *not* reply to normal cases. Only
> >> if there's some issue with a patch will I reply.
> >>
> >> I probably do get more than most, but still - requiring active
> >> participation for the steady flow of normal stable patches is almost
> >> pointless.
> >>
> >> Just look at the subject line of this thread. The numbers are so big
> >> that you almost need exponential notation for them.
> >
> >Question is if we need that many stable patches? Autosel seems to be
> >picking up race conditions in LED state and W+X page fixes... I'd
> >really like to see less stable patches.
>
> Why? Given that the kernel keeps seeing more and more lines of code in
> each new release, tools around the kernel keep evolving (new fuzzers,
> testing suites, etc), and code gets more eyes, this guarantees that
> you'll see more and more stable patches for each release as well.
>
> Is there a reason not to take LED fixes if they fix a bug and don't
> cause a regression? Sure, we can draw some arbitrary line, maybe
> designate some subsystems that are more "important" than others, but
> what's the point?

There's a tradeoff.

You want to fix serious bugs in stable, and you really don't want
regressions in stable. And ... stable not having 1000s of patches
would be nice, too.

That means you want to ignore not-so-serious bugs, because benefit of
fixing them is lower than risk of the regressions. I believe bugs that
do not bother anyone should _not_ be fixed in stable.

That was case of the LED patch. Yes, the commit fixed bug, but it
introduced regressions that were fixed by subsequent patches.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:06 ` Pavel Machek
@ 2018-04-16 16:14   ` Sasha Levin
  2018-04-16 16:22     ` Steven Rostedt
                       ` (2 more replies)
  2018-04-16 16:20   ` Steven Rostedt
  1 sibling, 3 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 06:06:08PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 15:50:34, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
>> >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >> >
>> >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> >> > someone before they are pulled in. Otherwise there may be some subtle
>> >> > issues that can find their way into stable releases.
>> >>
>> >> I don't know about anybody else, but I get so many of the patch-bot
>> >> patches for stable etc that I will *not* reply to normal cases. Only
>> >> if there's some issue with a patch will I reply.
>> >>
>> >> I probably do get more than most, but still - requiring active
>> >> participation for the steady flow of normal stable patches is almost
>> >> pointless.
>> >>
>> >> Just look at the subject line of this thread. The numbers are so big
>> >> that you almost need exponential notation for them.
>> >
>> >Question is if we need that many stable patches? Autosel seems to be
>> >picking up race conditions in LED state and W+X page fixes... I'd
>> >really like to see less stable patches.
>>
>> Why? Given that the kernel keeps seeing more and more lines of code in
>> each new release, tools around the kernel keep evolving (new fuzzers,
>> testing suites, etc), and code gets more eyes, this guarantees that
>> you'll see more and more stable patches for each release as well.
>>
>> Is there a reason not to take LED fixes if they fix a bug and don't
>> cause a regression? Sure, we can draw some arbitrary line, maybe
>> designate some subsystems that are more "important" than others, but
>> what's the point?
>
>There's a tradeoff.
>
>You want to fix serious bugs in stable, and you really don't want
>regressions in stable. And ... stable not having 1000s of patches
>would be nice, too.

I don't think we should use a number cap here, but rather look at the
regression rate: how many patches broke something?

Since the rate we're seeing now with AUTOSEL is similar to what we were
seeing before AUTOSEL, what's the problem it's causing?

>That means you want to ignore not-so-serious bugs, because benefit of
>fixing them is lower than risk of the regressions. I believe bugs that
>do not bother anyone should _not_ be fixed in stable.
>
>That was case of the LED patch. Yes, the commit fixed bug, but it
>introduced regressions that were fixed by subsequent patches.

How do you know if a bug bothers someone?

If a user is annoyed by a LED issue, is he expected to triage the bug,
report it on LKML and patiently wait for the appropriate patch to be
backported?

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:14 ` Sasha Levin
@ 2018-04-16 16:22   ` Steven Rostedt
  2018-04-16 16:31     ` Sasha Levin
  2018-04-16 16:28   ` Pavel Machek
  2018-04-16 17:05   ` Pavel Machek
  2 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, 16 Apr 2018 16:14:15 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> Since the rate we're seeing now with AUTOSEL is similar to what we were
> seeing before AUTOSEL, what's the problem it's causing?

Does that mean we just doubled the rate of regressions? That's the
problem.

>
> How do you know if a bug bothers someone?
>
> If a user is annoyed by a LED issue, is he expected to triage the bug,
> report it on LKML and patiently wait for the appropriate patch to be
> backported?

Yes.

-- Steve

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:22 ` Steven Rostedt
@ 2018-04-16 16:31   ` Sasha Levin
  2018-04-16 16:47     ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:14:15 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> Since the rate we're seeing now with AUTOSEL is similar to what we were
>> seeing before AUTOSEL, what's the problem it's causing?
>
>Does that mean we just doubled the rate of regressions? That's the
>problem.

No, the rate stayed the same :)

If before ~2% of stable commits were buggy, this is still the case with
AUTOSEL.

>>
>> How do you know if a bug bothers someone?
>>
>> If a user is annoyed by a LED issue, is he expected to triage the bug,
>> report it on LKML and patiently wait for the appropriate patch to be
>> backported?
>
>Yes.

I'm honestly not sure how to respond.

Let me ask my wife (who is happy using Linux as a regular desktop user)
how comfortable she would be with triaging kernel bugs...

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:31 ` Sasha Levin @ 2018-04-16 16:47 ` Steven Rostedt 2018-04-16 16:53 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 16:47 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018 16:31:09 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote: > >On Mon, 16 Apr 2018 16:14:15 +0000 > >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > >> Since the rate we're seeing now with AUTOSEL is similar to what we were > >> seeing before AUTOSEL, what's the problem it's causing? > > > >Does that mean we just doubled the rate of regressions? That's the > >problem. > > No, the rate stayed the same :) > > If before ~2% of stable commits were buggy, this is still the case with > AUTOSEL. Sorry, I didn't mean "rate" I meant "number". If the rate stayed the same, that means the number increased. > > >> > >> How do you know if a bug bothers someone? > >> > >> If a user is annoyed by a LED issue, is he expected to triage the bug, > >> report it on LKML and patiently wait for the appropriate patch to be > >> backported? > > > >Yes. > > I'm honestly not sure how to respond. > > Let me ask my wife (who is happy using Linux as a regular desktop user) > how comfortable she would be with triaging kernel bugs... That's really up to the distribution, not the main kernel stable. Does she download and compile the kernels herself? Does she use LEDs? The point is, stable is to keep what was working continued working. 
If we don't care about introducing a regression, and just want to keep regressions the same as mainline, why not just go to mainline? That way you also get the new features. Mainline already has the mantra of not breaking user space. When I work on new features, I sometimes stumble on bugs in the current features, and some of those fixes require a rewrite. The old code was "good enough" before, but every so often it could cause a bug that the new feature would trigger more often. Do we backport that rewrite? Do we backport fixes to old code for bugs that are more likely to be triggered by new features? Ideally, we should be working toward no regressions in stable. -- Steve
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:47 ` Steven Rostedt @ 2018-04-16 16:53 ` Sasha Levin 2018-04-16 17:00 ` Pavel Machek 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:53 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 12:47:11PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 16:31:09 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote: >> >On Mon, 16 Apr 2018 16:14:15 +0000 >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote: >> > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were >> >> seeing before AUTOSEL, what's the problem it's causing? >> > >> >Does that mean we just doubled the rate of regressions? That's the >> >problem. >> >> No, the rate stayed the same :) >> >> If before ~2% of stable commits were buggy, this is still the case with >> AUTOSEL. > >Sorry, I didn't mean "rate" I meant "number". If the rate stayed the >same, that means the number increased. Indeed, just like the number of regressions in mainline has increased over time. >> >> >> >> >> How do you know if a bug bothers someone? >> >> >> >> If a user is annoyed by a LED issue, is he expected to triage the bug, >> >> report it on LKML and patiently wait for the appropriate patch to be >> >> backported? >> > >> >Yes. >> >> I'm honestly not sure how to respond. >> >> Let me ask my wife (who is happy using Linux as a regular desktop user) >> how comfortable she would be with triaging kernel bugs... 
> >That's really up to the distribution, not the main kernel stable. Does >she download and compile the kernels herself? Does she use LEDs? > >The point is, stable is to keep what was working continued working. >If we don't care about introducing a regression, and just want to keep >regressions the same as mainline, why not just go to mainline? That way >you can also get the new features? Mainline already has the mantra to >not break user space. When I work on new features, I sometimes stumble >on bugs with the current features. And some of those fixes require a >rewrite. It was "good enough" before, but every so often could cause a >bug that the new feature would trigger more often. Do we back port that >rewrite? Do we backport fixes to old code that are more likely to be >triggered by new features? > >Ideally, we should be working on getting to no regressions to stable. This is exactly what we're doing. If a fix for a bug in -stable introduces a different regression, should we take it or not? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:53 ` Sasha Levin @ 2018-04-16 17:00 ` Pavel Machek 2018-04-17 10:46 ` Greg KH 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 17:00 UTC (permalink / raw) To: Sasha Levin Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo Hi! > >> Let me ask my wife (who is happy using Linux as a regular desktop user) > >> how comfortable she would be with triaging kernel bugs... > > > >That's really up to the distribution, not the main kernel stable. Does > >she download and compile the kernels herself? Does she use LEDs? > > > >The point is, stable is to keep what was working continued working. > >If we don't care about introducing a regression, and just want to keep > >regressions the same as mainline, why not just go to mainline? That way > >you can also get the new features? Mainline already has the mantra to > >not break user space. When I work on new features, I sometimes stumble > >on bugs with the current features. And some of those fixes require a > >rewrite. It was "good enough" before, but every so often could cause a > >bug that the new feature would trigger more often. Do we back port that > >rewrite? Do we backport fixes to old code that are more likely to be > >triggered by new features? > > > >Ideally, we should be working on getting to no regressions to stable. > > This is exactly what we're doing. > > If a fix for a bug in -stable introduces a different regression, > should we take it or not? If a fix for a bug introduces a regression, would you call it "obviously correct"?
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:00 ` Pavel Machek @ 2018-04-17 10:46 ` Greg KH 2018-04-17 12:24 ` Petr Mladek 0 siblings, 1 reply; 113+ messages in thread From: Greg KH @ 2018-04-17 10:46 UTC (permalink / raw) To: Pavel Machek Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote: > Hi! > > > >> Let me ask my wife (who is happy using Linux as a regular desktop user) > > >> how comfortable she would be with triaging kernel bugs... > > > > > >That's really up to the distribution, not the main kernel stable. Does > > >she download and compile the kernels herself? Does she use LEDs? > > > > > >The point is, stable is to keep what was working continued working. > > >If we don't care about introducing a regression, and just want to keep > > >regressions the same as mainline, why not just go to mainline? That way > > >you can also get the new features? Mainline already has the mantra to > > >not break user space. When I work on new features, I sometimes stumble > > >on bugs with the current features. And some of those fixes require a > > >rewrite. It was "good enough" before, but every so often could cause a > > >bug that the new feature would trigger more often. Do we back port that > > >rewrite? Do we backport fixes to old code that are more likely to be > > >triggered by new features? > > > > > >Ideally, we should be working on getting to no regressions to stable. > > > > This is exactly what we're doing. > > > > If a fix for a bug in -stable introduces a different regression, > > should we take it or not? 
> > If a fix for bug introduces regression, would you call it "obviously > correct"? I honestly can't believe you all are arguing about this. We backport bugfixes to the stable tree. If those fixes also are buggy we either apply the fix for that problem that ended up in Linus's tree, or we revert the patch. If the fix is not in Linus's tree, sometimes we leave the "bug" in stable for a bit to apply some pressure on the developer/maintainer to get it fixed in Linus's tree (that's what I mean by being "bug compatible".) This is exactly what we have been doing for over a decade now, why are people suddenly getting upset? Oh, I know why, suddenly subsystems that never were taking the time to mark patches for stable are getting patches backported and are getting nervous. The simple way to stop that from happening is to PROPERLY MARK PATCHES FOR STABLE IN THE FIRST PLACE! If you do that, then, no "automated" patches will get selected as you already handled them all. Or if there are some automated patches picked, you can easily NAK them (like xfs does as they know better than everyone else, and honestly, I trust them, and don't run xfs myself), or do like what I do when it happens to me and go "hey, nice, I missed that one!" There, problem solved, if you do that, no more worrying by you at all, and this thread can properly die. ugh, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 10:46 ` Greg KH @ 2018-04-17 12:24 ` Petr Mladek 2018-04-17 12:49 ` Michal Hocko 2018-04-17 13:45 ` Sasha Levin 0 siblings, 2 replies; 113+ messages in thread From: Petr Mladek @ 2018-04-17 12:24 UTC (permalink / raw) To: Greg KH Cc: Pavel Machek, Sasha Levin, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 2018-04-17 12:46:37, Greg KH wrote: > Oh, I know why, suddenly subsystems that never were taking the time to > mark patches for stable are getting patches backported and are getting > nervous. Yes, I am getting nervous because of this. The number of printk fixes nominated for stable has been increasing exponentially (just my feeling) during the last few months. The problem is that I want to be responsible and think about possible regressions. Sometimes that requires checking the state of the particular kernel release. The older the code base, the more complicated the decision is. You might argue that backporting the fixes helps to get the same code into all supported code bases. But that is not true. It will never be the same. Anyway, in the past the "automatically" nominated printk fixes were trivial. They did not cause harm. But they also were not worth it, IMHO. They fixed corner cases that had been there for ages. Most of these fixes were found by code review when working on a feature. They were not backed by bug reports. Last week, autosel nominated a pretty non-trivial patch (the one that started this thread). It partly solved a problem we had been trying to fix for the last few years. On the one hand, this was an annoying problem that motivated several people to spend a lot of time on it. This might be a motivation for a backport.
On the other hand, it took many years to get anywhere. The main problem was the fear of regressions. We fixed and improved many things in the meantime. This shows that the problem really is not trivial, and the same is true of the fix. We did our best to avoid regressions, but that does not mean there are none. Nor does it mean that the fix will give better results in all situations. I really do not see a reason to hurry and backport this to older kernel releases. Doing so spreads the fix, but also any problems it brings. It is easy to miss a dependent patch. The less trivial the fix, the more potential problems there are. Back to the trend. Last week I got autosel mails even for patches that were still being discussed, had issues, and were far from upstream: https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com It might be a good idea if the mail asked to add a Fixes: tag or to Cc the stable mailing list. But the mail suggested adding the unfinished patch to the stable branch directly (even before upstreaming?). For now, there are only a handful of printk patches in each release, so it is still doable. I just do not understand how other maintainers, from much busier subsystems, can cope with this trend. In other words: if you want to automate patch nomination, you might also need to automate patch review. Or you need to keep the patch rate low. This might mean nominating only important and rather trivial fixes. Best Regards, Petr
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 12:24 ` Petr Mladek @ 2018-04-17 12:49 ` Michal Hocko 2018-04-17 13:39 ` Sasha Levin 2018-04-17 13:45 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Michal Hocko @ 2018-04-17 12:49 UTC (permalink / raw) To: Petr Mladek Cc: Greg KH, Pavel Machek, Sasha Levin, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 14:24:54, Petr Mladek wrote: [...] > Back to the trend. Last week I got autosel mails even for > patches that were still being discussed, had issues, and > were far from upstream: > > https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > > It might be a good idea if the mail asked to add Fixes: tag > or stable mailing list. But the mail suggested to add the > unfinished patch into stable branch directly (even before > upstreaming?). Well, I think that poking subsystems which ignore stable trees with such emails early during review might be quite helpful. Maybe people start marking for stable and we do not need the guessing later. I wouldn't bother poking those who are known to mark stable patches though. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 12:49 ` Michal Hocko @ 2018-04-17 13:39 ` Sasha Levin 2018-04-17 14:22 ` Michal Hocko 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 13:39 UTC (permalink / raw) To: Michal Hocko Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:24:54, Petr Mladek wrote: >[...] >> Back to the trend. Last week I got autosel mails even for >> patches that were still being discussed, had issues, and >> were far from upstream: >> >> https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com >> https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com >> >> It might be a good idea if the mail asked to add Fixes: tag >> or stable mailing list. But the mail suggested to add the >> unfinished patch into stable branch directly (even before >> upstreaming?). > >Well, I think that poking subsystems which ignore stable trees with such >emails early during review might be quite helpful. Maybe people start >marking for stable and we do not need the guessing later. I wouldn't >bother poking those who are known to mark stable patches though. Yup, mm/ needs far less poking that XFS (for example). What makes mm/ so good about this is that it's a rather small set of devs who are good at marking things for stable. As long as the commit came from one of these "core" mm/ folks it's almost guaranteed to have proper stable tags. But mm/ commits don't come only from these people. 
Here's a concrete example we can discuss: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d This was merged in a few days ago, and seems relevant for older kernel trees as well. Should it not have a stable tag? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 13:39 ` Sasha Levin @ 2018-04-17 14:22 ` Michal Hocko 2018-04-17 14:36 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Michal Hocko @ 2018-04-17 14:22 UTC (permalink / raw) To: Sasha Levin Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 13:39:33, Sasha Levin wrote: [...] > But mm/ commits don't come only from these people. Here's a concrete > example we can discuss: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d I would be really careful, because that requires auditing all callers to make sure they comply with the change. This is just _too_ easy to backport without noticing a failure. Now consider the other side. Is there any real bug report backing this? The behavior has been like that for quite some time, but I do not remember any actual bug report and the changelog doesn't mention one either. It is about a theoretical problem. So if this were to be merged to stable, then the changelog should contain a big fat warning about the existing users and how they should be checked. Besides that, I can see Reviewed-by: akpm, and Andrew is usually very careful about stable backports, so there probably _was_ a reason to exclude stable. -- Michal Hocko SUSE Labs
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:22 ` Michal Hocko @ 2018-04-17 14:36 ` Sasha Levin 2018-04-17 18:10 ` Michal Hocko 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 14:36 UTC (permalink / raw) To: Michal Hocko Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: >On Tue 17-04-18 13:39:33, Sasha Levin wrote: >[...] >> But mm/ commits don't come only from these people. Here's a concrete >> example we can discuss: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > >I would be really careful. Because that reqiures to audit all callers to >be compliant with the change. This is just _too_ easy to backport >without noticing a failure. Now consider the other side. Is there any >real bug report backing this? This behavior was like that for quite some >time but I do not remember any actual bug report and the changelog >doesn't mention one either. It is about theoretical problem. https://lkml.org/lkml/2018/3/19/430 There's even a fun little reproducer that allowed me to confirm it's an issue (at least) on 4.15. Heck, it might even qualify as a CVE. >So if this was to be merged to stable then the changelog should contain >a big fat warning about the existing users and how they should be >checked. So what I'm asking is why *wasn't* it sent to stable? Yes, it requires additional work backporting this, but what I'm saying is that this didn't happen at all. 
>Besides that I can see Reviewed-by: akpm and Andrew is usually very >careful about stable backports so there probably _was_ a reason to >exclude stable. >-- >Michal Hocko >SUSE Labs
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:36 ` Sasha Levin @ 2018-04-17 18:10 ` Michal Hocko 0 siblings, 0 replies; 113+ messages in thread From: Michal Hocko @ 2018-04-17 18:10 UTC (permalink / raw) To: Sasha Levin Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 14:36:44, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 13:39:33, Sasha Levin wrote: > >[...] > >> But mm/ commits don't come only from these people. Here's a concrete > >> example we can discuss: > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > > > >I would be really careful. Because that reqiures to audit all callers to > >be compliant with the change. This is just _too_ easy to backport > >without noticing a failure. Now consider the other side. Is there any > >real bug report backing this? This behavior was like that for quite some > >time but I do not remember any actual bug report and the changelog > >doesn't mention one either. It is about theoretical problem. > > https://lkml.org/lkml/2018/3/19/430 > > There's even a fun little reproducer that allowed me to confirm it's an > issue (at least) on 4.15. > > Heck, it might even qualify as a CVE. > > >So if this was to be merged to stable then the changelog should contain > >a big fat warning about the existing users and how they should be > >checked. > > So what I'm asking is why *wasn't* it sent to stable? Yes, it requires > additional work backporting this, but what I'm saying is that this > didn't happen at all. Do not ask me. 
I wasn't involved. But I would _guess_ that the original bug is not all that serious, because it requires some specific privileges, and it is quite unlikely that somebody privileged would want to shoot themselves in the foot. But this is just my wild guess. Anyway, I am pretty sure that if the triggering BUG was serious enough, then it would be much safer to remove it for stable backports. -- Michal Hocko SUSE Labs
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 12:24 ` Petr Mladek 2018-04-17 12:49 ` Michal Hocko @ 2018-04-17 13:45 ` Sasha Levin 2018-04-18 8:33 ` Petr Mladek 1 sibling, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 13:45 UTC (permalink / raw) To: Petr Mladek Cc: Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: >Back to the trend. Last week I got autosel mails even for >patches that were still being discussed, had issues, and >were far from upstream: > > https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > >It might be a good idea if the mail asked to add Fixes: tag >or stable mailing list. But the mail suggested to add the >unfinished patch into stable branch directly (even before >upstreaming?). I obviously didn't suggest that this patch will go in -stable before it's upstream. I've started doing those because some folks can't be arsed to reply to a review request for a patch that is months old. I found that if I send these mails while the discussion is still going on I'd get a much better response rate from people. If you think any of these patches should go in stable there were two ways about it: - You end up adding the -stable tag yourself, and it would follow the usual route where Greg picks it up. - You reply to that mail, and the patch would wait in a list until my script notices it made it upstream, at which point it would get queued for stable. 
>Now, there are only hand full of printk patches in each >release, so it is still doable. I just do not understand >how other maintainers, from much more busy subsystems, >could cope with this trend. > >By other words. If you want to automatize patch nomination, >you might need to automatize also patch review. Or you need >to keep the patch rate low. This might mean to nominate >only important and rather trivial fixes. I also have an effort to help review the patches. See what I'm working on for the xfs folks: https://lkml.org/lkml/2018/3/29/1113 Where in addition to build tests I'd also run each commit, for each stable kernel through a set of xfstests and provide them along with the mail. So yes, I'm aware that the volume of patches is huge, but there's not much I can do about it because it's just a subset of the kernel's patch volume and since the kernel gets more and more patches each release, the volume of stable commits is bound to grow as well. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 13:45 ` Sasha Levin @ 2018-04-18 8:33 ` Petr Mladek 0 siblings, 0 replies; 113+ messages in thread From: Petr Mladek @ 2018-04-18 8:33 UTC (permalink / raw) To: Sasha Levin Cc: Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 2018-04-17 13:45:59, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: > >Back to the trend. Last week I got autosel mails even for > >patches that were still being discussed, had issues, and > >were far from upstream: > > > > https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > > https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com > > > >It might be a good idea if the mail asked to add Fixes: tag > >or stable mailing list. But the mail suggested to add the > >unfinished patch into stable branch directly (even before > >upstreaming?). > > I obviously didn't suggest that this patch will go in -stable before > it's upstream. > > I've started doing those because some folks can't be arsed to reply to a > review request for a patch that is months old. I found that if I send > these mails while the discussion is still going on I'd get a much better > response rate from people. I see. It makes sense. > If you think any of these patches should go in stable there were two > ways about it: > > - You end up adding the -stable tag yourself, and it would follow the > usual route where Greg picks it up. 
> - You reply to that mail, and the patch would wait in a list until my > script notices it made it upstream, at which point it would get > queued for stable. It would be great if these options were described in the mail. I wonder if it would also make sense to add a tag saying that the commit is not suitable for stable. It might help both sides. Maintainers would be able to share their opinion and eventually reduce the mails from autosel. You would get feedback that maintainers had considered the patch for stable. It might even be useful for teaching the AI. > >Now, there are only hand full of printk patches in each > >release, so it is still doable. I just do not understand > >how other maintainers, from much more busy subsystems, > >could cope with this trend. > > So yes, I'm aware that the volume of patches is huge, but there's not > much I can do about it because it's just a subset of the kernel's patch > volume and since the kernel gets more and more patches each release, the > volume of stable commits is bound to grow as well. Yes, but the growth in stable is much faster than the growth in mainline at the moment. It might be fine if it were caused just by engaging subsystems that have ignored stable so far. But I am not sure that is the case. Also, I am not sure about your plans. Anyway, I am surprised that patches can go into stable so easily (no response -> accepted), while it is pretty hard to get through the review process for mainline. Of course, many patches go into mainline without review as well. But the difference is that they are pushed by people who are familiar with, and responsible for, the affected area. I can understand the pain. There are surely people who do not care about stable, because it takes time, it is hard to make decisions, flashbacks to the old code are painful, etc. Well, this is the reason why the maintenance support is and should be limited. Anyway, I think that it cannot be done reasonably without maintainers.
You should be careful that even the currently cooperating maintainers do not start treating autosel mails as spam. (That is not my case; printk is a small thing. But I can imagine that it might stop being bearable in bigger subsystems, as is already the case with xfs.) Best Regards, Petr
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:14 ` Sasha Levin 2018-04-16 16:22 ` Steven Rostedt @ 2018-04-16 16:28 ` Pavel Machek 2018-04-16 16:39 ` Sasha Levin 2018-04-16 17:05 ` Pavel Machek 2 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:28 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo > >> Is there a reason not to take LED fixes if they fix a bug and don't > >> cause a regression? Sure, we can draw some arbitrary line, maybe > >> designate some subsystems that are more "important" than others, but > >> what's the point? > > > >There's a tradeoff. > > > >You want to fix serious bugs in stable, and you really don't want > >regressions in stable. And ... stable not having 1000s of patches > >would be nice, too. > > I don't think we should use a number cap here, but rather look at the > regression rate: how many patches broke something? > > Since the rate we're seeing now with AUTOSEL is similar to what we were > seeing before AUTOSEL, what's the problem it's causing? Regression rate should not be the only criterion. More patches mean a bigger chance that a customer's patches will conflict with something in -stable, for example. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:28 ` Pavel Machek @ 2018-04-16 16:39 ` Sasha Levin 2018-04-16 16:42 ` Pavel Machek 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:39 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > >> >> Is there a reason not to take LED fixes if they fix a bug and don't >> >> cause a regression? Sure, we can draw some arbitrary line, maybe >> >> designate some subsystems that are more "important" than others, but >> >> what's the point? >> > >> >There's a tradeoff. >> > >> >You want to fix serious bugs in stable, and you really don't want >> >regressions in stable. And ... stable not having 1000s of patches >> >would be nice, too. >> >> I don't think we should use a number cap here, but rather look at the >> regression rate: how many patches broke something? >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were >> seeing before AUTOSEL, what's the problem it's causing? > >Regression rate should not be the only criteria. > >More patches mean bigger chance customer's patches will have a >conflict with something in -stable, for example. Out of tree patches can't be a consideration here. There are no guarantees for out of tree code, ever. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:39 ` Sasha Levin @ 2018-04-16 16:42 ` Pavel Machek 2018-04-16 16:45 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:42 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 1420 bytes --] On Mon 2018-04-16 16:39:20, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > > > >> >> Is there a reason not to take LED fixes if they fix a bug and don't > >> >> cause a regression? Sure, we can draw some arbitrary line, maybe > >> >> designate some subsystems that are more "important" than others, but > >> >> what's the point? > >> > > >> >There's a tradeoff. > >> > > >> >You want to fix serious bugs in stable, and you really don't want > >> >regressions in stable. And ... stable not having 1000s of patches > >> >would be nice, too. > >> > >> I don't think we should use a number cap here, but rather look at the > >> regression rate: how many patches broke something? > >> > >> Since the rate we're seeing now with AUTOSEL is similar to what we were > >> seeing before AUTOSEL, what's the problem it's causing? > > > >Regression rate should not be the only criteria. > > > >More patches mean bigger chance customer's patches will have a > >conflict with something in -stable, for example. > > Out of tree patches can't be a consideration here. There are no > guarantees for out of tree code, ever. Out of tree code is not consideration for mainline, agreed. Stable should be different. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:42 ` Pavel Machek @ 2018-04-16 16:45 ` Sasha Levin 2018-04-16 16:54 ` Pavel Machek 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:45 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote: >On Mon 2018-04-16 16:39:20, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: >> > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe >> >> >> designate some subsystems that are more "important" than others, but >> >> >> what's the point? >> >> > >> >> >There's a tradeoff. >> >> > >> >> >You want to fix serious bugs in stable, and you really don't want >> >> >regressions in stable. And ... stable not having 1000s of patches >> >> >would be nice, too. >> >> >> >> I don't think we should use a number cap here, but rather look at the >> >> regression rate: how many patches broke something? >> >> >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were >> >> seeing before AUTOSEL, what's the problem it's causing? >> > >> >Regression rate should not be the only criteria. >> > >> >More patches mean bigger chance customer's patches will have a >> >conflict with something in -stable, for example. >> >> Out of tree patches can't be a consideration here. There are no >> guarantees for out of tree code, ever. > >Out of tree code is not consideration for mainline, agreed. Stable >should be different. 
This is a discussion we could have in the right forum, but FYI stable doesn't even guarantee KABI compatibility between minor versions at this point. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:45 ` Sasha Levin @ 2018-04-16 16:54 ` Pavel Machek 2018-04-17 10:50 ` Greg KH 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:54 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 1920 bytes --] On Mon 2018-04-16 16:45:16, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote: > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > >> > > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe > >> >> >> designate some subsystems that are more "important" than others, but > >> >> >> what's the point? > >> >> > > >> >> >There's a tradeoff. > >> >> > > >> >> >You want to fix serious bugs in stable, and you really don't want > >> >> >regressions in stable. And ... stable not having 1000s of patches > >> >> >would be nice, too. > >> >> > >> >> I don't think we should use a number cap here, but rather look at the > >> >> regression rate: how many patches broke something? > >> >> > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were > >> >> seeing before AUTOSEL, what's the problem it's causing? > >> > > >> >Regression rate should not be the only criteria. > >> > > >> >More patches mean bigger chance customer's patches will have a > >> >conflict with something in -stable, for example. > >> > >> Out of tree patches can't be a consideration here. 
There are no > >> guarantees for out of tree code, ever. > > > >Out of tree code is not consideration for mainline, agreed. Stable > >should be different. > > This is a discussion we could have with in right forum, but FYI stable > doesn't even guarantee KABI compatibility between minor versions at this > point. Stable should be useful base for distributions. They carry out of tree patches, and yes, you should try to make their lives easy. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:54 ` Pavel Machek @ 2018-04-17 10:50 ` Greg KH 0 siblings, 0 replies; 113+ messages in thread From: Greg KH @ 2018-04-17 10:50 UTC (permalink / raw) To: Pavel Machek Cc: Sasha Levin, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote: > On Mon 2018-04-16 16:45:16, Sasha Levin wrote: > > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote: > > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote: > > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > > >> > > > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't > > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe > > >> >> >> designate some subsystems that are more "important" than others, but > > >> >> >> what's the point? > > >> >> > > > >> >> >There's a tradeoff. > > >> >> > > > >> >> >You want to fix serious bugs in stable, and you really don't want > > >> >> >regressions in stable. And ... stable not having 1000s of patches > > >> >> >would be nice, too. > > >> >> > > >> >> I don't think we should use a number cap here, but rather look at the > > >> >> regression rate: how many patches broke something? > > >> >> > > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were > > >> >> seeing before AUTOSEL, what's the problem it's causing? > > >> > > > >> >Regression rate should not be the only criteria. > > >> > > > >> >More patches mean bigger chance customer's patches will have a > > >> >conflict with something in -stable, for example. 
> > >> > > >> Out of tree patches can't be a consideration here. There are no > > >> guarantees for out of tree code, ever. > > > > > >Out of tree code is not consideration for mainline, agreed. Stable > > >should be different. > > > > This is a discussion we could have with in right forum, but FYI stable > > doesn't even guarantee KABI compatibility between minor versions at this > > point. > > Stable should be useful base for distributions. They carry out of tree > patches, and yes, you should try to make their lives easy. How do you know I already don't do that? But no, in the end, it's not my job to make their life easier if they are off in their own corner never providing me feedback or help. For those companies/distros/SoCs that do provide that feedback, I am glad to work with them. As proof of that, there are at least 3 "major" SoC vendors that have been merging every one of the stable releases into their internal trees for the past 6+ months now. I get reports when they do the merge and test, and so far, we have only had 1 regression. And that regression was because that SoC vendor forgot to upstream a patch that they had in their internal tree (i.e. they fixed it a while ago but forgot to tell anyone else, nothing we can do about that.) So if you are a distro/company/whatever that takes stable releases, and have run into problems, please let me know, and I will be glad to work with you. If you are not that, then please don't attempt to speak for them... thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:14 ` Sasha Levin 2018-04-16 16:22 ` Steven Rostedt 2018-04-16 16:28 ` Pavel Machek @ 2018-04-16 17:05 ` Pavel Machek 2018-04-16 17:16 ` Sasha Levin 2 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 17:05 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo Hi! > How do you know if a bug bothers someone? > > If a user is annoyed by a LED issue, is he expected to triage the bug, > report it on LKML and patiently wait for the appropriate patch to be > backported? If the user is annoyed by a LED issue, you are actually expected to tell him that it is not going to be fixed, because it is not on the list: - It must fix a problem that causes a build error (but not for things marked CONFIG_BROKEN), an oops, a hang, data corruption, a real security issue, or some "oh, that's not good" issue. In short, something critical. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:05 ` Pavel Machek @ 2018-04-16 17:16 ` Sasha Levin 2018-04-16 17:44 ` Steven Rostedt 2018-04-16 20:17 ` Jiri Kosina 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-16 17:16 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 07:05:01PM +0200, Pavel Machek wrote: >Hi! > >> How do you know if a bug bothers someone? >> >> If a user is annoyed by a LED issue, is he expected to triage the bug, >> report it on LKML and patiently wait for the appropriate patch to be >> backported? > >If the user is annoyed by a LED issue, you are actually expected to >tell him that it is not going to be fixed, because it is not on the list: > > - It must fix a problem that causes a build error (but not for things > marked CONFIG_BROKEN), an oops, a hang, data corruption, a real > security issue, or some "oh, that's not good" issue. In short, > something critical. So if a user is operating a nuclear power plant, and has 2 leds: green one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and once in a blue moon a race condition is causing the red one to go on and cause panic in the little province he lives in, we should tell that user to fuck off? LEDs may not be critical for you, but they can be critical for someone else. Think of all the different users we have and the wildly different ways they use the kernel. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:16 ` Sasha Levin @ 2018-04-16 17:44 ` Steven Rostedt 2018-04-16 18:17 ` Sasha Levin 2018-04-16 20:17 ` Jiri Kosina 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 17:44 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018 17:16:10 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > So if a user is operating a nuclear power plant, and has 2 leds: green > one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and > once in a blue moon a race condition is causing the red one to go on and > cause panic in the little province he lives in, we should tell that user > to fuck off? > > LEDs may not be critical for you, but they can be critical for someone > else. Think of all the different users we have and the wildly different > ways they use the kernel. We can point them to the fix and have them backport it. Or they should ask their distribution to backport it. Hopefully they tested the kernel they are using for something like that, and only want critical fixes. What happens if they take the next stable assuming that it has critical fixes only, and this fix causes a regression that creates the "ALL OK!" when it wasn't. Basically, I rather have stable be more bug compatible with the version it is based on with only critical fixes (things that will cause an oops) than to try to be bug compatible with mainline, as then we get into a state where things are a frankenstein of the stable base version and mainline. 
I could say, "Yeah this feature works better on this 4.x version of the kernel" and not worry about "4.x.y" versions having it better. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:44 ` Steven Rostedt @ 2018-04-16 18:17 ` Sasha Levin 2018-04-16 18:35 ` Steven Rostedt 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 18:17 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 01:44:23PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 17:16:10 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > >> So if a user is operating a nuclear power plant, and has 2 leds: green >> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and >> once in a blue moon a race condition is causing the red one to go on and >> cause panic in the little province he lives in, we should tell that user >> to fuck off? >> >> LEDs may not be critical for you, but they can be critical for someone >> else. Think of all the different users we have and the wildly different >> ways they use the kernel. > >We can point them to the fix and have them backport it. Or they should >ask their distribution to backport it. It may work in your subsystem, but it really doesn't work this way with the kernel. Let me share a concrete example with you: there's a vfs bug that's a pain to reproduce going around. It was originally reported on CoreOS/AWS: https://github.com/coreos/bugs/issues/2356 But our customers reported to us that they're hitting this issue too. We couldn't reproduce it, and the call trace indicated it may be a memory corruption. We could however confirm with the customers that the latest mainline fixes the issue.
Given that we couldn't reproduce it, and neither of us is a fs/ expert, we sent a mail to LKML, just like you suggested doing: https://lkml.org/lkml/2018/3/2/1038 But unlike what you said, no one pointed us to the fix, even though the issue was fixed on mainline. Heck, no one engaged in any meaningful conversation about the bug. I really think that we have different views as to how well the whole "let me shoot a mail to LKML" process works, which leads to different views on -stable. >Hopefully they tested the kernel they are using for something like >that, and only want critical fixes. What happens if they take the next >stable assuming that it has critical fixes only, and this fix causes a >regression that creates the "ALL OK!" when it wasn't. > >Basically, I rather have stable be more bug compatible with the version >it is based on with only critical fixes (things that will cause an >oops) than to try to be bug compatible with mainline, as then we get >into a state where things are a frankenstein of the stable base version >and mainline. I could say, "Yeah this feature works better on this >4.x version of the kernel" and not worry about "4.x.y" versions having >it better. This is how things used to work, right? Look at redhat kernels for example, they'd stick with a kernel for tens of years, doing the tiniest fixes, only when customers complained, and encouraging users to upgrade only when the kernel would go EoL, and when customers couldn't do that because they were too locked into that kernel version. redhat still supports 2.6.9. I thought we agreed that this is bad? We wanted users to be closer to mainline, and we can't do it without bringing -stable closer to mainline as well. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:17 ` Sasha Levin @ 2018-04-16 18:35 ` Steven Rostedt 0 siblings, 0 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 18:35 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018 18:17:17 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > I thought we agreed that this is bad? We wanted users to be closer to > mainline, and we can't do it without bringing -stable closer to mainline > as well. I guess the question comes down to, what do the users of stable kernels want? For my machines, I always stay one or two releases behind mainline. Right now my kernels are on 4.15.x, and will probably jump to 4.16.x the next time I upgrade my machines. I'm fine with something breaking every so often as long as it's not data corruption (although I have lots of backups of my systems in case that happens, just a PITA to fix it). I only hit bugs on these boxes probably once a year at most in doing so. But I mostly do what other kernel developers do and that means the bugs I would mostly hit, other developers hit before their code is released. Thus, if stable users are fine with being regression compatible with mainline, then I'm fine with it too. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:16 ` Sasha Levin 2018-04-16 17:44 ` Steven Rostedt @ 2018-04-16 20:17 ` Jiri Kosina 2018-04-16 20:36 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Jiri Kosina @ 2018-04-16 20:17 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018, Sasha Levin wrote: > So if a user is operating a nuclear power plant, and has 2 leds: green > one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and > once in a blue moon a race condition is causing the red one to go on and > cause panic in the little province he lives in, we should tell that user > to fuck off? > > LEDs may not be critical for you, but they can be critical for someone > else. Think of all the different users we have and the wildly different > ways they use the kernel. I am pretty sure that for almost every fix there is a person on a planet that'd rate it "critical". We can't really use this as an argument for inclusion of code into -stable, as that'd mean that -stable and Linus' tree would have to be basically the same. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:17 ` Jiri Kosina @ 2018-04-16 20:36 ` Sasha Levin 2018-04-16 20:43 ` Jiri Kosina 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 20:36 UTC (permalink / raw) To: Jiri Kosina Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 10:17:17PM +0200, Jiri Kosina wrote: >On Mon, 16 Apr 2018, Sasha Levin wrote: > >> So if a user is operating a nuclear power plant, and has 2 leds: green >> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and >> once in a blue moon a race condition is causing the red one to go on and >> cause panic in the little province he lives in, we should tell that user >> to fuck off? >> >> LEDs may not be critical for you, but they can be critical for someone >> else. Think of all the different users we have and the wildly different >> ways they use the kernel. > >I am pretty sure that for almost every fix there is a person on a planet >that'd rate it "critical". We can't really use this as an argument for >inclusion of code into -stable, as that'd mean that -stable and Linus' So I think that Linus's claim that users come first applies here as well. If there's a user that cares about a particular feature being broken, then we go ahead and fix his bug rather than ignoring him. >tree would have to be basically the same. Basically the same minus all new features/subsystems/arch/etc. But yes, ideally we'd want all bugfixes that go in mainline. Why not? Instead of keeping bug fixes out, we need to work on improving our testing story.
Instead of ignoring that "person that'd rate it critical" we should add his use case into our testing matrix. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:36 ` Sasha Levin @ 2018-04-16 20:43 ` Jiri Kosina 2018-04-16 21:18 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Jiri Kosina @ 2018-04-16 20:43 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018, Sasha Levin wrote: > So I think that Linus's claim that users come first applies here as > well. If there's a user that cares about a particular feature being > broken, then we go ahead and fix his bug rather than ignoring him. So one extreme is fixing -stable *iff* users actually do report an issue. The other extreme is backporting everything that potentially looks like a potential fix of "something" (according to some arbitrary metric), pro-actively. The former violates the "users first" rule, the latter has a very, very high risk of regressions. So this whole debate is about finding a compromise. My gut feeling always was that the statement in Documentation/process/stable-kernel-rules.rst is very reasonable, but making the process way more "aggressive" when backporting patches is breaking much of its original spirit for me. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:43 ` Jiri Kosina @ 2018-04-16 21:18 ` Sasha Levin 2018-04-16 21:28 ` Jiri Kosina 2018-05-03 9:47 ` Pavel Machek 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-16 21:18 UTC (permalink / raw) To: Jiri Kosina Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >On Mon, 16 Apr 2018, Sasha Levin wrote: > >> So I think that Linus's claim that users come first applies here as >> well. If there's a user that cares about a particular feature being >> broken, then we go ahead and fix his bug rather then ignoring him. > >So one extreme is fixing -stable *iff* users actually do report an issue. > >The other extreme is backporting everything that potentially looks like a >potential fix of "something" (according to some arbitrary metric), >pro-actively. > >The former voilates the "users first" rule, the latter has a very, very >high risk of regressions. > >So this whole debate is about finding a compromise. > >My gut feeling always was that the statement in > > Documentation/process/stable-kernel-rules.rst > >is very reasonable, but making the process way more "aggresive" when >backporting patches is breaking much of its original spirit for me. I agree that as an enterprise distro taking everything from -stable isn't the best idea. Ideally you'd want to be close to the first extreme you've mentioned and only take commits if customers are asking you to do so. I think that the rule we're trying to agree upon is the "It must fix a real bug that bothers people". 
I think that we can agree that it's impossible to expect every single Linux user to go on LKML and complain about a bug he encountered, so the rule quickly becomes "It must fix a real bug that can bother people". My "aggressiveness" comes from the whole "bother" part: it doesn't have to be critical, it doesn't have to cause data corruption, it doesn't have to be a security issue. It's enough that the bug actually affects a user in a way he didn't expect it to (if a user doesn't have expectations, it would fall under the "This could be a problem..." exception). We can go into a discussion about what exactly "bothering" is, but on the flip side, the whole -stable tag is just a way for folks to indicate they want a given patch reviewed for stable, it's not actually a guarantee of whether the patch will go into -stable or not. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 21:18 ` Sasha Levin @ 2018-04-16 21:28 ` Jiri Kosina 2018-04-17 10:39 ` Greg KH 2018-05-03 9:47 ` Pavel Machek 1 sibling, 1 reply; 113+ messages in thread From: Jiri Kosina @ 2018-04-16 21:28 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018, Sasha Levin wrote: > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. Ideally you'd want to be close to the first > extreme you've mentioned and only take commits if customers are asking > you to do so. > > I think that the rule we're trying to agree upon is the "It must fix > a real bug that bothers people". > > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother people". So is there a reason why stable couldn't become some hybrid-form union of - really critical issues (data corruption, boot issues, severe security issues) taken from bleeding edge upstream - [reviewed] cherry-picks of functional fixes from major distro kernels (based on that very -stable release), as that's apparently what people are hitting in the real world with that particular kernel ? Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 21:28 ` Jiri Kosina @ 2018-04-17 10:39 ` Greg KH 2018-04-17 11:07 ` Michal Hocko 2018-04-17 11:21 ` Jiri Kosina 0 siblings, 2 replies; 113+ messages in thread From: Greg KH @ 2018-04-17 10:39 UTC (permalink / raw) To: Jiri Kosina Cc: Sasha Levin, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > I agree that as an enterprise distro taking everything from -stable > > isn't the best idea. Ideally you'd want to be close to the first > > extreme you've mentioned and only take commits if customers are asking > > you to do so. > > > > I think that the rule we're trying to agree upon is the "It must fix > > a real bug that bothers people". > > > > I think that we can agree that it's impossible to expect every single > > Linux user to go on LKML and complain about a bug he encountered, so the > > rule quickly becomes "It must fix a real bug that can bother people". > > So is there a reason why stable couldn't become some hybrid-form union of > > - really critical issues (data corruption, boot issues, severe security > issues) taken from bleeding edge upstream > - [reviewed] cherry-picks of functional fixes from major distro kernels > (based on that very -stable release), as that's apparently what people > are hitting in the real world with that particular kernel It already is that :) The problem Sasha is trying to solve here is that for many subsystems, maintainers do not mark patches for stable at all. 
So real bugfixes that do hit people are not getting to those kernels, which forces the distros to do extra work to triage a bug, dig through upstream kernels, and find and apply the patch. By identifying the patches that should have been marked for stable, based on the way that the changelog text is written and the logic in the patch itself, we circumvent that extra annoyance of users hitting problems and complaining, or ignoring them and hoping they go away if they reboot. I've been doing this "by hand" for many years now, with no complaints so far. Sasha has taken it to the next level as I don't scale and has started to automate it using some really nice tools. That's all, this isn't crazy new features being backported, it's just patches that are obviously fixes being added to the stable tree. Yes, sometimes those fixes need additional fixes, and that's fine, normal stable-marked patches need that all the time. I don't see anyone complaining about that, right? So nothing "new" is happening here, EXCEPT we are actually starting to get better kernel-wide coverage for stable fixes, which we have not had in the past. That's a good thing! The number of patches applied to stable is still a very very very tiny % compared to mainline, so nothing new is happening here. Oh, and if you do want to complain about huge new features being backported, look at the mess that Spectre and Meltdown have caused in the stable trees. I don't see anyone complaining about those massive changes :) thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 10:39 ` Greg KH @ 2018-04-17 11:07 ` Michal Hocko 2018-04-17 14:04 ` Sasha Levin 2018-04-17 11:21 ` Jiri Kosina 1 sibling, 1 reply; 113+ messages in thread From: Michal Hocko @ 2018-04-17 11:07 UTC (permalink / raw) To: Greg KH Cc: Jiri Kosina, Sasha Levin, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 12:39:36, Greg KH wrote: > On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > > > I agree that as an enterprise distro taking everything from -stable > > > isn't the best idea. Ideally you'd want to be close to the first > > > extreme you've mentioned and only take commits if customers are asking > > > you to do so. > > > > > > I think that the rule we're trying to agree upon is the "It must fix > > > a real bug that bothers people". > > > > > > I think that we can agree that it's impossible to expect every single > > > Linux user to go on LKML and complain about a bug he encountered, so the > > > rule quickly becomes "It must fix a real bug that can bother people". 
> > > > So is there a reason why stable couldn't become some hybrid-form union of > > > > - really critical issues (data corruption, boot issues, severe security > > issues) taken from bleeding edge upstream > > - [reviewed] cherry-picks of functional fixes from major distro kernels > > (based on that very -stable release), as that's apparently what people > > are hitting in the real world with that particular kernel > > It already is that :) > > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The way he is trying to do that is just wrong. Generate pressure on those subsystems by referring to bug reports and unhappy users and I am pretty sure they will try harder... You cannot solve the problem by bypassing them without having a deep understanding of the specific subsystem. Once you have it, just make sure you are part of the review process and make sure to mark patches before they are merged. > So real bugfixes > that do hit people are not getting to those kernels, which force the > distros to do extra work to triage a bug, dig through upstream kernels, > find and apply the patch. I would say that this is the primary role of the distro: to hide the jungle of the upstream work, provide additional bug filtering, and forward bugs in the right direction. > By identifying the patches that should have been marked for stable, > based on the ways that the changelog text is written and the logic in > the patch itself, we circumvent that extra annoyance of users hitting > problems and complaining, or ignoring them and hoping they go away if > they reboot. Well, but this is a double-edged sword. You are not only backporting obvious bug fixes but also pulling many patches out of the context they were merged in, and double checking that all the assumptions are still true is a non-trivial task. I am still not convinced any script or AI can do that right now.
> I've been doing this "by hand" for many years now, with no complaints so > far. Really? I remember quite a few complaints about broken stable releases and also many discussions at KS about how the current workflow doesn't really work for some users (e.g. distributions). > Sasha has taken it to the next level as I don't scale and has > started to automate it using some really nice tools. That's all, this > isn't crazy new features being backported, it's just patches that are > obviously fixes being added to the stable tree. I have yet to see a tool which can recognize an "obvious fix". Seriously! Matching keywords in the changelog and some pattern recognition in the diff to do some pre-filtering _can_ help a lot, but there is still human interaction needed to do sanity checking. And that really requires deep subsystem knowledge. I really fail to see how that can work without the involvement of the relevant people. Pretending that you can do stable without maintainers will simply not work IMNHO. > Yes, sometimes those fixes need additional fixes, and that's fine, > normal stable-marked patches need that all the time. I don't see anyone > complaining about that, right? > > So nothing "new" is happening here, EXCEPT we are actually starting to > get a better kernel-wide coverage for stable fixes, which we have not > had in the past. That's a good thing! The number of patches applied to > stable is still a very very very tiny % compared to mainline, so nothing > new is happening here. Yes, I do agree, the stable process is not very much different from the past and I would tend to consider both processes broken because they explicitly try to avoid maintainers, which is just wrong. > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Are you serious?
Are you going to compare the biggest PITA that the community had to undergo because of HW issues with random pattern matching in changelog/diffs? Come on! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 11:07 ` Michal Hocko @ 2018-04-17 14:04 ` Sasha Levin 2018-04-17 14:15 ` Steven Rostedt 2018-04-17 14:36 ` Michal Hocko 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-17 14:04 UTC (permalink / raw) To: Michal Hocko Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >On Tue 17-04-18 12:39:36, Greg KH wrote: >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> > > I agree that as an enterprise distro taking everything from -stable >> > > isn't the best idea. Ideally you'd want to be close to the first >> > > extreme you've mentioned and only take commits if customers are asking >> > > you to do so. >> > > >> > > I think that the rule we're trying to agree upon is the "It must fix >> > > a real bug that bothers people". >> > > >> > > I think that we can agree that it's impossible to expect every single >> > > Linux user to go on LKML and complain about a bug he encountered, so the >> > > rule quickly becomes "It must fix a real bug that can bother people". 
>> > >> > So is there a reason why stable couldn't become some hybrid-form union of >> > >> > - really critical issues (data corruption, boot issues, severe security >> > issues) taken from bleeding edge upstream >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> > (based on that very -stable release), as that's apparently what people >> > are hitting in the real world with that particular kernel >> >> It already is that :) >> >> The problem Sasha is trying to solve here is that for many subsystems, >> maintainers do not mark patches for stable at all. > >The way he is trying to do that is just wrong. Generate a pressure on >those subsystems by referring to bug reports and unhappy users and I am >pretty sure they will try harder... You cannot solve the problem by >bypassing them without having deep understanding of the specific >subsystem. Once you have it, just make sure you are part of the review >process and make sure to mark patches before they are merged. I think we just don't agree on how we should "pressure". Look at the discussion I had with the XFS folks who just don't want to deal with this -stable thing because they have too much work upstream. There wasn't a single patch in -stable coming from XFS for the past 6+ months. I'm aware of more than one way to corrupt an XFS volume for any distro that uses a kernel older than 4.15. Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be better about it, but I don't see this changing. The solution to this, in my opinion, is to automate the whole selection and review process. We do selection using AI, and we run every possible test that's relevant to that subsystem. At which point, the amount of work a human needs to do to review a patch shrinks into something far more manageable for some maintainers.
>> So real bugfixes >> that do hit people are not getting to those kernels, which force the >> distros to do extra work to triage a bug, dig through upstream kernels, >> find and apply the patch. > >I would say that this is the primary role of the distro. To hide the >jungle of the upstream work and provide the additional of bug filtering >and forwarding them the right direction. More often than triaging, you'll just be asked to upgrade to the latest version. What sort of user experience does that provide? [snip] >> So nothing "new" is happening here, EXCEPT we are actually starting to >> get a better kernel-wide coverage for stable fixes, which we have not >> had in the past. That's a good thing! The number of patches applied to >> stable is still a very very very tiny % compared to mainline, so nothing >> new is happening here. > >yes I do agree, the stable process is not very much different from the >past and I would tend both processes broken because they explicitly try >to avoid maintainers which is just wrong. Avoid maintainers?! We send so much "spam" trying to get maintainers more involved in the process. How is that avoiding them? If you're a maintainer who has specific requirements for the -stable flow, or you have any automated testing you'd like to be run on these commits, or you want these mails to come in a different format, or pretty much anything else at all just shoot me a mail! It's been almost impossible to get maintainers involved in this process. We don't sneak anything past maintainers, there are multiple mails over multiple weeks for each commit that would go in. You don't have to review it right away either, just reply with "please don't merge until I'm done reviewing" and it'll get removed from the queue. >> Oh, and if you do want to complain about huge new features being >> backported, look at the mess that Spectre and Meltdown has caused in the >> stable trees. 
I don't see anyone complaining about those massive >> changes :) > >Are you serious? Are you going the compare the biggest PITA that the >community had to undergo because of HW issues with random pattern >matching in changelog/diffs? Come on! HW Issues are irrelevant here. You had a bug that allowed arbitrary kernel memory access. I can easily list quite a few commits, that are not tagged for stable, that fix exactly the same thing. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:04 ` Sasha Levin @ 2018-04-17 14:15 ` Steven Rostedt 2018-04-17 14:36 ` Greg KH 2018-04-17 14:36 ` Michal Hocko 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-17 14:15 UTC (permalink / raw) To: Sasha Levin Cc: Michal Hocko, Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, 17 Apr 2018 14:04:36 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more manageable for some maintainers. I guess the real question is, who are the stable kernels for? Is it just a place to look at to see what distros should think about? A superset of what distros would take? Then distros would have a nice place to look to find what patches they should look at. But the stable tree itself won't be used. But it's not being used today by major distros (Red Hat and SuSE). Debian may be using it, but that's because the stable maintainer for its kernels is also the Debian maintainer. Who are the customers of the stable trees? They are the ones that should be determining the "equation" for what goes into it. Personally, I use stable as a one off from mainline. Like I mentioned in another email. I'm currently on 4.15.x and will probably move to 4.16.x next. Unless there's some critical bug announcement, I update my machines once a month.
I originally just used mainline, but that was a bit too unstable for my work machines ;-) -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:15 ` Steven Rostedt @ 2018-04-17 14:36 ` Greg KH 0 siblings, 0 replies; 113+ messages in thread From: Greg KH @ 2018-04-17 14:36 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Michal Hocko, Jiri Kosina, Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote: > On Tue, 17 Apr 2018 14:04:36 +0000 > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > The solution to this, in my opinion, is to automate the whole selection > > and review process. We do selection using AI, and we run every possible > > test that's relevant to that subsystem. > > > > At which point, the amount of work a human needs to do to review a patch > > shrinks into something far more managable for some maintainers. > > I guess the real question is, who are the stable kernels for? Is it just > a place to look at to see what distros should think about. A superset > of what distros would take. Then distros would have a nice place to > look to find what patches they should look at. But the stable tree > itself wont be used. But it's not being used today by major distros > (Red Hat and SuSE). Debian may be using it, but that's because the > stable maintainer for its kernels is also the Debian maintainer. > > Who are the customers of the stable trees? They are the ones that > should be determining the "equation" for what goes into it. The "customers" of the stable trees are anyone who uses Linux. Right now, it's estimated that only about 1/3 of the kernels running out there, at the best, are an "enterprise" kernel/distro. 
2/3 of the world either run a kernel.org-based release + their own patches, or Debian. And Debian piggy-backs on the stable kernel releases pretty regularly. So the majority of the Linux users out there are what we are doing this for. Those that do not pay for a company to sift through things and only cherry-pick what they want to pick out (hint, they almost always miss things, some do this better than others...). That's who this is all for, which is why we are doing our best to keep on top of the avalanche of patches going into upstream to get the needed fixes (both security and "normal" fixes) out to users as soon as possible. So again, if you are a subsystem maintainer, tag your patches for stable. If you do not, you will get automated emails asking you about patches that should be applied (like the one that started this thread). If you want to just have us ignore your subsystem entirely, we will be glad to do so, and we will tell the world to not use your subsystem if at all possible (see previous comments about xfs, and I would argue IB right now...) > Personally, I use stable as a one off from mainline. Like I mentioned > in another email. I'm currently on 4.15.x and will probably move to > 4.16.x next. Unless there's some critical bug announcement, I update my > machines once a month. I originally just used mainline, but that was a > bit too unstable for my work machines ;-) That's great, you are a user of these trees then. So you benefit directly, along with everyone else who relies on them. And again, I'm working with the SoC vendors to directly incorporate these trees into their device trees, and I've already seen some devices in the wild push out updated 4.4.y kernels a few weeks after they are released. That's the end goal here, to have the world's devices in a much more secure and stable shape by relying on these kernels. thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:04 ` Sasha Levin 2018-04-17 14:15 ` Steven Rostedt @ 2018-04-17 14:36 ` Michal Hocko 2018-04-17 14:55 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Michal Hocko @ 2018-04-17 14:36 UTC (permalink / raw) To: Sasha Levin Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 14:04:36, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 12:39:36, Greg KH wrote: > >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > >> > On Mon, 16 Apr 2018, Sasha Levin wrote: > >> > > >> > > I agree that as an enterprise distro taking everything from -stable > >> > > isn't the best idea. Ideally you'd want to be close to the first > >> > > extreme you've mentioned and only take commits if customers are asking > >> > > you to do so. > >> > > > >> > > I think that the rule we're trying to agree upon is the "It must fix > >> > > a real bug that bothers people". > >> > > > >> > > I think that we can agree that it's impossible to expect every single > >> > > Linux user to go on LKML and complain about a bug he encountered, so the > >> > > rule quickly becomes "It must fix a real bug that can bother people". 
> >> > > >> > So is there a reason why stable couldn't become some hybrid-form union of > >> > > >> > - really critical issues (data corruption, boot issues, severe security > >> > issues) taken from bleeding edge upstream > >> > - [reviewed] cherry-picks of functional fixes from major distro kernels > >> > (based on that very -stable release), as that's apparently what people > >> > are hitting in the real world with that particular kernel > >> > >> It already is that :) > >> > >> The problem Sasha is trying to solve here is that for many subsystems, > >> maintainers do not mark patches for stable at all. > > > >The way he is trying to do that is just wrong. Generate a pressure on > >those subsystems by referring to bug reports and unhappy users and I am > >pretty sure they will try harder... You cannot solve the problem by > >bypassing them without having deep understanding of the specific > >subsystem. Once you have it, just make sure you are part of the review > >process and make sure to mark patches before they are merged. > > I think we just don't agree on how we should "pressure". > > Look at the discussion I had with the XFS folks who just don't want to > deal with this -stable thing because they have too much work upstream. So do you really think that you or any script can decide without them? My recollection from that discussion was quite the opposite. Dave was quite clear that most of the fixes are quite hard to evaluate and most of them are simply not worth risking the backport. > There wasn't a single patch in -stable coming from XFS for the past 6+ > months. I'm aware of more than one way to corrupt an XFS volume for any > distro that uses a kernel older than 4.15. Then try to poke/bribe somebody to have it fixed. But applying _something_ is just not a solution. You should also evaluate whether "I am able to corrupt" is something that "people see in the wild". Sure there are zillions of bugs hidden in the large code base like the kernel.
People just do not tend to hit them and this will likely not change very much in the future. > Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be > better about it, but I don't see this changing. I can surely have one or two and discuss this. I am pretty sure xfs guys are not going to pretend older kernels do not exist. > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more manageable for some maintainers. I really disagree. I am pretty sure maintainers are very well aware of how important a patch is. Some do not care about stable and I agree you should poke those. But some have really good reasons to not throw many patches in that direction because they do not feel the patch is important enough. Remember this is not about numbers. More is not always better. > >> So real bugfixes > >> that do hit people are not getting to those kernels, which force the > >> distros to do extra work to triage a bug, dig through upstream kernels, > >> find and apply the patch. > > > >I would say that this is the primary role of the distro. To hide the > >jungle of the upstream work and provide the additional of bug filtering > >and forwarding them the right direction. > > More often than triaging, you'll just be asked to upgrade to the latest > version. What sort of user experience does that provide? > > [snip] > > >> So nothing "new" is happening here, EXCEPT we are actually starting to > >> get a better kernel-wide coverage for stable fixes, which we have not > >> had in the past. That's a good thing! The number of patches applied to > >> stable is still a very very very tiny % compared to mainline, so nothing > >> new is happening here.
> > > >yes I do agree, the stable process is not very much different from the > >past and I would tend both processes broken because they explicitly try > >to avoid maintainers which is just wrong. > > Avoid maintainers?! We send so much "spam" trying to get maintainers > more involved in the process. How is that avoiding them? Just read what you wrote again. I am pretty sure AUTOSEL is on the filter list of many people. We have a good volume of email traffic already and seeing more automatic ones just doesn't help. At all! > If you're a maintainer who has specific requirements for the -stable > flow, or you have any automated testing you'd like to be run on these > commits, or you want these mails to come in a different format, or > pretty much anything else at all just shoot me a mail! > > It's been almost impossible to get maintainers involved in this process. The whole stable history was about not bothering maintainers and here is the result. > We don't sneak anything past maintainers, there are multiple mails over > multiple weeks for each commit that would go in. You don't have to > review it right away either, just reply with "please don't merge until > I'm done reviewing" and it'll get removed from the queue. I am not talking about sneaking or pushing behind the backs. I am just saying that you cannot do this without direct involvement of maintainers. If they do not respond to bug reports, shout at them and I am pretty sure that those subsystems will get bigger pressure to find their way to select _important_ fixes for users who are not running the bleeding edge because those users _matter_ as well (maybe even more because they are a much larger group). > >> Oh, and if you do want to complain about huge new features being > >> backported, look at the mess that Spectre and Meltdown has caused in the > >> stable trees. I don't see anyone complaining about those massive > >> changes :) > > > >Are you serious?
Are you going the compare the biggest PITA that the > >community had to undergo because of HW issues with random pattern > >matching in changelog/diffs? Come on! > > HW Issues are irrelevant here. You had a bug that allowed arbitrary > kernel memory access. I can easily list quite a few commits, that are > not tagged for stable, that fix exactly the same thing. Those are important fixes and if you are aware of them then you should be involving the respective maintainer. I haven't heard about _any_ maintainer who would refuse to help. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:36 ` Michal Hocko @ 2018-04-17 14:55 ` Sasha Levin 2018-04-17 15:52 ` Jiri Kosina 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 14:55 UTC (permalink / raw) To: Michal Hocko Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:04:36, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >> >On Tue 17-04-18 12:39:36, Greg KH wrote: >> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> >> > >> >> > > I agree that as an enterprise distro taking everything from -stable >> >> > > isn't the best idea. Ideally you'd want to be close to the first >> >> > > extreme you've mentioned and only take commits if customers are asking >> >> > > you to do so. >> >> > > >> >> > > I think that the rule we're trying to agree upon is the "It must fix >> >> > > a real bug that bothers people". >> >> > > >> >> > > I think that we can agree that it's impossible to expect every single >> >> > > Linux user to go on LKML and complain about a bug he encountered, so the >> >> > > rule quickly becomes "It must fix a real bug that can bother people". 
>> >> > >> >> > So is there a reason why stable couldn't become some hybrid-form union of >> >> > >> >> > - really critical issues (data corruption, boot issues, severe security >> >> > issues) taken from bleeding edge upstream >> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> >> > (based on that very -stable release), as that's apparently what people >> >> > are hitting in the real world with that particular kernel >> >> >> >> It already is that :) >> >> >> >> The problem Sasha is trying to solve here is that for many subsystems, >> >> maintainers do not mark patches for stable at all. >> > >> >The way he is trying to do that is just wrong. Generate a pressure on >> >those subsystems by referring to bug reports and unhappy users and I am >> >pretty sure they will try harder... You cannot solve the problem by >> >bypassing them without having deep understanding of the specific >> >subsytem. Once you have it, just make sure you are part of the review >> >process and make sure to mark patches before they are merged. >> >> I think we just don't agree on how we should "pressure". >> >> Look at the discussion I had with the XFS folks who just don't want to >> deal with this -stable thing because they have to much work upstream. > >So do you really think that you or any script decide without them? My >recollection from that discussion was quite opposite. Dave was quite >clear that most of fixes are quite hard to evaluate and most of them >are simply not worth risking the backport. No, *some* fixes are hard, not most. I'm not trying to decide for them, I'm trying to help them decide. >> There wasn't a single patch in -stable coming from XFS for the past 6+ >> months. I'm aware of more than one way to corrupt an XFS volume for any >> distro that uses a kernel older than 4.15. > >Then try to poke/bribe somebody to have it fixed. But applying >_something_ is just not a solution. 
You should also evaluate whether "I >am able to corrupt" is something that "people see in the wild". Sure >there are zillions of bugs hidden in the large code base like the >kernel. People just do not tend to hit them and this will likely not >change very much in the future. We can't ignore bugs just because people don't notice. Data corruption bugs in particular are a pain to report as well, the corruption might have happened months before and there's not much to report at that point. There's quite a few bug classes like that. >> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be >> better about it, but I don't see this changing. > >I can surely have one or two and discuss this. I am pretty sure xfs guys >are not going to pretend older kernels do not exist. > >> The solution to this, in my opinion, is to automate the whole selection >> and review process. We do selection using AI, and we run every possible >> test that's relevant to that subsystem. >> >> At which point, the amount of work a human needs to do to review a patch >> shrinks into something far more managable for some maintainers. > >I really disagree. I am pretty sure maintainers are very well aware of >how the patch is important. Some do no care about stable and I agree you >should poke those. But some have really good reasons to not throw many >patches that direction because they do not feel the patch is important >enough. > >Remember this is not about numbers. The more is not always better. So what is "important"? Look at the XFS issues, they were important enough to get fixed upstream, and have an appropriate test added to xfstests. Why didn't they go back to -stable? >> >> So real bugfixes >> >> that do hit people are not getting to those kernels, which force the >> >> distros to do extra work to triage a bug, dig through upstream kernels, >> >> find and apply the patch. >> > >> >I would say that this is the primary role of the distro. 
To hide the >> >jungle of the upstream work and provide the additional of bug filtering >> >and forwarding them the right direction. >> >> More often than triaging, you'll just be asked to upgrade to the latest >> version. What sort of user experience does that provide? >> >> [snip] >> >> >> So nothing "new" is happening here, EXCEPT we are actually starting to >> >> get a better kernel-wide coverage for stable fixes, which we have not >> >> had in the past. That's a good thing! The number of patches applied to >> >> stable is still a very very very tiny % compared to mainline, so nothing >> >> new is happening here. >> > >> >yes I do agree, the stable process is not very much different from the >> >past and I would tend both processes broken because they explicitly try >> >to avoid maintainers which is just wrong. >> >> Avoid maintainers?! We send so much "spam" trying to get maintainers >> more involved in the process. How is that avoiding them? > >Just read what your wrote again. I am pretty sure AUTOSEL is on filter >list on many people. We have a good volume of email traffic already and >seeing more automatic one just doesn't help. At all! > >> If you're a maintainer who has specific requirements for the -stable >> flow, or you have any automated testing you'd like to be run on these >> commits, or you want these mails to come in a different format, or >> pretty much anything else at all just shoot me a mail! >> >> It's been almost impossible to get maintainers involved in this process. > >The whole stable history was that about not bothering maintainers and >here is the result. > >> We don't sneak anything past maintainers, there are multiple mails over >> multiple weeks for each commit that would go in. You don't have to >> review it right away either, just reply with "please don't merge until >> I'm done reviewing" and it'll get removed from the queue. > >I am not talking about sneaking or pushing behind the backs. 
I am just >saying that you cannot do this without direct involvement of >maintainers. If they do not respond to bug reports should at them and I >am pretty sure that those subsystems will get a bigger pressure to find >their way to select _important_ fixes to users who are not running the >bleeding edge because those users _matter_ as well (maybe even more >because they are a much larger group). > >> >> Oh, and if you do want to complain about huge new features being >> >> backported, look at the mess that Spectre and Meltdown has caused in the >> >> stable trees. I don't see anyone complaining about those massive >> >> changes :) >> > >> >Are you serious? Are you going the compare the biggest PITA that the >> >community had to undergo because of HW issues with random pattern >> >matching in changelog/diffs? Come on! >> >> HW Issues are irrelevant here. You had a bug that allowed arbitrary >> kernel memory access. I can easily list quite a few commits, that are >> not tagged for stable, that fix exactly the same thing. > >Those are important fixes and if you are aware of them then you should >be involving the respective maintainer. I haven't heard about _any_ >maintainer who would refuse to help. Let's do it this way: let's assume my AUTOSEL project is bad and I'll get rid of it tomorrow. How do I get the XFS folks to send their stuff to -stable? (we have quite a few customers who use XFS) How do I get the KVM folks to be more consistent about tagging patches for -stable? (we support nested KVM!) How Do I get people who are not aware of how the -stable project to tag their commits properly? (there's quite a long tail of authors sending 1 important bugfix and disappearing forever) We can agree that just asking them nicely doesn't work: Greg has been poking maintainers for years, the -stable project got bunch of publicity, and the instructions for including a patch in -stable are pretty straightforward. 
You're saying that AUTOSEL doesn't work, so let's ignore that too. How should we proceed? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 14:55 ` Sasha Levin @ 2018-04-17 15:52 ` Jiri Kosina 2018-04-17 16:06 ` Sasha Levin 2018-04-17 16:25 ` Mike Galbraith 0 siblings, 2 replies; 113+ messages in thread From: Jiri Kosina @ 2018-04-17 15:52 UTC (permalink / raw) To: Sasha Levin Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, 17 Apr 2018, Sasha Levin wrote: > How do I get the XFS folks to send their stuff to -stable? (we have > quite a few customers who use XFS) If XFS (or *any* other subsystem) doesn't have enough manpower of upstream maintainers to deal with stable, we just have to accept that and find an answer to that. If XFS folks claim that they don't have enough mental capacity to create/verify XFS backports, I totally don't see how any kind of AI would have. If your business relies on XFS (and so does ours, BTW) or any other subsystem that doesn't have enough manpower to care for stable, the proper solution (and contribution) would be just bringing more people into the XFS community. To put it simply -- I don't think the simple lack of actual human brainpower can be reasonably resolved in other way than bringing more of it in. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 15:52 ` Jiri Kosina @ 2018-04-17 16:06 ` Sasha Levin 2018-05-03 10:04 ` Pavel Machek 2018-04-17 16:25 ` Mike Galbraith 1 sibling, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 16:06 UTC (permalink / raw) To: Jiri Kosina Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >On Tue, 17 Apr 2018, Sasha Levin wrote: > >> How do I get the XFS folks to send their stuff to -stable? (we have >> quite a few customers who use XFS) > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >maintainers to deal with stable, we just have to accept that and find an >answer to that. This is exactly what I'm doing. Many subsystems don't have enough manpower to deal with -stable, so I'm trying to help. >If XFS folks claim that they don't have enough mental capacity to >create/verify XFS backports, I totally don't see how any kind of AI would >have. Because creating backports is not all about mental capacity! A lot of time gets wasted on going through the list of commits, backporting each of those commits into every -stable tree we have, building it, running tests, etc. So it's not all about pure mental capacity, but more about the time per-patch it takes to get -stable done. If I can cut down on that, by suggesting a list of commits, doing builds and tests, what's the problem? 
>If your business relies on XFS (and so does ours, BTW) or any other >subsystem that doesn't have enough manpower to care for stable, the proper >solution (and contribution) would be just bringing more people into the >XFS community. Microsoft's business relies on quite a few kernel subsystems. While we try to bring more people in the kernel (we're hiring!), as you might know it's not easy getting kernel folks. So just "get more people" isn't a good solution. It doesn't scale either. >To put it simply -- I don't think the simple lack of actual human >brainpower can be reasonably resolved in other way than bringing more of >it in. > >Thanks, > >-- >Jiri Kosina >SUSE Labs > ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 16:06 ` Sasha Levin @ 2018-05-03 10:04 ` Pavel Machek 2018-05-03 13:02 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-05-03 10:04 UTC (permalink / raw) To: Sasha Levin Cc: Jiri Kosina, Michal Hocko, Greg KH, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 2018-04-17 16:06:29, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: > >On Tue, 17 Apr 2018, Sasha Levin wrote: > > > >> How do I get the XFS folks to send their stuff to -stable? (we have > >> quite a few customers who use XFS) > > > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > >maintainers to deal with stable, we just have to accept that and find an > >answer to that. > > This is exactly what I'm doing. Many subsystems don't have enough > manpower to deal with -stable, so I'm trying to help. ...and the torrent of spam from the AUTOSEL subsystem actually makes that worse. And when you are told that a particular fix to LEDs is not that important after all, you start arguing about nuclear power plants (without really knowing how critical subsystems work). If you want cooperation with maintainers to work, the rules need to be clear first. They are documented, so follow them. If you think the rules are wrong, let's talk about changing the rules; but arguing "every bug is important because someone may be hitting it" is not ok.
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-05-03 10:04 ` Pavel Machek @ 2018-05-03 13:02 ` Sasha Levin 0 siblings, 0 replies; 113+ messages in thread From: Sasha Levin @ 2018-05-03 13:02 UTC (permalink / raw) To: Pavel Machek Cc: Jiri Kosina, Michal Hocko, Greg KH, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:06:29, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >> >On Tue, 17 Apr 2018, Sasha Levin wrote: >> > >> >> How do I get the XFS folks to send their stuff to -stable? (we have >> >> quite a few customers who use XFS) >> > >> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >> >maintainers to deal with stable, we just have to accept that and find an >> >answer to that. >> >> This is exactly what I'm doing. Many subsystems don't have enough >> manpower to deal with -stable, so I'm trying to help. > >...and the torrent of spams from the AUTOSEL subsystem actually makes >that worse. > >And when you are told particular fix to LEDs is not that important >after all, you start arguing about nuclear power plants (without >really knowing how critical subsystems work). Obviously your knowledge far surpasses mine. >If you want cooperation with maintainers to work, the rules need to be >clear, first. They are documented, so follow them. If you think rules >are wrong, lets talk about changing the rules; but arguing "every bug >is important because someone may be hitting it" is not ok. I'm sorry but you're just unfamiliar with the process. 
I'd point out that all my AUTOSEL commits go through Greg, who wrote the rules and accepts my patches. The rules are there as a guideline that allows us to not take certain patches; they're not there as a strict set of rules we must follow at all times. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 15:52 ` Jiri Kosina 2018-04-17 16:06 ` Sasha Levin @ 2018-04-17 16:25 ` Mike Galbraith 1 sibling, 0 replies; 113+ messages in thread From: Mike Galbraith @ 2018-04-17 16:25 UTC (permalink / raw) To: Jiri Kosina, Sasha Levin Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote: > On Tue, 17 Apr 2018, Sasha Levin wrote: > > > How do I get the XFS folks to send their stuff to -stable? (we have > > quite a few customers who use XFS) > > If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > maintainers to deal with stable, we just have to accept that and find an > answer to that. > > If XFS folks claim that they don't have enough mental capacity to > create/verify XFS backports, I totally don't see how any kind of AI would > have. > > If your business relies on XFS (and so does ours, BTW) or any other > subsystem that doesn't have enough manpower to care for stable, the proper > solution (and contribution) would be just bringing more people into the > XFS community. > > To put it simply -- I don't think the simple lack of actual human > brainpower can be reasonably resolved in other way than bringing more of > it in. Not to worry... soon enough it'll be submitting properly massaged backports of the stuff it submitted upstream :) -Mike ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 10:39 ` Greg KH 2018-04-17 11:07 ` Michal Hocko @ 2018-04-17 11:21 ` Jiri Kosina 1 sibling, 0 replies; 113+ messages in thread From: Jiri Kosina @ 2018-04-17 11:21 UTC (permalink / raw) To: Greg KH Cc: Sasha Levin, Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, 17 Apr 2018, Greg KH wrote: > It already is that :) I have a question: I guess a stable team has an idea who they are preparing the tree for, IOW who is the target consumer. Who is it? Certainly it's not major distros, as both RH and SUSE already stated that they are either not basing off the stable kernel (only cherry-pick fixes from it) (RH), or are quite far in the process of moving away from stable tree towards combination of what RH is doing + semi-automated evaluation of Fixes: tag (SUSE). If the target audience is somewhere else, that's perfectly fine, but then it'd have to be stated explicitly I guess. I can't speak for RH, but for us (at least me personally), the pace of patches flowing into -stable is way too high for us to keep control of what is landing in our tree. In some sense, stability should be equivalent to "minimal necessary amout of super-critical changes". That's not what this "let's backport proactively almost everything that has word 'fixes' in changelog" (I'm a bit exaggerating of course) seems to be about. Again, the rules stated out in Documentation/process/stable-kernel-rules.rst are very nice, and are exactly something at least we would be very happy about. 
They have the nice hidden assumption in them that someone actually has to actively invest human brain power to think about the fix, its consequences, prerequisites, etc. Not just doing a big dump of all commits that "might fix something". How many of the actual patches flowing into -stable would satisfy those criteria these days? IOW, I'm pretty sure our users are much happier with us supplying them reactive fixes than pro-active uncertainty. > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The pressure on those subsystems should be coming from unhappy users (be it end-users or vendors redistributing the tree) of the stable tree, who would be complaining about missing fixes for those subsystems. Is this actually happening? Where? > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Umm, sorry, how is this related? There simply was no other way, and I took it as given that this is seen by everybody involved as an absolute exception, due to the nature of the issue and of the massive changes that were needed. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 21:18 ` Sasha Levin 2018-04-16 21:28 ` Jiri Kosina @ 2018-05-03 9:47 ` Pavel Machek 2018-05-03 13:06 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-05-03 9:47 UTC (permalink / raw) To: Sasha Levin Cc: Jiri Kosina, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 2295 bytes --] On Mon 2018-04-16 21:18:47, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: > >On Mon, 16 Apr 2018, Sasha Levin wrote: > > > >> So I think that Linus's claim that users come first applies here as > >> well. If there's a user that cares about a particular feature being > >> broken, then we go ahead and fix his bug rather then ignoring him. > > > >So one extreme is fixing -stable *iff* users actually do report an issue. > > > >The other extreme is backporting everything that potentially looks like a > >potential fix of "something" (according to some arbitrary metric), > >pro-actively. > > > >The former voilates the "users first" rule, the latter has a very, very > >high risk of regressions. > > > >So this whole debate is about finding a compromise. > > > >My gut feeling always was that the statement in > > > > Documentation/process/stable-kernel-rules.rst > > > >is very reasonable, but making the process way more "aggresive" when > >backporting patches is breaking much of its original spirit for me. > > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. 
Ideally you'd want to be close to the first The original purpose of -stable was "to be common base of enterprise distros" and our documentation still says it is. > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother > people". I think you are playing dangerous word games. > My "aggressiveness" comes from the whole "bother" part: it doesn't have > to be critical, it doesn't have to cause data corruption, it doesn't > have to be a security issue. It's enough that the bug actually affects a > user in a way he didn't expect it to (if a user doesn't have > expectations, it would fall under the "This could be a problem..." > exception. And it seems the documentation says you should be less aggressive, and the world tells you it expects you to be less aggressive. So maybe that's what you should do? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-05-03 9:47 ` Pavel Machek @ 2018-05-03 13:06 ` Sasha Levin 0 siblings, 0 replies; 113+ messages in thread From: Sasha Levin @ 2018-05-03 13:06 UTC (permalink / raw) To: Pavel Machek Cc: Jiri Kosina, Linus Torvalds, Steven Rostedt, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote: >On Mon 2018-04-16 21:18:47, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >> >On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> >> So I think that Linus's claim that users come first applies here as >> >> well. If there's a user that cares about a particular feature being >> >> broken, then we go ahead and fix his bug rather then ignoring him. >> > >> >So one extreme is fixing -stable *iff* users actually do report an issue. >> > >> >The other extreme is backporting everything that potentially looks like a >> >potential fix of "something" (according to some arbitrary metric), >> >pro-actively. >> > >> >The former voilates the "users first" rule, the latter has a very, very >> >high risk of regressions. >> > >> >So this whole debate is about finding a compromise. >> > >> >My gut feeling always was that the statement in >> > >> > Documentation/process/stable-kernel-rules.rst >> > >> >is very reasonable, but making the process way more "aggresive" when >> >backporting patches is breaking much of its original spirit for me. >> >> I agree that as an enterprise distro taking everything from -stable >> isn't the best idea. 
Ideally you'd want to be close to the first > >Original purpose of -stable was "to be common base of enterprise >distros" and our documentation still says it is. I guess that the world changes? At this point calling enterprise distros a niche wouldn't be too far from the truth. Furthermore, some enterprise distros (as stated earlier in this thread) don't even follow -stable anymore and cherry pick their own commits. So no, the main driving force behind -stable is not traditional enterprise distributions. >> I think that we can agree that it's impossible to expect every single >> Linux user to go on LKML and complain about a bug he encountered, so the >> rule quickly becomes "It must fix a real bug that can bother >> people". > >I think you are playing dangerous word games. > >> My "aggressiveness" comes from the whole "bother" part: it doesn't have >> to be critical, it doesn't have to cause data corruption, it doesn't >> have to be a security issue. It's enough that the bug actually affects a >> user in a way he didn't expect it to (if a user doesn't have >> expectations, it would fall under the "This could be a problem..." >> exception. > >And it seems documentation says you should be less aggressive and >world tells you they expect to be less aggressive. So maybe that's >what you should do? Who is this "world" you're referring to? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:06 ` Pavel Machek 2018-04-16 16:14 ` Sasha Levin @ 2018-04-16 16:20 ` Steven Rostedt 2018-04-16 16:28 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 16:20 UTC (permalink / raw) To: Pavel Machek Cc: Sasha Levin, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, 16 Apr 2018 18:06:08 +0200 Pavel Machek <pavel@ucw.cz> wrote: > That means you want to ignore not-so-serious bugs, because benefit of > fixing them is lower than risk of the regressions. I believe bugs that > do not bother anyone should _not_ be fixed in stable. > > That was case of the LED patch. Yes, the commit fixed bug, but it > introduced regressions that were fixed by subsequent patches. I agree. I would disagree that the patch this thread is on should go to stable. What's the point of stable if it introduces regressions by backporting bug fixes for non major bugs. Every fix I make I consider labeling it for stable. The ones I don't, I feel the bug fix is not worth the risk of added regressions. I worry that people will get lazy and stop marking commits for stable (or even thinking about it) because they know that there's a bot that will pull it for them. That thought crossed my mind. Why do I want to label anything stable if a bot will probably catch it. Then I could just wait till the bot posts it before I even think about stable. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:20 ` Steven Rostedt @ 2018-04-16 16:28 ` Sasha Levin 2018-04-16 16:39 ` Pavel Machek 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:28 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 18:06:08 +0200 >Pavel Machek <pavel@ucw.cz> wrote: > >> That means you want to ignore not-so-serious bugs, because benefit of >> fixing them is lower than risk of the regressions. I believe bugs that >> do not bother anyone should _not_ be fixed in stable. >> >> That was case of the LED patch. Yes, the commit fixed bug, but it >> introduced regressions that were fixed by subsequent patches. > >I agree. I would disagree that the patch this thread is on should go to >stable. What's the point of stable if it introduces regressions by >backporting bug fixes for non major bugs. One such reason is that users will then hit the regression when they upgrade to the next -stable version anyways. >Every fix I make I consider labeling it for stable. The ones I don't, I >feel the bug fix is not worth the risk of added regressions. > >I worry that people will get lazy and stop marking commits for stable >(or even thinking about it) because they know that there's a bot that >will pull it for them. That thought crossed my mind. Why do I want to >label anything stable if a bot will probably catch it. Then I could >just wait till the bot posts it before I even think about stable. People are already "lazy". You are actually an exception for marking your commits. 
Yes, folks will chime in with "sure, I mark my patches too!", but if you look at the entire committer pool in the kernel you'll see that most don't bother with this to begin with. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:28 ` Sasha Levin @ 2018-04-16 16:39 ` Pavel Machek 2018-04-16 16:43 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:39 UTC (permalink / raw) To: Sasha Levin Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 2318 bytes --] On Mon 2018-04-16 16:28:00, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote: > >On Mon, 16 Apr 2018 18:06:08 +0200 > >Pavel Machek <pavel@ucw.cz> wrote: > > > >> That means you want to ignore not-so-serious bugs, because benefit of > >> fixing them is lower than risk of the regressions. I believe bugs that > >> do not bother anyone should _not_ be fixed in stable. > >> > >> That was case of the LED patch. Yes, the commit fixed bug, but it > >> introduced regressions that were fixed by subsequent patches. > > > >I agree. I would disagree that the patch this thread is on should go to > >stable. What's the point of stable if it introduces regressions by > >backporting bug fixes for non major bugs. > > One such reason is that users will then hit the regression when they > upgrade to the next -stable version anyways. Well, yes, testing is required when moving from 4.14 to 4.15. But testing should not be required when moving from 4.14.5 to 4.14.6. > >Every fix I make I consider labeling it for stable. The ones I don't, I > >feel the bug fix is not worth the risk of added regressions. 
> > > >I worry that people will get lazy and stop marking commits for stable > >(or even thinking about it) because they know that there's a bot that > >will pull it for them. That thought crossed my mind. Why do I want to > >label anything stable if a bot will probably catch it. Then I could > >just wait till the bot posts it before I even think about stable. > > People are already "lazy". You are actually an exception for marking your > commits. > > Yes, folks will chime in with "sure, I mark my patches too!", but if you > look at the entire committer pool in the kernel you'll see that most > don't bother with this to begin with. So you take everything and put it into stable? I don't think that's a solution. If you are worried about people not putting enough "Stable: " tags in their commits, perhaps you can write them emails "hey, I think this should go to stable, do you agree"? You should get people marking their commits themselves pretty quickly... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:39 ` Pavel Machek @ 2018-04-16 16:43 ` Sasha Levin 2018-04-16 16:53 ` Steven Rostedt 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:43 UTC (permalink / raw) To: Pavel Machek Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 06:39:53PM +0200, Pavel Machek wrote: >On Mon 2018-04-16 16:28:00, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote: >> >On Mon, 16 Apr 2018 18:06:08 +0200 >> >Pavel Machek <pavel@ucw.cz> wrote: >> > >> >> That means you want to ignore not-so-serious bugs, because benefit of >> >> fixing them is lower than risk of the regressions. I believe bugs that >> >> do not bother anyone should _not_ be fixed in stable. >> >> >> >> That was case of the LED patch. Yes, the commit fixed bug, but it >> >> introduced regressions that were fixed by subsequent patches. >> > >> >I agree. I would disagree that the patch this thread is on should go to >> >stable. What's the point of stable if it introduces regressions by >> >backporting bug fixes for non major bugs. >> >> One such reason is that users will then hit the regression when they >> upgrade to the next -stable version anyways. > >Well, yes, testing is required when moving from 4.14 to 4.15. But >testing should not be required when moving from 4.14.5 to 4.14.6. You always have to test, even without the AUTOSEL stuff. The rejection rate was 2% even before AUTOSEL, so there was always a chance of regression when upgrading minor stable versions. >> >Every fix I make I consider labeling it for stable. 
The ones I don't, I >> >feel the bug fix is not worth the risk of added regressions. >> > >> >I worry that people will get lazy and stop marking commits for stable >> >(or even thinking about it) because they know that there's a bot that >> >will pull it for them. That thought crossed my mind. Why do I want to >> >label anything stable if a bot will probably catch it. Then I could >> >just wait till the bot posts it before I even think about stable. >> >> People are already "lazy". You are actually an exception for marking your >> commits. >> >> Yes, folks will chime in with "sure, I mark my patches too!", but if you >> look at the entire committer pool in the kernel you'll see that most >> don't bother with this to begin with. > >So you take everything and put it into stable? I don't think that's a >solution. I don't think I ever said that I want to put *everything* >If you are worried about people not putting enough "Stable: " tags in >their commits, perhaps you can write them emails "hey, I think this >should go to stable, do you agree"? You should get people marking >their commits themselves pretty quickly... Greg has been doing this for years, ask him how that worked out for him. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:43 ` Sasha Levin @ 2018-04-16 16:53 ` Steven Rostedt 2018-04-16 16:58 ` Pavel Machek 2018-04-16 17:09 ` Sasha Levin 0 siblings, 2 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 16:53 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 16:43:13 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >If you are worried about people not putting enough "Stable: " tags in > >their commits, perhaps you can write them emails "hey, I think this > >should go to stable, do you agree"? You should get people marking > >their commits themselves pretty quickly... > > Greg has been doing this for years, ask him how that worked out for him. Then he shouldn't pull in the fix. Let it be broken. As soon as someone complains about it being broken, then bug the maintainer again. "Hey, this is broken in 4.x, and this looks like the fix for it. Do you agree?" I agree that some patches don't need this discussion. Things that are obvious. Off-by-one and stack-overflow and other bugs like that. Or another common bug is error paths that don't release locks. These should just be backported. But subtle fixes like this thread should default to (not backport unless someones complains or the author/maintainer acks it). -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:53 ` Steven Rostedt @ 2018-04-16 16:58 ` Pavel Machek 2018-04-16 17:09 ` Sasha Levin 1 sibling, 0 replies; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:58 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH [-- Attachment #1: Type: text/plain, Size: 1334 bytes --] On Mon 2018-04-16 12:53:07, Steven Rostedt wrote: > On Mon, 16 Apr 2018 16:43:13 +0000 > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > >If you are worried about people not putting enough "Stable: " tags in > > >their commits, perhaps you can write them emails "hey, I think this > > >should go to stable, do you agree"? You should get people marking > > >their commits themselves pretty quickly... > > > > Greg has been doing this for years, ask him how that worked out for him. > > Then he shouldn't pull in the fix. Let it be broken. As soon as someone > complains about it being broken, then bug the maintainer again. "Hey, > this is broken in 4.x, and this looks like the fix for it. Do you > agree?" > > I agree that some patches don't need this discussion. Things that are > obvious. Off-by-one and stack-overflow and other bugs like that. Or > another common bug is error paths that don't release locks. These > should just be backported. But subtle fixes like this thread should > default to (not backport unless someones complains or the > author/maintainer acks it). Agreed. And it scares me we are even discussing this. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:53 ` Steven Rostedt 2018-04-16 16:58 ` Pavel Machek @ 2018-04-16 17:09 ` Sasha Levin 2018-04-16 17:33 ` Steven Rostedt 1 sibling, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 17:09 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 12:53:07PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 16:43:13 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> >If you are worried about people not putting enough "Stable: " tags in >> >their commits, perhaps you can write them emails "hey, I think this >> >should go to stable, do you agree"? You should get people marking >> >their commits themselves pretty quickly... >> >> Greg has been doing this for years, ask him how that worked out for him. > >Then he shouldn't pull in the fix. Let it be broken. As soon as someone >complains about it being broken, then bug the maintainer again. "Hey, >this is broken in 4.x, and this looks like the fix for it. Do you >agree?" If that process would work, I would also get ACK/NACK on every AUTOSEL request I'm sending. What usually happens with customer reported issues is that either they're just told to upgrade to the latest kernel (where the bug is fixed), or if the distro team can't get them to do that and go hunting for that bug, they'll just pick it for their kernel tree without ever telling -stable. I had a project to get all the fixes Canonical had in their trees that were not in -stable. We're talking hundreds of patches here. >I agree that some patches don't need this discussion.
Things that are >obvious. Off-by-one and stack-overflow and other bugs like that. Or >another common bug is error paths that don't release locks. These >should just be backported. But subtle fixes like this thread should >default to (not backport unless someones complains or the >author/maintainer acks it). Let's play a "be the -stable maintainer" game. Would you take any of the following commits? https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6 ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:09 ` Sasha Levin @ 2018-04-16 17:33 ` Steven Rostedt 2018-04-16 17:42 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 17:33 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 17:09:38 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > Let's play a "be the -stable maintainer" game. Would you take any > of the following commits? > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb No, not automatically, or without someone from KVM letting me know what side-effects that may have. Not stopping on a breakpoint is not that critical, it may be a bit annoying. I would ask the KVM maintainers if they feel it's critical enough for backporting, but without hearing from them, I would leave it be. > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1 Sure. Even if it has a subtle regression, that's a critical bug being fixed. > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6 I would consider unlocking a mutex that one didn't lock a critical bug, so yes. Again, things that deal with locking or buffer overflows, I would take the fix, as those are critical. But other behavior issues where it's not critical, I would leave be unless told further by someone else. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:33 ` Steven Rostedt @ 2018-04-16 17:42 ` Sasha Levin 2018-04-16 18:26 ` Steven Rostedt 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 17:42 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 01:33:21PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 17:09:38 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> Let's play a "be the -stable maintainer" game. Would you take any >> of the following commits? >> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb > >No, not automatically, or without someone from KVM letting me know what >side-effects that may have. Not stopping on a breakpoint is not that >critical, it may be a bit annoying. I would ask the KVM maintainers if >they feel it's critical enough for backporting, but without hearing >from them, I would leave it be. Fair enough. >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1 > >Sure. Even if it has a subtle regression, that's a critical bug being >fixed. This was later reverted, in -stable: """ Commit d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") removed the end of line handling when storing the update_fw sysfs attribute. This changed the userpace API because it started refusing writes terminated by a line feed, which broke the update tools we already have. 
""" >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6 > >I would consider unlocking a mutex that one didn't lock a critical bug, >so yes. > >Again, things that deal with locking or buffer overflows, I would take >the fix, as those are critical. But other behavior issues where it's >not critical, I would leave be unless told further by someone else. This too, was reverted: """ It causes run-time breakage in the 4.4-stable tree and more patches are needed to be applied first before this one in order to resolve the issue. """ This is how fun it is reviewing AUTOSEL commits :) Even the small "trivial", "obviously correct" patches have room for errors for various reasons. Also note that all of these patches were tagged for stable and actually ended up in at least one tree. This is why I'm basing a lot of my decision making on the rejection rate. If the AUTOSEL process does the job well enough as the "regular" process did before, why push it back? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:42 ` Sasha Levin @ 2018-04-16 18:26 ` Steven Rostedt 2018-04-16 18:30 ` Linus Torvalds 2018-04-16 18:35 ` Sasha Levin 0 siblings, 2 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 18:26 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 17:42:38 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1 > > > >Sure. Even if it has a subtle regression, that's a critical bug being > >fixed. > > This was later reverted, in -stable: > > """ > Commit d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") removed > the end of line handling when storing the update_fw sysfs attribute. > This changed the userpace API because it started refusing writes > terminated by a line feed, which broke the update tools we already have. > """ I hope it wasn't reverted. It did fix a critical bug. The problem is that it only fixed a critical bug, but didn't go far enough to keep the bug fix from breaking API. I see this as two bugs being fixed. Even though the second bug was "caused" by the first fix, the first fix was still necessary. The second bug was relying on broken code. This hasn't changed my position on that patch from being backported. I would not even mark this as a regression. I would say the original code was broken too much, and fixing part of it just revealed another broken part.
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6 > > > >I would consider unlocking a mutex that one didn't lock a critical bug, > >so yes. > > > >Again, things that deal with locking or buffer overflows, I would take > >the fix, as those are critical. But other behavior issues where it's > >not critical, I would leave be unless told further by someone else. > > This too, was reverted: > > """ > It causes run-time breakage in the 4.4-stable tree and more patches are > needed to be applied first before this one in order to resolve the > issue. > """ It wasn't reverted in mainline. Looks like there was some subtle issues with the different stable versions. Perhaps the "fixes" was wrong. > > This is how fun it is reviewing AUTOSEL commits :) > > Even the small "trivial", "obviously correct" patches have room for > errors for various reasons. And that's fine. Any code written can have bugs in it. That's just a given. Which pushes for why we should be extremely picky about what we backport. > > Also note that all of these patches were tagged for stable and actually > ended up in at least one tree. > > This is why I'm basing a lot of my decision making on the rejection rate. > If the AUTOSEL process does the job well enough as the "regular" > process did before, why push it back? Because I think we are adding too many patches to stable. And automating it may just make things worse. Your examples above back my argument more than they refute it. If people can't determine what is "obviously correct" how is automation going to do any better? -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:26 ` Steven Rostedt @ 2018-04-16 18:30 ` Linus Torvalds 2018-04-16 18:41 ` Steven Rostedt 2018-04-16 18:35 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 18:30 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 11:26 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > The problem is that it only fixed a critical bug, but didn't go far > enough to keep the bug fix from breaking API. An API breakage that gets noticed *is* a critical bug. You can't call something else critical and then say "but it broke API". Seriously. Why do I even have to mention this? If you break user workflows, NOTHING ELSE MATTERS. Even security is secondary to "people don't use the end result, because it doesn't work for them any more". Really. Stop with this idiotic "only API". Breaking user space is just about the only thing that really matters. The rest is "small matter of implementation". Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:30 ` Linus Torvalds @ 2018-04-16 18:41 ` Steven Rostedt 2018-04-16 18:52 ` Linus Torvalds 0 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 18:41 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 11:30:06 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 11:26 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > The problem is that it only fixed a critical bug, but didn't go far > > enough to keep the bug fix from breaking API. > > An API breakage that gets noticed *is* a crtitical bug. I totally agree with you. You misunderstood what I wrote. I said there were two bugs here. The first bug was a possible accessing bad memory bug. That needed to be fixed. The problem was by fixing that, it broke API. But that's because the original code was broken where it relied on broken code to get right. I never said the second bug fix should not have been backported. I even said that the first bug "didn't go far enough". I hope the answer was not to revert the bug and put back the possible bad memory access in to keep API. But it was to backport the second bug fix that still has the first fix, but fixes the API breakage. Yes, an API breakage is something I would label as critical to be backported. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:41 ` Steven Rostedt @ 2018-04-16 18:52 ` Linus Torvalds 2018-04-16 19:00 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 18:52 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > >I never said the second > bug fix should not have been backported. I even said that the first bug > "didn't go far enough". You're still not getting it. The "didn't go far enough" means that the bug fix is *BUGGY*. It needs to be reverted. > I hope the answer was not to revert the bug and put back the possible > bad memory access in to keep API. But that very much *IS* the answer. If there isn't a fix for the ABI breakage, then the first bugfix needs to be reverted. Really. There is no such thing as "but the fix was more important than the bug it introduced". This is why we started with the whole "actively revert things that introduce regressions". Because people always kept claiming that "but but I fixed a worse bug, and it's better to fix the worse bug even if it then introduces another problem, because the other problem is lesser". NO. We're better off making *no* progress, than making "unsteady progress". Really. Seriously. If you cannot fix a bug without introducing another one, don't do it. Don't do kernel development. The whole mentality you show is NOT ACCEPTABLE. So the *only* answer is: "fix the bug _and_ keep the API". There is no other choice.
The whole "I fixed one problem but introduced another" is not how we work. You should damn well know that. There are no excuses. And yes, sometimes that means jumping through hoops. But that's what it takes to keep users happy. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:52 ` Linus Torvalds @ 2018-04-16 19:00 ` Linus Torvalds 2018-04-16 19:30 ` Steven Rostedt 2018-04-16 19:19 ` Linus Torvalds 2018-04-16 19:24 ` Steven Rostedt 2 siblings, 1 reply; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 19:00 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > We're better off making *no* progress, than making "unsteady progress". > > Really. Seriously. Side note: the original impetus for this was our suspend/resume mess. It went on for *YEARS*, and it was absolutely chock-full of exactly this "I fixed the worse problem, and introduced another one". There's a reason I'm a hardliner on the regression issue. We've been there, we've done that. The whole "two steps forwards, one step back" mentality may work really well if you're doing line dancing. BUT WE ARE NOT LINE DANCING. We do kernel development. Absolutely NOTHING else is more important than the "no regressions" rule. NOTHING. And just since everybody always tries to weasel about this: the only regressions that matter are the ones that people notice in real loads. So if you write a test-case that tests that "system call number 345 returns -ENOSYS", and we add a new system call, and you say "hey, you regressed my system call test", that's not a regression. That's just a "change in behavior". It becomes a regression only if there are people using tools or workflows that actually depend on it. 
So if it turns out (for example) that Firefox had some really odd bug, and the intent was to do system call 123, but a typo had caused it to do system call 345 instead, and another bug meant that the end result worked fine as long as system call 345 returned ENOSYS, then the addition of that system call actually does turn into a regression. See? Even adding a system call can be a regression, because what matters is not behavior per se, but users _depending_ on some specific behavior. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:00 ` Linus Torvalds @ 2018-04-16 19:30 ` Steven Rostedt 0 siblings, 0 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 19:30 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 12:00:08 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > We're better off making *no* progress, than making "unsteady progress". > > > > > > Really. Seriously. [ me inserted: ] > On Mon, 16 Apr 2018 3:24:29 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > I'm talking about the given example of a simple memory bug that caused > > a very subtle breakage of API, which had another trivial fix that > > should be backported. I'm not sure that's what you were talking about. > > Side note: the original impetus for this was our suspend/resume mess. > It went on for *YEARS*, and it was absolutely chock-full of exactly > this "I fixed the worse problem, and introduced another one". What you are talking about here isn't what I was talking about above ;-) -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:52 ` Linus Torvalds 2018-04-16 19:00 ` Linus Torvalds @ 2018-04-16 19:19 ` Linus Torvalds 2018-04-16 19:24 ` Steven Rostedt 2 siblings, 0 replies; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 19:19 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And yes, sometimes that means jumping through hoops. But that's what > it takes to keep users happy. The example of "jumping through hoops" I tend to give is the pipe "packet mode". The kernel actually has a magic pipe mode for "packet buffers", so that if you do two small writes, the other side of the pipe can actually say "I want to read not the pipe buffers, but the individual packets". Why would we do that? That's not how pipes work! If you want to send and receive messages, use a socket, for chrissake! A pipe is just a stream of bytes - as the elder Gods of Unix decreed! But it turns out that we added the notion of a packetized pipe writer, and you can actually access it in user space by setting the O_DIRECT flag (so after you do the "pipe()" system call, do a fcntl(SETFL, O_DIRECT) on it). Absolutely nobody uses it, afaik, because you'd be crazy to do it. What would be the point? sockets work, and are portable. So why do we have it? We have it for one single user: autofs. 
The way you talk to the kernel side of things is with a magic pipe that you get at mount time, and the user-space autofs daemon might be 32-bit even if the kernel is 64-bit, and we had a horrible ABI mistake which meant that sending the binary data over that pipe had different format for a 32-bit autofsd. And systemd and automount had different workarounds (or lack there-of), for the ABI issue. So the whole "ok, allow people to send packets, and read them as packets" came about purely because the ABI was broken, and there was no other way to fix things. See 64f371bc3107 ("autofs: make the autofsv5 packet file descriptor use a packetized pipe") for (some) of the sad details. That commit kind of makes it sound like it's a nice solution that just takes advantage of the packetized pipes. Nice and clean fix, right? No. The packetized pipes exist in the first place _purely_ to make that nice solution possible. It's literally just a "this allows us to be ABI compatible with two different users that were confused about the compatibility issue we had due to a broken binary structure format across x86-32 and x86-64". See commit 9883035ae7ed ("pipes: add a "packetized pipe" mode for writing") for the other side of that. All this just because _we_ made a mistake in our ABI, and then real life users started using that mistake, including one user that literally *knew* about the mistake and worked around it and started depending on the fact that our compatibility mode was buggy because of it. So it was a bug in our ABI. But since people depended on the bug, the bug was a feature, and needed to be kept around. In this case by adding a totally new and unrelated feature, and using that new feature to make those old users happy. The whole "set packetized mode on the autofs pipe" is all done transparently inside the kernel, and automount never knew or needed to care that we started to use a packetized pipe to get the data it sent us in the chunks it intended.
Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
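The packet mode described in the message above can be observed from user space. What follows is only a minimal sketch, assuming a Linux kernel new enough to accept O_DIRECT on a pipe; first_read_len() is a hypothetical helper written for this illustration and has nothing to do with the autofs code. It writes two small messages and reports how many bytes the first read() returns.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write "abc" and "defgh" into a
 * fresh pipe and return the byte count of the first read(), or -1 on error.
 * With packet_mode != 0 the pipe is created with O_DIRECT, so each write()
 * is kept as a separate packet.
 */
static ssize_t first_read_len(int packet_mode)
{
	int fds[2];
	char buf[64];
	ssize_t n = -1;

	if (pipe2(fds, packet_mode ? O_DIRECT : 0) < 0)
		return -1;

	if (write(fds[1], "abc", 3) == 3 && write(fds[1], "defgh", 5) == 5)
		n = read(fds[0], buf, sizeof(buf));

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

On a kernel with packetized pipes, first_read_len(1) should return 3 (one packet per read), while first_read_len(0) returns all 8 bytes as an ordinary byte stream.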
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:52 ` Linus Torvalds 2018-04-16 19:00 ` Linus Torvalds 2018-04-16 19:19 ` Linus Torvalds @ 2018-04-16 19:24 ` Steven Rostedt 2018-04-16 19:28 ` Linus Torvalds 2 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 19:24 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 11:52:48 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > >I never said the second > > bug fix should not have been backported. I even said that the first bug > > "didn't go far enough". > > You're still not getting it. > > The "didn't go far enough" means that the bug fix is *BUGGY*. It needs > to be reverted. It wasn't reverted. Look at the code in question. Commit d63c7dd5bcb +++ b/drivers/scsi/ipr.c @@ -4003,13 +4003,12 @@ static ssize_t ipr_store_update_fw(struct device *dev, struct ipr_sglist *sglist; char fname[100]; char *src; - int len, result, dnld_size; + int result, dnld_size; if (!capable(CAP_SYS_ADMIN)) return -EACCES; - len = snprintf(fname, 99, "%s", buf); - fname[len-1] = '\0'; + snprintf(fname, sizeof(fname), "%s", buf); if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) { dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname); The bug is that len returned by snprintf() can be much larger than 100. That fname[len-1] = '\0' can allow a user to decide where to write zeros. That patch never got reverted in mainline. 
It was fixed with this: Commit 21b81716c6bf --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -4002,6 +4002,7 @@ static ssize_t ipr_store_update_fw(struct device *dev, struct ipr_sglist *sglist; char fname[100]; char *src; + char *endline; int result, dnld_size; if (!capable(CAP_SYS_ADMIN)) @@ -4009,6 +4010,10 @@ static ssize_t ipr_store_update_fw(struct device *dev, snprintf(fname, sizeof(fname), "%s", buf); + endline = strchr(fname, '\n'); + if (endline) + *endline = '\0'; + if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) { dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname); return -EIO; > > > I hope the answer was not to revert the bug and put back the possible > > bad memory access in to keep API. > > But that very must *IS* the answer. If there isn't a fix for the ABI > breakage, then the first bugfix needs to be reverted. It wasn't reverted and that was my point. It just wasn't a complete fix. And I'm saying that once the API breakage became apparent, the second fix should have been backported as well. I'm not saying that we should allow API breakage to fix a critical bug. I'm saying that the API breakage was really a secondary bug that needed to be addressed. My point is the first fix was NOT reverted! > > Really. There is no such thing as "but the fix was more important than > the bug it introduced". I'm not saying that. > > This is why we started with the whole "actively revert things that > introduce regressions". Because people always kept claiming that "but > but I fixed a worse bug, and it's better to fix the worse bug even if > it then introduces another problem, because the other problem is > lesser". > > NO. Right, but the fix to the API was also trivial. I don't understand why you are arguing with me. I agree with you. I'm talking about this specific instance. Where a bug was fixed, and the API breakage was another fix that needed to be backported. 
Are you saying if code could allow userspace to write zeros anywhere in memory, that we should keep it to allow API compatibility? > > We're better off making *no* progress, than making "unsteady progress". > > Really. Seriously. > > If you cannot fix a bug without introducing another one, don't do it. > Don't do kernel development. Um, I think that's impossible. As the example shows. Not many people would have caught that the original fix would cause another bug. That requirement would pretty much keep everyone from ever doing any kernel development. > > The whole mentality you show is NOT ACCEPTABLE. > > So the *only* answer is: "fix the bug _and_ keep the API". There is > no other choice. I agree. But that wasn't the question. > > The whole "I fixed one problem but introduced another" is not how we > work. You should damn well know that. There are no excuses. > > And yes, sometimes that means jumping through hoops. But that's what > it takes to keep users happy. I'm talking about the given example of a simple memory bug that caused a very subtle breakage of API, which had another trivial fix that should be backported. I'm not sure that's what you were talking about. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
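The two ipr.c hunks quoted in the message above can be hard to follow in flattened form, so here is a small user-space sketch of the same two points. It is not the driver code: would_be_length() and copy_firmware_name() are hypothetical names, and FNAME_SIZE simply mirrors the 100-byte fname buffer in the quoted diff. The first function shows why the old fname[len-1] = '\0' was dangerous: snprintf() returns the length the output would have had without truncation, not the number of bytes stored. The second follows the pattern of the later fix: a bounded copy, then strip the newline by searching inside the buffer.

```c
#include <stdio.h>
#include <string.h>

#define FNAME_SIZE 100	/* assumed to match "char fname[100]" in the quoted diff */

/*
 * Return what snprintf() reports for a bounded copy of buf. This is the
 * length the full string would need, so it can be far larger than
 * FNAME_SIZE and must not be used as an index into the destination.
 */
static int would_be_length(const char *buf)
{
	char fname[FNAME_SIZE];

	return snprintf(fname, sizeof(fname), "%s", buf);
}

/*
 * Bounded copy followed by newline stripping, in the style of the second
 * fix: the only write after the copy lands on a byte strchr() found
 * inside fname, so it cannot go out of bounds.
 */
static void copy_firmware_name(char *fname, size_t size, const char *buf)
{
	char *endline;

	snprintf(fname, size, "%s", buf);

	endline = strchr(fname, '\n');
	if (endline)
		*endline = '\0';
}
```

With a 300-character input, would_be_length() reports 300 even though only 99 characters and a terminator were stored, which is how the original code ended up writing a zero byte well past the buffer. The newline stripping matters because a name written through sysfs with echo arrives with a trailing newline, and the lookup of the firmware file fails if it is left in place.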
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:24 ` Steven Rostedt @ 2018-04-16 19:28 ` Linus Torvalds 2018-04-16 19:31 ` Linus Torvalds 2018-04-16 19:38 ` Steven Rostedt 0 siblings, 2 replies; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 19:28 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 12:24 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > Right, but the fix to the API was also trivial. I don't understand why > you are arguing with me. I agree with you. I'm talking about this > specific instance. Where a bug was fixed, and the API breakage was > another fix that needed to be backported. Fair enough. Were you there when the report of breakage came in? Because *my* argument is that reverting something that causes problems is simply *never* the wrong answer. If you know of the fix, fine. But clearly people DID NOT KNOW. So reverting was the right choice. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:28 ` Linus Torvalds @ 2018-04-16 19:31 ` Linus Torvalds 2018-04-16 19:58 ` Steven Rostedt 2018-04-16 19:38 ` Steven Rostedt 1 sibling, 1 reply; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 19:31 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 12:28 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > If you know of the fix, fine. But clearly people DID NOT KNOW. So > reverting was the right choice. .. and this is obviously different in stable and in mainline. For example, I start reverting very aggressively only at the end of a release. If I get a bisected bug report in the last week, I generally revert without much argument, unless the author of the patch has an immediate fix. In contrast, during the merge window and the early rc's, I'm perfectly happy to say "ok, let's see if somebody can fix this" and not really consider a revert. But the -stable tree? Seriously, what do you expect them to do if they get a report that a commit they added to the stable tree regresses? "Revert first, ask questions later" is definitely a very sane model there. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:31 ` Linus Torvalds @ 2018-04-16 19:58 ` Steven Rostedt 0 siblings, 0 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 19:58 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 12:31:09 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But the -stable tree? > > Seriously, what do you expect them to do if they get a report that a > commit they added to the stable tree regresses? > > "Revert first, ask questions later" is definitely a very sane model there. The topic of our discussion is on what to backport, and how likely is it to cause regressions. I'm arguing that the bar for backporting should be raised, and that only "critical" fixes should be backported. Sasha pointed this bug fix as an example, and asked me if I would backport it under my conditions. I said yes. He then said "it was reverted", pointing me to the commit that fixed it. That confused me. When I looked further, I noticed that it wasn't reverted, and since he pointed me to the API fix, I said "I hope it wasn't reverted" meaning I hope they backported the obvious API fix and didn't just revert the original fix. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:28 ` Linus Torvalds 2018-04-16 19:31 ` Linus Torvalds @ 2018-04-16 19:38 ` Steven Rostedt 2018-04-16 19:55 ` Linus Torvalds 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 19:38 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 12:28:21 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 12:24 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > Right, but the fix to the API was also trivial. I don't understand why > > you are arguing with me. I agree with you. I'm talking about this > > specific instance. Where a bug was fixed, and the API breakage was > > another fix that needed to be backported. > > Fair enough. Were you there when the report of breakage came in? No I wasn't. > > Because *my* argument is that reverting something that causes problems > is simply *never* the wrong answer. > > If you know of the fix, fine. But clearly people DID NOT KNOW. So > reverting was the right choice. But I don't see in the git history that this was ever reverted. My reply saying that "I hope it wasn't reverted", was a response for it being reverted in stable, not mainline. Considering that the original bug would allow userspace to write zeros anywhere in memory, I would have definitely worked on finding why the API breakage happened and fixing it properly before putting such a large hole back into the kernel. 
I'm assuming that may have been what happened because the commit was never reverted in your tree, and if I was responsible for that code, I would be up all night looking for an API fix to make sure the original fix isn't reverted. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:38 ` Steven Rostedt @ 2018-04-16 19:55 ` Linus Torvalds 2018-04-16 20:02 ` Steven Rostedt 0 siblings, 1 reply; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 19:55 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 12:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > But I don't see in the git history that this was ever reverted. My reply > saying that "I hope it wasn't reverted", was a response for it being > reverted in stable, not mainline. See my other email. If your'e stable maintainer, and you get a report of a commit that causes problems, your first reaction probably really should just be "revert it". You can always re-apply it later, but a patch that causes problems is absolutely very much suspect, and automatically should make any stable maintainer go "that needs much more analysis". Sure, hopefully automation finds the fix too (ie commit 21b81716c6bf "ipr: Fix regression when loading firmware") in mainline. It did have the proper "fixes" tag, so it should hopefully have been easy to find by the automation that stable people use. But at the same time, I still maintain that "just revert it" is rather likely the right solution for stable. If it had a bug once, maybe it shouldn't have been applied in the first place. The author can then get notified by the other stable automation, and at that point argue for "yeah, it was buggy, but together with this other fix it's really important". 
But even when that is the case, I really don't see that the author should complain about it being reverted. Because it's *such* a no-brainer in stable. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 19:55 ` Linus Torvalds @ 2018-04-16 20:02 ` Steven Rostedt 2018-04-16 20:17 ` Linus Torvalds 0 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 20:02 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 12:55:46 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 12:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But I don't see in the git history that this was ever reverted. My reply > > saying that "I hope it wasn't reverted", was a response for it being > > reverted in stable, not mainline. > > See my other email. Already replied. > > If your'e stable maintainer, and you get a report of a commit that > causes problems, your first reaction probably really should just be > "revert it". > > You can always re-apply it later, but a patch that causes problems is > absolutely very much suspect, and automatically should make any stable > maintainer go "that needs much more analysis". > > Sure, hopefully automation finds the fix too (ie commit 21b81716c6bf > "ipr: Fix regression when loading firmware") in mainline. > > It did have the proper "fixes" tag, so it should hopefully have been > easy to find by the automation that stable people use. > > But at the same time, I still maintain that "just revert it" is > rather likely the right solution for stable. If it had a bug once, > maybe it shouldn't have been applied in the first place. 
> > The author can then get notified by the other stable automation, and > at that point argue for "yeah, it was buggy, but together with this > other fix it's really important". > > But even when that is the case, I really don't see that the author > should complain about it being reverted. Because it's *such* a > no-brainer in stable. But this is going way off topic to what we were discussing. The discussion is about what gets backported. Is automating the process going to make stable better? Or is it likely to add more regressions. Sasha's response has been that his automated process has the same rate of regressions as what gets tagged by authors. My argument is that perhaps authors should tag less to stable. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:02 ` Steven Rostedt @ 2018-04-16 20:17 ` Linus Torvalds 2018-04-16 20:33 ` Jiri Kosina 2018-04-16 21:27 ` Steven Rostedt 0 siblings, 2 replies; 113+ messages in thread From: Linus Torvalds @ 2018-04-16 20:17 UTC (permalink / raw) To: Steven Rostedt Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > But this is going way off topic to what we were discussing. The > discussion is about what gets backported. Is automating the process > going to make stable better? Or is it likely to add more regressions. > > Sasha's response has been that his automated process has the same rate > of regressions as what gets tagged by authors. My argument is that > perhaps authors should tag less to stable. The ones who should matter most for that discussion is the distros, since they are the actual users of stable (as well as the people doing the work, of course - ie Sasha and Greg and the rest of the stable gang). And I suspect that they actually do want all the noise, and all the stuff that isn't "critical". That's often the _easy_ choice. It's the stuff that I suspect the stable maintainers go "this I don't even have to think about", because it's a new driver ID or something. Because the bulk of stable tends to be driver updates, afaik. Which distros very much tend to want. Will developers think that their patches matter so much that they should go to stable? Yes they will. Will they overtag as a result? Probably. 
But the reverse likely also happens, where people simply don't think about stable at all, and just want to fix a bug. In many ways "Fixes" is likely a better thing to check for in stable backports, but that doesn't always exist either. And just judging by the amount of stable email I get - and by how excited _I_ would be about stable work, I think "automated process" is simply not an option. It's a _requirement_. You'd go completely crazy if you didn't automate 99% of all the stable work. So can you trust the "Cc: stable" as being perfect? Hell no. But what's your alternative? Manually selecting things for stable? Asking the developers separately? Because "criticality" definitely isn't what determines it. If it was, we'd never add driver ID's etc to stable - they're clearly not "critical". Yet it feels like sometimes those driver things are the _bulk_ of it, and it is usually fairly safe (not quite as obviously safe as you'd think, because a driver ID addition has occasionally meant not just "now it's supported", but instead "now the generic driver doesn't trigger for it any more", so it can actually break things). So I think - and _hope_ - that 99% of stable should be the non-critical stuff that people don't even need to think about. The critical stuff is hopefully a tiny tiny percentage. Linus ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:17 ` Linus Torvalds @ 2018-04-16 20:33 ` Jiri Kosina 2018-04-16 21:27 ` Steven Rostedt 1 sibling, 0 replies; 113+ messages in thread From: Jiri Kosina @ 2018-04-16 20:33 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018, Linus Torvalds wrote: > The ones who should matter most for that discussion is the distros, > since they are the actual users of stable (as well as the people doing > the work, of course - ie Sasha and Greg and the rest of the stable > gang). > > And I suspect that they actually do want all the noise, and all the > stuff that isn't "critical". That's often the _easy_ choice. It's the > stuff that I suspect the stable maintainers go "this I don't even have > to think about", because it's a new driver ID or something. So I am a maintainer of SUSE enterprise kernel, and I can tell you I personally really don't want all the noise, simply because (a) noone asked us to distribute it (if they did, we would do so) (b) the risk of regressions We've always been very cautious about what is coming from stable, and actually filtering out patches we actively don't want for one reason or another. And yes, there is also a history of regressions caused by stable updates that were painful for us ... I brought this up a multiple times at ksummit-discuss@ over past years. So with the upcoming release, we've actually abandonded stable and are relying more on auto-processing the Fixes: tag. 
That is not perfect in both ways (it doesn't cover everything, and we can miss complex semantical dependencies between patches even though they "apply"), but we didn't base our decision this time on aligning our schedule with stable, and so far we don't seem to be suffering. And we have much better overview / control over what is landing in our enterprise tree (of course this all is shepherded by machinery around processing Fixes: tag, which then though has to be *actively* acted upon, depending on a case-by-case human assessment of how critical it actually is). > Because the bulk of stable tends to be driver updates, afaik. Which > distros very much tend to want. For "community" distros (like Fedora, openSUSE), perhaps, yeah. For "enterprise" kernels, quite frankly, we much rather get the driver updates/backports from the respective HW vendors we're cooperating with, as they have actually tested and verified the backport on the metal. > The critical stuff is hopefully a tiny tiny percentage. But quite frankly, that's the only thing we as distro *really* want -- to save our users from hitting the critical issues with all the consequences (data loss, unbootable systems, etc). All the rest we can easily handle on a reactive basis, which heavily depends on the userbase spectrum (and that's probably completely different for each -stable tree consumer anyway). This is a POV of me as a distro kernel maintainer, but mileage of others definitely can / will vary of course. Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 20:17 ` Linus Torvalds 2018-04-16 20:33 ` Jiri Kosina @ 2018-04-16 21:27 ` Steven Rostedt 1 sibling, 0 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 21:27 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 13:17:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But this is going way off topic to what we were discussing. The > > discussion is about what gets backported. Is automating the process > > going to make stable better? Or is it likely to add more regressions. > > > > Sasha's response has been that his automated process has the same rate > > of regressions as what gets tagged by authors. My argument is that > > perhaps authors should tag less to stable. > > The ones who should matter most for that discussion is the distros, > since they are the actual users of stable (as well as the people doing > the work, of course - ie Sasha and Greg and the rest of the stable > gang). That was actually my final conclusion before we started out discussion ;-) http://lkml.kernel.org/r/20180416143510.79ba5c63@gandalf.local.home > > And I suspect that they actually do want all the noise, and all the > stuff that isn't "critical". That's often the _easy_ choice. It's the > stuff that I suspect the stable maintainers go "this I don't even have > to think about", because it's a new driver ID or something. Although Red Hat doesn't base off of the stable kernel. At least it didn't when I was there. 
They may look at the stable kernel, but they make their own decisions. If we want the distros to use stable as the base, it should be the least common factor among them. Otherwise, if stable includes commits that a distro would rather not backport, then they wont use stable. > > Because the bulk of stable tends to be driver updates, afaik. Which > distros very much tend to want. > > Will developers think that their patches matter so much that they > should go to stable? Yes they will. Will they overtag as a result? > Probably. But the reverse likely also happens, where people simply > don't think about stable at all, and just want to fix a bug. > > In many ways "Fixes" is likely a better thing to check for in stable > backports, but that doesn't always exist either. > > And just judging by the amount of stable email I get - and by how > excited _I_ would be about stable work, I think "automated process" is > simply not an option. It's a _requirement_. You'd go completely crazy > if you didn't automate 99% of all the stable work. > > So can you trust the "Cc: stable" as being perfect? Hell no. But > what's your alternative? Manually selecting things for stable? Asking > the developers separately? > > Because "criticality" definitely isn't what determines it. If it was, > we'd never add driver ID's etc to stable - they're clearly not > "critical". True. But I believe the driver ID's was given the "exception". > > Yet it feels like that's sometimes those driver things are the _bulk_ > of it, and it is usually fairly safe (not quite as obviously safe as > you'd think, because a driver ID addition has occasionally meant not > just "now it's supported", but instead "now the generic driver doesn't > trigger for it any more", so it can actually break things). > > So I think - and _hope_ - that 99% of stable should be the > non-critical stuff that people don't even need to think about. > > The critical stuff is hopefully a tiny tiny percentage. 
Well, I'm not sure that's really the case. $ git log --oneline v4.14.33..v4.14.34 | head -20 ffebeb0d7c37 Linux 4.14.34 fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources 9fdeb33e1913 vhost_net: add missing lock nesting notation 8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add 233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash 2f8aa659d4c0 vhost: validate log when IOTLB is enabled 72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor 9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests 477c73abf26a strparser: Fix sign of err codes 1c71bfe84deb net/sched: fix NULL dereference on the error path of tcf_skbmod_init() a19024a3f343 net/sched: fix NULL dereference in the error path of tunnel_key_init() e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode 1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled e52a45bb392f nfp: use full 40 bits of the NSP buffer address ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows 9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path b9c6ddda3805 vti6: better validate user provided tunnel names 109dce20c6ed ip6_tunnel: better validate user provided tunnel names 72363c63b070 ip6_gre: better validate user provided tunnel names The majority of those appear to be on the critical side. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:26 ` Steven Rostedt 2018-04-16 18:30 ` Linus Torvalds @ 2018-04-16 18:35 ` Sasha Levin 2018-04-16 18:57 ` Steven Rostedt 1 sibling, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 18:35 UTC (permalink / raw) To: Steven Rostedt Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, Apr 16, 2018 at 02:26:53PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 17:42:38 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: >> Also note that all of these patches were tagged for stable and actually >> ended up in at least one tree. >> >> This is why I'm basing a lot of my decision making on the rejection rate. >> If the AUTOSEL process does the job well enough as the "regular" >> process did before, why push it back? > >Because I think we are adding too many patches to stable. And >automating it may just make things worse. Your examples above back my >argument more than they refute it. If people can't determine what is >"obviously correct" how is automation going to do any better? I don't understand that statament, it sounds illogical to me. If I were to tell you that I have a crack team of 10 kernel hackers who dig through all mainline commits to find commits that should be backported to stable, and they do it with less mistakes than authors/maintainers make when they tag their own commits, would I get the same level of objection? On the correctness side, I have another effort to improve the quality of testing -stable commits get, but this is somewhat unrelated to the whole automatic selection process. 
^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 18:35 ` Sasha Levin @ 2018-04-16 18:57 ` Steven Rostedt 0 siblings, 0 replies; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 18:57 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH On Mon, 16 Apr 2018 18:35:44 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > If I were to tell you that I have a crack team of 10 kernel hackers who > dig through all mainline commits to find commits that should be > backported to stable, and they do it with less mistakes than > authors/maintainers make when they tag their own commits, would I get the > same level of objection? Probably ;-) I've been struggling with my own stable tags, and been thinking that I too suffer from tagging too much for stable, because there's code I fix, and think "hmm, this could have some unwanted side effects". I'm actually worried that my own fixes could cause an API breakage that I'm unaware of. What I'm staying is, I think we should start looking at fixes that fix bugs we consider critical. Those being: * off-by-one * memory overflow * locking mismatch * API regressions For my sub-system * wrong data coming out Which can be a critical issue. Wrong data is worse than no data. But then, there's the times a bug will produce no data, and considering what it is, and how much of an effort it takes to fix it, I may or may not label "no data" issues for stable. 
The cases where I enable something with a bunch of parameters, and because of some mishandling of the parameter it just screws up totally (where it's obvious that it screwed up), I only mark those for stable if it doesn't require a redesign of the code to fix it. There's been some cases where a redesign was required, and I didn't mark it for stable. The fixes for tracing that I don't usually tag for stable is when doing complex tracing simply doesn't work and produces no data or errors incorrectly. Depending on how complex the fix is, I mark it for stable, otherwise, I think the fix is more likely to break something else that is more common, than this hardly ever used feature. The fact that nobody noticed, or hasn't complained about it usually plays a lot in that decision. If someone complained to me about breakage, I'm more likely to label it for stable. But if I discover it myself, as I probably use the tracing system differently than others as I wrote the code, then I don't usually mark it. -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 15:18 ` Linus Torvalds 2018-04-16 15:30 ` Pavel Machek @ 2018-04-16 15:36 ` Steven Rostedt 2018-04-16 16:02 ` Sasha Levin 2018-04-16 15:39 ` Sasha Levin 2 siblings, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 15:36 UTC (permalink / raw) To: Linus Torvalds Cc: Sasha Levin, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, 16 Apr 2018 08:18:09 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from > > someone before they are pulled in. Otherwise there may be some subtle > > issues that can find their way into stable releases. > > I don't know about anybody else, but I get so many of the patch-bot > patches for stable etc that I will *not* reply to normal cases. Only > if there's some issue with a patch will I reply. > > I probably do get more than most, but still - requiring active > participation for the steady flow of normal stable patches is almost > pointless. > > Just look at the subject line of this thread. The numbers are so big > that you almost need exponential notation for them. > I'm worried about just backporting patches that nobody actually looked at. Is someone going through and vetting that these should definitely be added to stable. I would like to have some trusted human (doesn't even need to be the author or maintainer of the patch) to look at all the patches before they are applied. I would say anything more than a trivial patch would require author or sub maintainer ack. 
Look at this patch: I don't think it should go to stable, even though it does fix issues. But the fix is for systems already having issues, and this keeps printk from making things worse. The fix has side effects that other commits have addressed, and if this patch gets backported, those other ones must be too. Maybe I was too strong in saying all patches should be acked, but anything more than buffer overflows and off-by-one errors probably requires a bit more vetting by a human than just pulling in all patches that a bot flags to be backported. -- Steve
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 15:36 ` Steven Rostedt @ 2018-04-16 16:02 ` Sasha Levin 2018-04-16 16:10 ` Pavel Machek 2018-04-16 16:12 ` Steven Rostedt 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:02 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, Apr 16, 2018 at 11:36:29AM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 08:18:09 -0700 >Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote: >> > >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from >> > someone before they are pulled in. Otherwise there may be some subtle >> > issues that can find their way into stable releases. >> >> I don't know about anybody else, but I get so many of the patch-bot >> patches for stable etc that I will *not* reply to normal cases. Only >> if there's some issue with a patch will I reply. >> >> I probably do get more than most, but still - requiring active >> participation for the steady flow of normal stable patches is almost >> pointless. >> >> Just look at the subject line of this thread. The numbers are so big >> that you almost need exponential notation for them. >> > >I'm worried about just backporting patches that nobody actually looked >at. Is someone going through and vetting that these should definitely >be added to stable. I would like to have some trusted human (doesn't >even need to be the author or maintainer of the patch) to look at all >the patches before they are applied. 
I do go through every single commit sent this way and review it. Sometimes things slip by, but it's not a fully automatic process. Let's look at this patch as a concrete example: the only reason, according to the stable rules, that it shouldn't go in -stable is that it's longer than 100 lines. Otherwise, it fixes a bug, it doesn't introduce any new features, it's upstream, and so on. It had some fixes that went upstream as well? Great, let's grab those as well. >I would say anything more than a trivial patch would require author or >sub maintainer ack. Look at this patch, I don't think it should go to >stable, even though it does fix issues. But the fix is for systems >already having issues, and this keeps printk from making things worse. >The fix has side effects that other commits have addressed, and if this >patch gets backported, those other ones must too. Sure, let's get those patches in as well. One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. If anything, after this discussion I'd recommend that we take this patch and its follow-up fixes... >Maybe I was too strong by saying all patches should be acked, but >anything more than buffer overflows and off by one errors probably >require a bit more vetting by a human than to just pull in all patches >that a bot flags to be backported. If anyone wants to give me a hand with going through these, I'd be more than happy to have it. I know that Ben Hutchings is looking carefully at the ones that land in 4.4. It's always good to have more than 1 set of eyes!
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:02 ` Sasha Levin @ 2018-04-16 16:10 ` Pavel Machek 2018-04-16 16:12 ` Steven Rostedt 1 sibling, 0 replies; 113+ messages in thread From: Pavel Machek @ 2018-04-16 16:10 UTC (permalink / raw) To: Sasha Levin Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 2854 bytes --] On Mon 2018-04-16 16:02:03, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 11:36:29AM -0400, Steven Rostedt wrote: > >On Mon, 16 Apr 2018 08:18:09 -0700 > >Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > >> > > >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from > >> > someone before they are pulled in. Otherwise there may be some subtle > >> > issues that can find their way into stable releases. > >> > >> I don't know about anybody else, but I get so many of the patch-bot > >> patches for stable etc that I will *not* reply to normal cases. Only > >> if there's some issue with a patch will I reply. > >> > >> I probably do get more than most, but still - requiring active > >> participation for the steady flow of normal stable patches is almost > >> pointless. > >> > >> Just look at the subject line of this thread. The numbers are so big > >> that you almost need exponential notation for them. > >> > > > >I'm worried about just backporting patches that nobody actually looked > >at. Is someone going through and vetting that these should definitely > >be added to stable. 
I would like to have some trusted human (doesn't > >even need to be the author or maintainer of the patch) to look at all > >the patches before they are applied. > > I do go through every single commit sent this way and review it. > Sometimes things slip by, but it's not a fully automatic process. > > Let's look at this patch as a concrete example: the only reason, > according to the stable rules, that it shouldn't go in -stable is that > it's longer than 100 lines. > > Otherwise, it fixes a bug, it doesn't introduce any new features, it's > upstream, and so on. It had some fixes that went upstream as well? > Great, let's grab those as well. > > >I would say anything more than a trivial patch would require author or > >sub maintainer ack. Look at this patch, I don't think it should go to > >stable, even though it does fix issues. But the fix is for systems > >already having issues, and this keeps printk from making things worse. > >The fix has side effects that other commits have addressed, and if this > >patch gets backported, those other ones must too. > > Sure, let's get those patches in as well. > > One of the things Greg is pushing strongly for is "bug compatibility": > we want the kernel to behave the same way between mainline and stable. > If the code is broken, it should be broken in the same way. Maybe Greg should be Cced on this conversation? Anyway, I don't think "bug compatibility" is a good goal. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:02 ` Sasha Levin 2018-04-16 16:10 ` Pavel Machek @ 2018-04-16 16:12 ` Steven Rostedt 2018-04-16 16:19 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 16:12 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, 16 Apr 2018 16:02:03 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > One of the things Greg is pushing strongly for is "bug compatibility": > we want the kernel to behave the same way between mainline and stable. > If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? -- Steve ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:12 ` Steven Rostedt @ 2018-04-16 16:19 ` Sasha Levin 2018-04-16 16:30 ` Steven Rostedt 2018-04-19 11:41 ` Thomas Backlund 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:19 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 16:02:03 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> One of the things Greg is pushing strongly for is "bug compatibility": >> we want the kernel to behave the same way between mainline and stable. >> If the code is broken, it should be broken in the same way. > >Wait! What does that mean? What's the purpose of stable if it is as >broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:19 ` Sasha Levin @ 2018-04-16 16:30 ` Steven Rostedt 2018-04-16 16:37 ` Sasha Levin 2018-04-19 11:41 ` Thomas Backlund 1 sibling, 1 reply; 113+ messages in thread From: Steven Rostedt @ 2018-04-16 16:30 UTC (permalink / raw) To: Sasha Levin Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, 16 Apr 2018 16:19:14 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >Wait! What does that mean? What's the purpose of stable if it is as > >broken as mainline? > > This just means that if there is a fix that went in mainline, and the > fix is broken somehow, we'd rather take the broken fix than not. > > In this scenario, *something* will be broken, it's just a matter of > what. We'd rather have the same thing broken between mainline and > stable. Honestly, I think that removes all value of the stable series. I remember when the stable series was first created. People were saying that it wouldn't even get to more than 5 versions, because the bar for backporting was supposed to be very high. Today it's just a fork of the kernel at a given version. No more features, but we will be OK with regressions. I'm struggling to see what the benefit of it is supposed to be. -- Steve
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:30 ` Steven Rostedt @ 2018-04-16 16:37 ` Sasha Levin 2018-04-16 17:06 ` Pavel Machek 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-16 16:37 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: >On Mon, 16 Apr 2018 16:19:14 +0000 >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> >Wait! What does that mean? What's the purpose of stable if it is as >> >broken as mainline? >> >> This just means that if there is a fix that went in mainline, and the >> fix is broken somehow, we'd rather take the broken fix than not. >> >> In this scenario, *something* will be broken, it's just a matter of >> what. We'd rather have the same thing broken between mainline and >> stable. > >Honestly, I think that removes all value of the stable series. I >remember when the stable series were first created. People were saying >that it wouldn't even get to more than 5 versions, because the bar for >backporting was suppose to be very high. Today it's just a fork of the >kernel at a given version. No more features, but we will be OK with >regressions. I'm struggling to see what the benefit of it is suppose to >be? It's not "OK with regressions". Let's look at a hypothetical example: You have a 4.15.1 kernel that has a broken printf() behaviour so that when you: pr_err("%d", 5) Would print: "Microsoft Rulez" Bad, right? So you went ahead and fixed it, and now it prints "5" as you might expect. 
But alas, with your patch, running: pr_err("%s", "hi!") Would show a cat picture for 5 seconds. Should we take your patch in -stable or not? If we don't, we're stuck with the original issue while the mainline kernel will behave differently, but if we do, we introduce a new regression. So it's not the case that a -stable kernel will have *more* regressions, it'll just have similar ones to mainline.
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:37 ` Sasha Levin @ 2018-04-16 17:06 ` Pavel Machek 2018-04-16 17:23 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-04-16 17:06 UTC (permalink / raw) To: Sasha Levin Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 2044 bytes --] On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >On Mon, 16 Apr 2018 16:19:14 +0000 > >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >broken as mainline? > >> > >> This just means that if there is a fix that went in mainline, and the > >> fix is broken somehow, we'd rather take the broken fix than not. > >> > >> In this scenario, *something* will be broken, it's just a matter of > >> what. We'd rather have the same thing broken between mainline and > >> stable. > > > >Honestly, I think that removes all value of the stable series. I > >remember when the stable series were first created. People were saying > >that it wouldn't even get to more than 5 versions, because the bar for > >backporting was suppose to be very high. Today it's just a fork of the > >kernel at a given version. No more features, but we will be OK with > >regressions. I'm struggling to see what the benefit of it is suppose to > >be? > > It's not "OK with regressions". 
> > Let's look at a hypothetical example: You have a 4.15.1 kernel that has > a broken printf() behaviour so that when you: > > pr_err("%d", 5) > > Would print: > > "Microsoft Rulez" > > Bad, right? So you went ahead and fixed it, and now it prints "5" as you > might expect. But alas, with your patch, running: > > pr_err("%s", "hi!") > > Would show a cat picture for 5 seconds. > > Should we take your patch in -stable or not? If we don't, we're stuck > with the original issue while the mainline kernel will behave > differently, but if we do - we introduce a new regression. Of course not. - It must be obviously correct and tested. If it introduces a new bug, it is not correct, and certainly not obviously correct. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:06 ` Pavel Machek @ 2018-04-16 17:23 ` Sasha Levin 2018-04-17 11:41 ` Jan Kara 2018-05-03 9:32 ` Pavel Machek 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-16 17:23 UTC (permalink / raw) To: Pavel Machek Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: >> >On Mon, 16 Apr 2018 16:19:14 +0000 >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote: >> > >> >> >Wait! What does that mean? What's the purpose of stable if it is as >> >> >broken as mainline? >> >> >> >> This just means that if there is a fix that went in mainline, and the >> >> fix is broken somehow, we'd rather take the broken fix than not. >> >> >> >> In this scenario, *something* will be broken, it's just a matter of >> >> what. We'd rather have the same thing broken between mainline and >> >> stable. >> > >> >Honestly, I think that removes all value of the stable series. I >> >remember when the stable series were first created. People were saying >> >that it wouldn't even get to more than 5 versions, because the bar for >> >backporting was suppose to be very high. Today it's just a fork of the >> >kernel at a given version. No more features, but we will be OK with >> >regressions. I'm struggling to see what the benefit of it is suppose to >> >be? >> >> It's not "OK with regressions". 
>> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has >> a broken printf() behaviour so that when you: >> >> pr_err("%d", 5) >> >> Would print: >> >> "Microsoft Rulez" >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you >> might expect. But alas, with your patch, running: >> >> pr_err("%s", "hi!") >> >> Would show a cat picture for 5 seconds. >> >> Should we take your patch in -stable or not? If we don't, we're stuck >> with the original issue while the mainline kernel will behave >> differently, but if we do - we introduce a new regression. > >Of course not. > >- It must be obviously correct and tested. > >If it introduces new bug, it is not correct, and certainly not >obviously correct. As you might have noticed, we don't strictly follow the rules. Take a look at the whole PTI story as an example. It's way more than 100 lines, it's not obviously correct, it fixed more than 1 thing, and so on, and yet it went in -stable! Would you argue we shouldn't have backported PTI to -stable?
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:23 ` Sasha Levin @ 2018-04-17 11:41 ` Jan Kara 2018-04-17 13:31 ` Sasha Levin 2018-05-03 9:32 ` Pavel Machek 1 sibling, 1 reply; 113+ messages in thread From: Jan Kara @ 2018-04-17 11:41 UTC (permalink / raw) To: Sasha Levin Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Mon 16-04-18 17:23:30, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >On Mon, 16 Apr 2018 16:19:14 +0000 > >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> > > >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >broken as mainline? > >> >> > >> >> This just means that if there is a fix that went in mainline, and the > >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> > >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> what. We'd rather have the same thing broken between mainline and > >> >> stable. > >> > > >> >Honestly, I think that removes all value of the stable series. I > >> >remember when the stable series were first created. People were saying > >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >backporting was suppose to be very high. Today it's just a fork of the > >> >kernel at a given version. No more features, but we will be OK with > >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >be? > >> > >> It's not "OK with regressions". 
> >> > >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> a broken printf() behaviour so that when you: > >> > >> pr_err("%d", 5) > >> > >> Would print: > >> > >> "Microsoft Rulez" > >> > >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> might expect. But alas, with your patch, running: > >> > >> pr_err("%s", "hi!") > >> > >> Would show a cat picture for 5 seconds. > >> > >> Should we take your patch in -stable or not? If we don't, we're stuck > >> with the original issue while the mainline kernel will behave > >> differently, but if we do - we introduce a new regression. > > > >Of course not. > > > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. > > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? So I agree with that being backported. But I think this nicely demonstrates a point some people are trying to make in this thread. We do take fixes with a high risk of regression if they fix a serious enough issue. Also we do take fixes to non-serious stuff (such as the addition of a device ID) if the chances of regression are really low. So IMHO the metric for including the fix is not solely "how annoying to the user this can be" but rather something like: score = (how annoying the bug is) * ((1 / (chance of regression due to including this)) - 1)^3 (constants are somewhat arbitrary and subject to tuning ;). Now both the 'annoying' and 'regression chance' parts are subjective and sometimes difficult to estimate, so don't take the formula too seriously, but it demonstrates the point. I think we all agree we want to fix annoying stuff and we don't want regressions.
But you need to somehow weigh this over your expected userbase - and this is where your argument "but someone might be annoyed by LEDs not working so let's include it" has problems - it should rather be "is the annoyance of non-working LEDs over the expected user base high enough to risk a regression due to this patch for someone in the expected user base"? The answer to this second question is not clear at all to a casual reviewer, and that's why we IMHO have the CC stable tag, as the maintainer is supposed to have at least a bit better clue. Another point I wanted to make is that if the chance that a patch causes a regression is about 2%, as you said somewhere else in the thread, then by adding 20 patches that "may fix a bug that is annoying for someone" you've just increased the chance there's a regression in the release by 34%. And this is not just a math game, this also roughly matches real experience with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such a regression chance? And I also note that for a regression to get reported so that it gets included into your 2% estimate of a patch regression rate, someone must be bothered enough by it to triage it and send an email somewhere, so that already falls into the category of "serious" stuff to me. So these are the reasons why I think that merging tons of patches into stable isn't actually very good. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
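The two calculations in the message above can be sketched as follows. This is only an illustration: it assumes every backported patch regresses independently and with the same probability, which the message does not state, and the function names are made up for this sketch. Under those assumptions, a 2% per-patch chance over 20 patches gives a bit over 33% chance of at least one regression, close to the 34% quoted.

```python
def inclusion_score(annoyance: float, regression_chance: float) -> float:
    """Score from the message: annoyance * ((1 / regression_chance) - 1) ** 3.

    `annoyance` is an arbitrary, subjective weight; `regression_chance`
    is a probability in (0, 1]. A chance of 1.0 gives a score of 0.
    """
    if not 0.0 < regression_chance <= 1.0:
        raise ValueError("regression_chance must be in (0, 1]")
    return annoyance * ((1.0 / regression_chance) - 1.0) ** 3


def release_regression_chance(per_patch_chance: float, patches: int) -> float:
    """Chance that at least one of `patches` patches regresses.

    Assumes the patches are independent (an assumption of this sketch).
    """
    if not 0.0 <= per_patch_chance <= 1.0 or patches < 0:
        raise ValueError("need a probability and a non-negative patch count")
    return 1.0 - (1.0 - per_patch_chance) ** patches


if __name__ == "__main__":
    # 20 "maybe" fixes at a 2% per-patch regression chance.
    print(f"{release_regression_chance(0.02, 20):.1%}")
    # The cubic term makes low-risk patches score far higher than risky ones.
    print(inclusion_score(1.0, 0.02), inclusion_score(1.0, 0.5))
```

The cubic term is what makes the score so sensitive to risk: halving the regression chance raises the score by far more than doubling the annoyance does.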
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 11:41 ` Jan Kara @ 2018-04-17 13:31 ` Sasha Levin 2018-04-17 15:55 ` Jan Kara 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-17 13:31 UTC (permalink / raw) To: Jan Kara Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: >On Mon 16-04-18 17:23:30, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: >> >> >On Mon, 16 Apr 2018 16:19:14 +0000 >> >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote: >> >> > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as >> >> >> >broken as mainline? >> >> >> >> >> >> This just means that if there is a fix that went in mainline, and the >> >> >> fix is broken somehow, we'd rather take the broken fix than not. >> >> >> >> >> >> In this scenario, *something* will be broken, it's just a matter of >> >> >> what. We'd rather have the same thing broken between mainline and >> >> >> stable. >> >> > >> >> >Honestly, I think that removes all value of the stable series. I >> >> >remember when the stable series were first created. People were saying >> >> >that it wouldn't even get to more than 5 versions, because the bar for >> >> >backporting was suppose to be very high. Today it's just a fork of the >> >> >kernel at a given version. No more features, but we will be OK with >> >> >regressions. I'm struggling to see what the benefit of it is suppose to >> >> >be? 
>> >> >> >> It's not "OK with regressions". >> >> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has >> >> a broken printf() behaviour so that when you: >> >> >> >> pr_err("%d", 5) >> >> >> >> Would print: >> >> >> >> "Microsoft Rulez" >> >> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you >> >> might expect. But alas, with your patch, running: >> >> >> >> pr_err("%s", "hi!") >> >> >> >> Would show a cat picture for 5 seconds. >> >> >> >> Should we take your patch in -stable or not? If we don't, we're stuck >> >> with the original issue while the mainline kernel will behave >> >> differently, but if we do - we introduce a new regression. >> > >> >Of course not. >> > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. >> >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >So I agree with that being backported. But I think this nicely demostrates >a point some people are trying to make in this thread. We do take fixes >with high risk or regression if they fix serious enough issue. Also we do >take fixes to non-serious stuff (such as addition of device ID) if the >chances of regression are really low. > >So IMHO the metric for including the fix is not solely "how annoying to >user this can be" but rather something like: > >score = (how annoying the bug is) * ((1 / (chance of regression due to > including this)) - 1)^3 > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' >and 'regression chance' parts are subjective and sometimes difficult to >estimate so don't take the formula too seriously but it demonstrates the >point. 
I think we all agree we want to fix annoying stuff and we don't want >regressions. But you need to somehow weight this over your expected >userbase - and this is where your argument "but someone might be annoyed by >LEDs not working so let's include it" has problems - it should rather be >"is the annoyance of non-working leds over expected user base high enough >to risk a regression due to this patch for someone in the expected user >base"? The answer to this second question is not clear at all to a casual >reviewer and that's why we IMHO have CC stable tag as maintainer is >supposed to have at least a bit better clue. We may be able to guesstimate the 'regression chance', but there's no way we can guess the 'annoyance' one. There are so many different use cases that we just can't even guess how many people would get "annoyed" by something. Even regression chance is tricky: look at the commits I've linked earlier in the thread. Even the most trivial looking commits that end up in stable have a chance of regression. >Another point I wanted to make is that if chance a patch causes a >regression is about 2% as you said somewhere else in a thread, then by >adding 20 patches that "may fix a bug that is annoying for someone" you've >just increased a chance there's a regression in the release by 34%. And So I've said that the rejection rate is less than 2%. That covers all commits that I have proposed for -stable but that didn't end up being included in -stable. This includes commits that the author/maintainers NACKed, commits that didn't do anything on older kernels, commits that were buggy but were caught before the kernel was released, commits that failed to build on an arch I didn't originally test them on, and so on. After thousands of merged AUTOSEL patches I can count the number of times a commit has caused a regression and had to be removed on one hand.
>this is not just a math game, this also roughly matches a real experience >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such >regression chance? And I also note that for a regression to get reported so >that it gets included into your 2% estimate of a patch regression rate, >someone must be bothered enough by it to triage it and send an email >somewhere so that already falls into a category of "serious" stuff to me. It is indeed a numbers game, but the regression rate isn't 2%, it's closer to 0.05%. ^ permalink raw reply [flat|nested] 113+ messages in thread
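The gap between the two per-patch figures in this exchange (about 2% versus closer to 0.05%) can be made concrete by asking how many patches it takes before the chance of at least one regression in a release reaches a given level. A minimal sketch, assuming patches regress independently and with equal probability; `patches_until_risk` and the 34% target are illustrative choices for this sketch, not anything the stable process itself uses.

```python
import math


def patches_until_risk(per_patch_chance: float, target_risk: float) -> int:
    """Smallest n such that 1 - (1 - p) ** n >= target_risk.

    Solves the inequality for n with logarithms and rounds up.
    Both arguments must be probabilities strictly between 0 and 1.
    """
    if not (0.0 < per_patch_chance < 1.0 and 0.0 < target_risk < 1.0):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_risk) / math.log(1.0 - per_patch_chance))


if __name__ == "__main__":
    for label, chance in (("2%", 0.02), ("0.05%", 0.0005)):
        n = patches_until_risk(chance, 0.34)
        print(f"per-patch chance {label}: {n} patches to reach a 34% release risk")
```

Under these assumptions the 2% figure reaches that level after about 21 patches, while the 0.05% figure needs roughly 830, which is why the choice of per-patch rate dominates the argument.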
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 13:31 ` Sasha Levin @ 2018-04-17 15:55 ` Jan Kara 2018-04-17 16:19 ` Sasha Levin 0 siblings, 1 reply; 113+ messages in thread From: Jan Kara @ 2018-04-17 15:55 UTC (permalink / raw) To: Sasha Levin Cc: Jan Kara, Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 13:31:51, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: > >On Mon 16-04-18 17:23:30, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >> >On Mon, 16 Apr 2018 16:19:14 +0000 > >> >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote: > >> >> > > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >> >broken as mainline? > >> >> >> > >> >> >> This just means that if there is a fix that went in mainline, and the > >> >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> >> > >> >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> >> what. We'd rather have the same thing broken between mainline and > >> >> >> stable. > >> >> > > >> >> >Honestly, I think that removes all value of the stable series. I > >> >> >remember when the stable series were first created. People were saying > >> >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >> >backporting was suppose to be very high. Today it's just a fork of the > >> >> >kernel at a given version. 
No more features, but we will be OK with > >> >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >> >be? > >> >> > >> >> It's not "OK with regressions". > >> >> > >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> >> a broken printf() behaviour so that when you: > >> >> > >> >> pr_err("%d", 5) > >> >> > >> >> Would print: > >> >> > >> >> "Microsoft Rulez" > >> >> > >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> >> might expect. But alas, with your patch, running: > >> >> > >> >> pr_err("%s", "hi!") > >> >> > >> >> Would show a cat picture for 5 seconds. > >> >> > >> >> Should we take your patch in -stable or not? If we don't, we're stuck > >> >> with the original issue while the mainline kernel will behave > >> >> differently, but if we do - we introduce a new regression. > >> > > >> >Of course not. > >> > > >> >- It must be obviously correct and tested. > >> > > >> >If it introduces new bug, it is not correct, and certainly not > >> >obviously correct. > >> > >> As you might have noticed, we don't strictly follow the rules. > >> > >> Take a look at the whole PTI story as an example. It's way more than 100 > >> lines, it's not obviously corrent, it fixed more than 1 thing, and so > >> on, and yet it went in -stable! > >> > >> Would you argue we shouldn't have backported PTI to -stable? > > > >So I agree with that being backported. But I think this nicely demostrates > >a point some people are trying to make in this thread. We do take fixes > >with high risk or regression if they fix serious enough issue. Also we do > >take fixes to non-serious stuff (such as addition of device ID) if the > >chances of regression are really low. 
> > > >So IMHO the metric for including the fix is not solely "how annoying to > >user this can be" but rather something like: > > > >score = (how annoying the bug is) * ((1 / (chance of regression due to > > including this)) - 1)^3 > > > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' > >and 'regression chance' parts are subjective and sometimes difficult to > >estimate so don't take the formula too seriously but it demonstrates the > >point. I think we all agree we want to fix annoying stuff and we don't want > >regressions. But you need to somehow weight this over your expected > >userbase - and this is where your argument "but someone might be annoyed by > >LEDs not working so let's include it" has problems - it should rather be > >"is the annoyance of non-working leds over expected user base high enough > >to risk a regression due to this patch for someone in the expected user > >base"? The answer to this second question is not clear at all to a casual > >reviewer and that's why we IMHO have CC stable tag as maintainer is > >supposed to have at least a bit better clue. > > We may be able to guesstimate the 'regression chance', but there's no > way we can guess the 'annoyance' once. There are so many different use > cases that we just can't even guess how many people would get "annoyed" > by something. As a maintainer, I hope I have reasonable idea what are common use cases for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't know all of the use cases so people doing unusual stuff hit more bugs and have to report them to get fixes included in -stable. But for me this is a preferable tradeoff over the risk of regression so this is the rule I use when tagging for stable. Now I'm not a -stable maintainer and I fully agree with "those who do the work decide" principle so pick whatever patches you think are appropriate, I just wanted explain why I don't think more patches in stable are necessarily good. 
> Even regression chance is tricky, look at the commits I've linked > earlier in the thread. Even the most trivial looking commits that end up > in stable have a chance for regression. Sure, you can never be certain and I think people (including me) underestimate the chance of regressions for "trivial" patches. But you just estimate a chance, you may be lucky, you may not... > >Another point I wanted to make is that if chance a patch causes a > >regression is about 2% as you said somewhere else in a thread, then by > >adding 20 patches that "may fix a bug that is annoying for someone" you've > >just increased a chance there's a regression in the release by 34%. And > > So I've said that the rejection rate is less than 2%. This includes > all commits that I have proposed for -stable, but didn't end up being > included in -stable. > > This includes commits that the author/maintainers NACKed, commits that > didn't do anything on older kernels, commits that were buggy but were > caught before the kernel was released, commits that failed to build on > an arch I didn't test it on originally and so on. > > After thousands of merged AUTOSEL patches I can count the number of > times a commit has caused a regression and had to be removed on one > hand. > > >this is not just a math game, this also roughly matches a real experience > >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such > >regression chance? And I also note that for a regression to get reported so > >that it gets included into your 2% estimate of a patch regression rate, > >someone must be bothered enough by it to triage it and send an email > >somewhere so that already falls into a category of "serious" stuff to me. > > It is indeed a numbers game, but the regression rate isn't 2%, it's > closer to 0.05%. Honestly, I think 0.05% is too optimistic :) Quick grepping of the 4.14 stable tree suggests some 13 commits were reverted from stable due to bugs.
That's some 0.4% and that doesn't count fixes that were applied to fix other regressions. But the actual numbers don't really matter that much, in principle the more patches you add the higher is the chance of regression. You can't change that so you better have a good reason to include a patch... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 113+ messages in thread
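For illustration only, the scoring heuristic quoted in the message above can be written out in a few lines of Python. This is a minimal sketch, assuming "annoyance" is an arbitrary non-negative number and "regression chance" is a probability in (0, 1]. The function name and the sample inputs are hypothetical, and the exponent 3 is the "somewhat arbitrary" constant from the mail, not a tuned value.

```python
def inclusion_score(annoyance: float, regression_chance: float) -> float:
    """score = annoyance * ((1 / regression_chance) - 1) ** 3

    annoyance:         subjective weight of the bug (hypothetical scale, >= 0)
    regression_chance: estimated probability the backport regresses, in (0, 1]
    """
    if annoyance < 0.0:
        raise ValueError("annoyance must be non-negative")
    if not 0.0 < regression_chance <= 1.0:
        raise ValueError("regression_chance must be in (0, 1]")
    return annoyance * ((1.0 / regression_chance) - 1.0) ** 3


# Made-up inputs: a serious bug with a risky fix versus a minor
# annoyance (say, a missing device ID) with a very safe fix.
serious_but_risky = inclusion_score(annoyance=100.0, regression_chance=0.2)
minor_but_safe = inclusion_score(annoyance=1.0, regression_chance=0.001)

print(f"serious but risky: {serious_but_risky:.3g}")
print(f"minor but safe:    {minor_but_safe:.3g}")
```

With these made-up inputs the cubed term dominates, so the safe, minor fix scores far higher than the risky, serious one. That matches the point that low-risk fixes for non-serious problems do get taken, but it also shows how sensitive the result is to the guessed regression chance.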
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 15:55 ` Jan Kara @ 2018-04-17 16:19 ` Sasha Levin 2018-04-17 17:57 ` Jan Kara 2018-05-03 9:36 ` Pavel Machek 0 siblings, 2 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-17 16:19 UTC (permalink / raw) To: Jan Kara Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> We may be able to guesstimate the 'regression chance', but there's no >> way we can guess the 'annoyance' once. There are so many different use >> cases that we just can't even guess how many people would get "annoyed" >> by something. > >As a maintainer, I hope I have reasonable idea what are common use cases >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >know all of the use cases so people doing unusual stuff hit more bugs and >have to report them to get fixes included in -stable. But for me this is a >preferable tradeoff over the risk of regression so this is the rule I use >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >with "those who do the work decide" principle so pick whatever patches you >think are appropriate, I just wanted explain why I don't think more patches >in stable are necessarily good. The AUTOSEL story is different for subsystems that don't do -stable, and subsystems that are actually doing the work (like yourself). I'm not trying to override active maintainers, I'm trying to help them make decisions. 
The AUTOSEL bot will attempt to apply any patch it deems suitable for -stable to all -stable branches, find possible dependencies, build the result, and run any tests that you might deem necessary. You would be able to start your analysis without "wasting" time on doing a bunch of "manual labor". There's a big difference between subsystems like yours and most of the rest of the kernel. >> Even regression chance is tricky, look at the commits I've linked >> earlier in the thread. Even the most trivial looking commits that end up >> in stable have a chance for regression. > >Sure, you can never be certain and I think people (including me) >underestimate the chance of regressions for "trivial" patches. But you just >estimate a chance, you may be lucky, you may not... > >> >Another point I wanted to make is that if chance a patch causes a >> >regression is about 2% as you said somewhere else in a thread, then by >> >adding 20 patches that "may fix a bug that is annoying for someone" you've >> >just increased a chance there's a regression in the release by 34%. And >> >> So I've said that the rejection rate is less than 2%. This includes >> all commits that I have proposed for -stable, but didn't end up being >> included in -stable. >> >> This includes commits that the author/maintainers NACKed, commits that >> didn't do anything on older kernels, commits that were buggy but were >> caught before the kernel was released, commits that failed to build on >> an arch I didn't test it on originally and so on. >> >> After thousands of merged AUTOSEL patches I can count the number of >> times a commit has caused a regression and had to be removed on one >> hand. >> >> >this is not just a math game, this also roughly matches a real experience >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such >> >regression chance?
And I also note that for a regression to get reported so >> >that it gets included into your 2% estimate of a patch regression rate, >> >someone must be bothered enough by it to triage it and send an email >> >somewhere so that already falls into a category of "serious" stuff to me. >> >> It is indeed a numbers game, but the regression rate isn't 2%, it's >> closer to 0.05%. > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 >stable tree suggests some 13 commits were reverted from stable due to bugs. >That's some 0.4% and that doesn't count fixes that were applied to >fix other regressions. 0.05% is for commits that were merged in stable but later fixed or reverted because they introduced a regression. By grepping for reverts you also include things such as: - Reverts of commits that were in the corresponding mainline tree - Reverts of commits that didn't introduce regressions >But the actual numbers don't really matter that much, in principle the more >patches you add the higher is the chance of regression. You can't change >that so you better have a good reason to include a patch... You increase the chance of regressions, but you also increase the chance of fixing bugs that affect users. My claim is that the chance to fix bugs increases far more significantly than the chance to introduce regressions. ^ permalink raw reply [flat|nested] 113+ messages in thread
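As a rough check on the numbers being traded here, the chance that a release picks up at least one regression can be computed from the per-patch rate. This is a sketch under a strong assumption that nobody in the thread states explicitly: each patch regresses independently with the same probability. The function name is made up for the example. Under that assumption a 2% per-patch rate over 20 patches gives roughly 33% (close to the 34% quoted), 0.4% gives roughly 8%, and 0.05% gives roughly 1%.

```python
def chance_of_any_regression(per_patch_chance: float, patches: int) -> float:
    """Probability that at least one of `patches` patches regresses.

    Assumes every patch regresses independently with the same
    probability `per_patch_chance`: 1 - (1 - p) ** n.
    """
    if not 0.0 <= per_patch_chance <= 1.0:
        raise ValueError("per_patch_chance must be in [0, 1]")
    if patches < 0:
        raise ValueError("patches must be non-negative")
    return 1.0 - (1.0 - per_patch_chance) ** patches


# The three per-patch rates mentioned in the thread, each over 20 patches.
for label, rate in (("2%", 0.02), ("0.4%", 0.004), ("0.05%", 0.0005)):
    risk = chance_of_any_regression(rate, 20)
    print(f"{label:>6} per patch, 20 patches: {risk:.1%}")
```

The same formula backs both sides of the argument: more patches always raise the total risk, but how fast depends almost entirely on which per-patch rate one believes.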
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 16:19 ` Sasha Levin @ 2018-04-17 17:57 ` Jan Kara 2018-04-17 18:28 ` Sasha Levin 2018-05-03 9:36 ` Pavel Machek 1 sibling, 1 reply; 113+ messages in thread From: Jan Kara @ 2018-04-17 17:57 UTC (permalink / raw) To: Sasha Levin Cc: Jan Kara, Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue 17-04-18 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >> Even regression chance is tricky, look at the commits I've linked > >> earlier in the thread. Even the most trivial looking commits that end up > >> in stable have a chance for regression. > > > >Sure, you can never be certain and I think people (including me) > >underestimate the chance of regressions for "trivial" patches. But you just > >estimate a chance, you may be lucky, you may not... > > > >> >Another point I wanted to make is that if chance a patch causes a > >> >regression is about 2% as you said somewhere else in a thread, then by > >> >adding 20 patches that "may fix a bug that is annoying for someone" you've > >> >just increased a chance there's a regression in the release by 34%. And > >> > >> So I've said that the rejection rate is less than 2%. This includes > >> all commits that I have proposed for -stable, but didn't end up being > >> included in -stable. > >> > >> This includes commits that the author/maintainers NACKed, commits that > >> didn't do anything on older kernels, commits that were buggy but were > >> caught before the kernel was released, commits that failed to build on > >> an arch I didn't test it on originally and so on. 
> >> > >> After thousands of merged AUTOSEL patches I can count the number of > >> times a commit has caused a regression and had to be removed on one > >> hand. > >> > >> >this is not just a math game, this also roughly matches a real experience > >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such > >> >regression chance? And I also note that for a regression to get reported so > >> >that it gets included into your 2% estimate of a patch regression rate, > >> >someone must be bothered enough by it to triage it and send an email > >> >somewhere so that already falls into a category of "serious" stuff to me. > >> > >> It is indeed a numbers game, but the regression rate isn't 2%, it's > >> closer to 0.05%. > > > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 > >stable tree suggests some 13 commits were reverted from stable due to bugs. > >That's some 0.4% and that doesn't count fixes that were applied to > >fix other regressions. > > 0.05% is for commits that were merged in stable but later fixed or > reverted because they introduced a regression. By grepping for reverts > you also include things such as: > > - Reverts of commits that were in the corresponding mainline tree > - Reverts of commits that didn't introduce regressions Actually I was careful enough to include only commits that got merged as part of the stable process into 4.14.x but got later reverted in 4.14.y. That's where the 0.4% number came from. So I believe all of those cases (13 in absolute numbers) were user visible regressions during the stable process. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 17:57 ` Jan Kara @ 2018-04-17 18:28 ` Sasha Levin 0 siblings, 0 replies; 113+ messages in thread From: Sasha Levin @ 2018-04-17 18:28 UTC (permalink / raw) To: Jan Kara Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote: >Actually I was careful enough to include only commits that got merged as >part of the stable process into 4.14.x but got later reverted in 4.14.y. >That's where the 0.4% number came from. So I believe all of those cases >(13 in absolute numbers) were user visible regressions during the stable >process. I looked at them, and there are 2 things in play here: - Quite a few of those reverts are because of the PTI work. I'm not sure how we treat it, but yes - it skews statistics here. - 2 of them were reverts for device tree changes for a device that didn't exist in 4.14, and shouldn't have had any user visible changes. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-17 16:19 ` Sasha Levin 2018-04-17 17:57 ` Jan Kara @ 2018-05-03 9:36 ` Pavel Machek 2018-05-03 13:28 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-05-03 9:36 UTC (permalink / raw) To: Sasha Levin, jacek.anaszewski, Rafael J. Wysocki Cc: Jan Kara, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo [-- Attachment #1: Type: text/plain, Size: 1579 bytes --] On Tue 2018-04-17 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >On Tue 17-04-18 13:31:51, Sasha Levin wrote: > >> We may be able to guesstimate the 'regression chance', but there's no > >> way we can guess the 'annoyance' once. There are so many different use > >> cases that we just can't even guess how many people would get "annoyed" > >> by something. > > > >As a maintainer, I hope I have reasonable idea what are common use cases > >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't > >know all of the use cases so people doing unusual stuff hit more bugs and > >have to report them to get fixes included in -stable. But for me this is a > >preferable tradeoff over the risk of regression so this is the rule I use > >when tagging for stable. Now I'm not a -stable maintainer and I fully agree > >with "those who do the work decide" principle so pick whatever patches you > >think are appropriate, I just wanted explain why I don't think more patches > >in stable are necessarily good. > > The AUTOSEL story is different for subsystems that don't do -stable, and > subsystems that are actually doing the work (like yourself). 
> > I'm not trying to override active maintainers, I'm trying to help them > make decisions. Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 stuff from autosel work? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-05-03 9:36 ` Pavel Machek @ 2018-05-03 13:28 ` Sasha Levin 0 siblings, 0 replies; 113+ messages in thread From: Sasha Levin @ 2018-05-03 13:28 UTC (permalink / raw) To: Pavel Machek Cc: jacek.anaszewski@gmail.com, Rafael J. Wysocki, Jan Kara, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:19:35, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >> >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> >> We may be able to guesstimate the 'regression chance', but there's no >> >> way we can guess the 'annoyance' once. There are so many different use >> >> cases that we just can't even guess how many people would get "annoyed" >> >> by something. >> > >> >As a maintainer, I hope I have reasonable idea what are common use cases >> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >> >know all of the use cases so people doing unusual stuff hit more bugs and >> >have to report them to get fixes included in -stable. But for me this is a >> >preferable tradeoff over the risk of regression so this is the rule I use >> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >> >with "those who do the work decide" principle so pick whatever patches you >> >think are appropriate, I just wanted explain why I don't think more patches >> >in stable are necessarily good. >> >> The AUTOSEL story is different for subsystems that don't do -stable, and >> subsystems that are actually doing the work (like yourself). 
>> >> I'm not trying to override active maintainers, I'm trying to help them >> make decisions. > >Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 >stuff from autosel work?

Curiosity got me, and I had to see what these subsystems do as far as stable commits:

    $ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c arch/x86/kernel/acpi/ | wc -l
    7

Which got me a bit surprised: maybe indeed leds is mostly fine, but hibernation is definitely tricky, I've been stung by it a few times...

So why not pick something an actual user reported, and see how that was dealt with? Googling first showed this:

    https://bugzilla.kernel.org/show_bug.cgi?id=97201

Which was fixed by:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1

But that's not in any -stable tree. Hmm.. ok.. Next one on google was:

    https://bugzilla.kernel.org/show_bug.cgi?id=117971

Which, in turn, was fixed by:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3

Oh look at that, it's not in -stable either...

So seeing how you have concerns with my selection of -stable commits, maybe you could explain to me why these commits didn't end up in -stable? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 17:23 ` Sasha Levin 2018-04-17 11:41 ` Jan Kara @ 2018-05-03 9:32 ` Pavel Machek 2018-05-03 13:30 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Pavel Machek @ 2018-05-03 9:32 UTC (permalink / raw) To: Sasha Levin Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo Hi! > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. Yes, I noticed. And what I'm saying is that perhaps you should follow the rules more strictly. > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? Actually, I was surprised by PTI going to stable. That was clearly against the rules. Maybe the security bug was ugly enough to warrant that. But please don't use it as an argument for applying any random patches... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-05-03 9:32 ` Pavel Machek @ 2018-05-03 13:30 ` Sasha Levin 0 siblings, 0 replies; 113+ messages in thread From: Sasha Levin @ 2018-05-03 13:30 UTC (permalink / raw) To: Pavel Machek Cc: Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote: >Hi! > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. > >Yes, I noticed. And what I'm saying is that perhaps you should follow >the rules more strictly. Again, this was stated many times by Greg and others, the rules are not there to be strictly followed. >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >Actually, I was surprised with PTI going to stable. That was clearly >against the rules. Maybe the security bug was ugly enough to warrant >that. > >But please don't use it as an argument for applying any random >patches... How about this: if a -stable maintainer has concerns with how I follow the -stable rules, he's more than welcome to reject my patches. Sounds like a plan? ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-16 16:19 ` Sasha Levin 2018-04-16 16:30 ` Steven Rostedt @ 2018-04-19 11:41 ` Thomas Backlund 2018-04-19 13:59 ` Greg KH 1 sibling, 1 reply; 113+ messages in thread From: Thomas Backlund @ 2018-04-19 11:41 UTC (permalink / raw) To: Sasha Levin, Steven Rostedt Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: >> On Mon, 16 Apr 2018 16:02:03 +0000 >> Sasha Levin <Alexander.Levin@microsoft.com> wrote: >> >>> One of the things Greg is pushing strongly for is "bug compatibility": >>> we want the kernel to behave the same way between mainline and stable. >>> If the code is broken, it should be broken in the same way. >> >> Wait! What does that mean? What's the purpose of stable if it is as >> broken as mainline? > > This just means that if there is a fix that went in mainline, and the > fix is broken somehow, we'd rather take the broken fix than not. > > In this scenario, *something* will be broken, it's just a matter of > what. We'd rather have the same thing broken between mainline and > stable. > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... 
As a distro maintainer, that means one now has to review _every_ patch that carries "AUTOSEL", follow all the mail threads that come up about it, then track whether it landed in the -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check whether it still got included in the final stable point release, and then either revert it in the distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" -- Thomas ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 11:41 ` Thomas Backlund @ 2018-04-19 13:59 ` Greg KH 2018-04-19 14:05 ` Jan Kara 2018-04-19 15:04 ` Thomas Backlund 0 siblings, 2 replies; 113+ messages in thread From: Greg KH @ 2018-04-19 13:59 UTC (permalink / raw) To: Thomas Backlund Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > On Mon, 16 Apr 2018 16:02:03 +0000 > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > we want the kernel to behave the same way between mainline and stable. > > > > If the code is broken, it should be broken in the same way. > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > broken as mainline? > > > > This just means that if there is a fix that went in mainline, and the > > fix is broken somehow, we'd rather take the broken fix than not. > > > > In this scenario, *something* will be broken, it's just a matter of > > what. We'd rather have the same thing broken between mainline and > > stable. > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > _is_ a _regression_ you _really_ _dont_ want in a stable > supported distro. Because end-users dont care about upstream breaking > stuff... its the distro that takes the heat for that... > > Something "already broken" is not a regression... 
> > As distro maintainer that means one now have to review _every_ patch that > carries "AUTOSEL", follow all the mail threads that comes up about it, then > track if it landed in -stable queue, and read every response and possible > objection to all patches in the -stable queue a second time around... then > check if it still got included in final stable point relase and then either > revert them in distro kernel or go track down all the follow-up fixes > needed... > > Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 13:59 ` Greg KH @ 2018-04-19 14:05 ` Jan Kara 2018-04-19 14:22 ` Greg KH 2018-04-19 15:04 ` Thomas Backlund 1 sibling, 1 reply; 113+ messages in thread From: Jan Kara @ 2018-04-19 14:05 UTC (permalink / raw) To: Greg KH Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu 19-04-18 15:59:43, Greg KH wrote: > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > On Mon, 16 Apr 2018 16:02:03 +0000 > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > broken as mainline? > > > > > > This just means that if there is a fix that went in mainline, and the > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > what. We'd rather have the same thing broken between mainline and > > > stable. > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > _is_ a _regression_ you _really_ _dont_ want in a stable > > supported distro. Because end-users dont care about upstream breaking > > stuff... its the distro that takes the heat for that... 
> > > > Something "already broken" is not a regression... > > > > As distro maintainer that means one now have to review _every_ patch that > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > track if it landed in -stable queue, and read every response and possible > > objection to all patches in the -stable queue a second time around... then > > check if it still got included in final stable point relase and then either > > revert them in distro kernel or go track down all the follow-up fixes > > needed... > > > > Just to avoid being "bug compatible with master" > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > has in the past, so you had better also be reviewing all of my normal > commits as well :) > > Anyway, we are trying not to do this, but it does, and will, > occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 14:05 ` Jan Kara @ 2018-04-19 14:22 ` Greg KH 2018-04-19 15:16 ` Thomas Backlund 2018-04-19 16:41 ` Greg KH 0 siblings, 2 replies; 113+ messages in thread From: Greg KH @ 2018-04-19 14:22 UTC (permalink / raw) To: Jan Kara Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > On Thu 19-04-18 15:59:43, Greg KH wrote: > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > On Mon, 16 Apr 2018 16:02:03 +0000 > > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > broken as mainline? > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > what. We'd rather have the same thing broken between mainline and > > > > stable. > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > supported distro. 
Because end-users dont care about upstream breaking > > > stuff... its the distro that takes the heat for that... > > > > > > Something "already broken" is not a regression... > > > > > > As distro maintainer that means one now have to review _every_ patch that > > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > > track if it landed in -stable queue, and read every response and possible > > > objection to all patches in the -stable queue a second time around... then > > > check if it still got included in final stable point relase and then either > > > revert them in distro kernel or go track down all the follow-up fixes > > > needed... > > > > > > Just to avoid being "bug compatible with master" > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > has in the past, so you had better also be reviewing all of my normal > > commits as well :) > > > > Anyway, we are trying not to do this, but it does, and will, > > occasionally happen. > > Sure, that's understood. So this was just misunderstanding. Sasha's > original comment really sounded like "bug compatibility" with current > master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 14:22 ` Greg KH @ 2018-04-19 15:16 ` Thomas Backlund 2018-04-19 15:57 ` Greg KH 2018-04-19 16:41 ` Greg KH 1 sibling, 1 reply; 113+ messages in thread From: Thomas Backlund @ 2018-04-19 15:16 UTC (permalink / raw) To: Greg KH, Jan Kara Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek Den 19.04.2018 kl. 17:22, skrev Greg KH: > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: >> On Thu 19-04-18 15:59:43, Greg KH wrote: >>> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: >>>> Den 16-04-2018 kl. 19:19, skrev Sasha Levin: >>>>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: >>>>>> On Mon, 16 Apr 2018 16:02:03 +0000 >>>>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote: >>>>>> >>>>>>> One of the things Greg is pushing strongly for is "bug compatibility": >>>>>>> we want the kernel to behave the same way between mainline and stable. >>>>>>> If the code is broken, it should be broken in the same way. >>>>>> >>>>>> Wait! What does that mean? What's the purpose of stable if it is as >>>>>> broken as mainline? >>>>> >>>>> This just means that if there is a fix that went in mainline, and the >>>>> fix is broken somehow, we'd rather take the broken fix than not. >>>>> >>>>> In this scenario, *something* will be broken, it's just a matter of >>>>> what. We'd rather have the same thing broken between mainline and >>>>> stable. >>>>> >>>> >>>> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" >>>> _is_ a _regression_ you _really_ _dont_ want in a stable >>>> supported distro. 
Because end-users dont care about upstream breaking >>>> stuff... its the distro that takes the heat for that... >>>> >>>> Something "already broken" is not a regression... >>>> >>>> As distro maintainer that means one now have to review _every_ patch that >>>> carries "AUTOSEL", follow all the mail threads that comes up about it, then >>>> track if it landed in -stable queue, and read every response and possible >>>> objection to all patches in the -stable queue a second time around... then >>>> check if it still got included in final stable point relase and then either >>>> revert them in distro kernel or go track down all the follow-up fixes >>>> needed... >>>> >>>> Just to avoid being "bug compatible with master" >>> >>> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff >>> has in the past, so you had better also be reviewing all of my normal >>> commits as well :) >>> >>> Anyway, we are trying not to do this, but it does, and will, >>> occasionally happen. >> >> Sure, that's understood. So this was just misunderstanding. Sasha's >> original comment really sounded like "bug compatibility" with current >> master is desirable and that made me go "Are you serious?" as well... > > As I said before in this thread, yes, sometimes I do this on purpose. > And I guess this is the one that gets people the feeling that "stable is not as stable as it used to be" ... > As an specific example, see a recent bluetooth patch that caused a > regression on some chromebook devices. The chromeos developers > rightfully complainied, and I left the commit in there to provide the > needed "leverage" on the upstream developers to fix this properly. > Otherwise if I had reverted the stable patch, when people move to a > newer kernel version, things break, and no one remembers why. I do understand what you are trying to do... But from my distro hat on I have to treat things differently (and I dont think I'm alone doing it this way...) 
Known breakages get reverted even before they hit QA, so endusers won't see the issue at all... So the only ones to see the issue are those building with latest upstream without own patches applied... > > I also wrote a long response as to _why_ I do this, and even though it > does happen, why it still is worth taking the stable updates. Please > see the archives for the full details. I don't want to duplicate this > again here. And we do use latest stable (with some delay as I don't want to overload QA & endusers with a new kernel every week :)) We just revert known broken (or add known fixes) before releasing them to our users -- Thomas ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 15:16 ` Thomas Backlund @ 2018-04-19 15:57 ` Greg KH 2018-04-19 16:25 ` Thomas Backlund 0 siblings, 1 reply; 113+ messages in thread From: Greg KH @ 2018-04-19 15:57 UTC (permalink / raw) To: Thomas Backlund Cc: Jan Kara, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: > Den 19.04.2018 kl. 17:22, skrev Greg KH: > > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > > On Mon, 16 Apr 2018 16:02:03 +0000 > > > > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > > > broken as mainline? > > > > > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > > what. We'd rather have the same thing broken between mainline and > > > > > > stable. 
> > > > > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > > supported distro. Because end-users dont care about upstream breaking > > > > > stuff... its the distro that takes the heat for that... > > > > > > > > > > Something "already broken" is not a regression... > > > > > > > > > > As distro maintainer that means one now have to review _every_ patch that > > > > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > > > > track if it landed in -stable queue, and read every response and possible > > > > > objection to all patches in the -stable queue a second time around... then > > > > > check if it still got included in final stable point relase and then either > > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > > needed... > > > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > > has in the past, so you had better also be reviewing all of my normal > > > > commits as well :) > > > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > > occasionally happen. > > > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > > original comment really sounded like "bug compatibility" with current > > > master is desirable and that made me go "Are you serious?" as well... > > > > As I said before in this thread, yes, sometimes I do this on purpose. > > > > And I guess this is the one that gets people the feeling that > "stable is not as stable as it used to be" ... It's always been this way, it's just that no one noticed :) > > As an specific example, see a recent bluetooth patch that caused a > > regression on some chromebook devices. 
The chromeos developers > > rightfully complainied, and I left the commit in there to provide the > > needed "leverage" on the upstream developers to fix this properly. > > Otherwise if I had reverted the stable patch, when people move to a > > newer kernel version, things break, and no one remembers why. > > I do understand what you are trying to do... > > But from my distro hat on I have to treat things differently (and I dont > think I'm alone doing it this way...) > > Known breakages gets reverted even before it hits QA, so endusers wont see > the issue at all... > > So the only ones to see the issue are those building with latest upstream > without own patches applied... > > > > > I also wrote a long response as to _why_ I do this, and even though it > > does happen, why it still is worth taking the stable updates. Please > > see the archives for the full details. I don't want to duplicate this > > again here. > > And we do use latest stable (with some delay as I dont want to overload QA & > endusers with a new kernel every week :)) You need to automate your QA :) > We just revert known broken (or add known fixes) before releasing them to > our users That's great, and is what you should be doing, nothing wrong there. thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 15:57 ` Greg KH @ 2018-04-19 16:25 ` Thomas Backlund 0 siblings, 0 replies; 113+ messages in thread From: Thomas Backlund @ 2018-04-19 16:25 UTC (permalink / raw) To: Greg KH, Thomas Backlund Cc: Jan Kara, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek Den 19.04.2018 kl. 18:57, skrev Greg KH: > On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: >> Den 19.04.2018 kl. 17:22, skrev Greg KH: >>> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: >>>> On Thu 19-04-18 15:59:43, Greg KH wrote: >>>>> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: >>>>>> Den 16-04-2018 kl. 19:19, skrev Sasha Levin: >>>>>>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: >>>>>>>> On Mon, 16 Apr 2018 16:02:03 +0000 >>>>>>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote: >>>>>>>> >>>>>>>>> One of the things Greg is pushing strongly for is "bug compatibility": >>>>>>>>> we want the kernel to behave the same way between mainline and stable. >>>>>>>>> If the code is broken, it should be broken in the same way. >>>>>>>> >>>>>>>> Wait! What does that mean? What's the purpose of stable if it is as >>>>>>>> broken as mainline? >>>>>>> >>>>>>> This just means that if there is a fix that went in mainline, and the >>>>>>> fix is broken somehow, we'd rather take the broken fix than not. >>>>>>> >>>>>>> In this scenario, *something* will be broken, it's just a matter of >>>>>>> what. We'd rather have the same thing broken between mainline and >>>>>>> stable. 
>>>>>>> >>>>>> >>>>>> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" >>>>>> _is_ a _regression_ you _really_ _dont_ want in a stable >>>>>> supported distro. Because end-users dont care about upstream breaking >>>>>> stuff... its the distro that takes the heat for that... >>>>>> >>>>>> Something "already broken" is not a regression... >>>>>> >>>>>> As distro maintainer that means one now have to review _every_ patch that >>>>>> carries "AUTOSEL", follow all the mail threads that comes up about it, then >>>>>> track if it landed in -stable queue, and read every response and possible >>>>>> objection to all patches in the -stable queue a second time around... then >>>>>> check if it still got included in final stable point relase and then either >>>>>> revert them in distro kernel or go track down all the follow-up fixes >>>>>> needed... >>>>>> >>>>>> Just to avoid being "bug compatible with master" >>>>> >>>>> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff >>>>> has in the past, so you had better also be reviewing all of my normal >>>>> commits as well :) >>>>> >>>>> Anyway, we are trying not to do this, but it does, and will, >>>>> occasionally happen. >>>> >>>> Sure, that's understood. So this was just misunderstanding. Sasha's >>>> original comment really sounded like "bug compatibility" with current >>>> master is desirable and that made me go "Are you serious?" as well... >>> >>> As I said before in this thread, yes, sometimes I do this on purpose. >>> >> >> And I guess this is the one that gets people the feeling that >> "stable is not as stable as it used to be" ... > > It's always been this way, it's just that no one noticed :) > :) >>> As an specific example, see a recent bluetooth patch that caused a >>> regression on some chromebook devices. 
The chromeos developers >>> rightfully complainied, and I left the commit in there to provide the >>> needed "leverage" on the upstream developers to fix this properly. >>> Otherwise if I had reverted the stable patch, when people move to a >>> newer kernel version, things break, and no one remembers why. >> >> I do understand what you are trying to do... >> >> But from my distro hat on I have to treat things differently (and I dont >> think I'm alone doing it this way...) >> >> Known breakages gets reverted even before it hits QA, so endusers wont see >> the issue at all... >> >> So the only ones to see the issue are those building with latest upstream >> without own patches applied... >> >>> >>> I also wrote a long response as to _why_ I do this, and even though it >>> does happen, why it still is worth taking the stable updates. Please >>> see the archives for the full details. I don't want to duplicate this >>> again here. >> >> And we do use latest stable (with some delay as I dont want to overload QA & >> endusers with a new kernel every week :)) > > You need to automate your QA :) > Yeah, some can be automated... but that means having a lot of different hw to test on... emulators/vms can only test so much... users part of QA test on a variety of hw with various installs/setups that exposes fun things with some hw :) >> We just revert known broken (or add known fixes) before releasing them to >> our users > > That's great, and is what you should be doing, nothing wrong there. > > thanks, > > greg k-h > -- Thomas ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 14:22 ` Greg KH 2018-04-19 15:16 ` Thomas Backlund @ 2018-04-19 16:41 ` Greg KH 1 sibling, 0 replies; 113+ messages in thread From: Greg KH @ 2018-04-19 16:41 UTC (permalink / raw) To: Jan Kara Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote: > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > On Mon, 16 Apr 2018 16:02:03 +0000 > > > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote: > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > > broken as mainline? > > > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > what. We'd rather have the same thing broken between mainline and > > > > > stable. 
> > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > supported distro. Because end-users dont care about upstream breaking > > > > stuff... its the distro that takes the heat for that... > > > > > > > > Something "already broken" is not a regression... > > > > > > > > As distro maintainer that means one now have to review _every_ patch that > > > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > > > track if it landed in -stable queue, and read every response and possible > > > > objection to all patches in the -stable queue a second time around... then > > > > check if it still got included in final stable point relase and then either > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > needed... > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > has in the past, so you had better also be reviewing all of my normal > > > commits as well :) > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > occasionally happen. > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > original comment really sounded like "bug compatibility" with current > > master is desirable and that made me go "Are you serious?" as well... > > As I said before in this thread, yes, sometimes I do this on purpose. > > As an specific example, see a recent bluetooth patch that caused a > regression on some chromebook devices. The chromeos developers > rightfully complainied, and I left the commit in there to provide the > needed "leverage" on the upstream developers to fix this properly. > Otherwise if I had reverted the stable patch, when people move to a > newer kernel version, things break, and no one remembers why. 
> > I also wrote a long response as to _why_ I do this, and even though it > does happen, why it still is worth taking the stable updates. Please > see the archives for the full details. I don't want to duplicate this > again here. And to be more specific, let's always take this as a case-by-case basis. The last time this happened was the bluetooth bug and it was a fix for a reported problem, but then the fix caused a regression so upstream reverted it and I reverted it in the stable trees. No matter what I chose to do, someone would be upset so I followed what upstream did. thanks, greg k-h ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 13:59 ` Greg KH 2018-04-19 14:05 ` Jan Kara @ 2018-04-19 15:04 ` Thomas Backlund 2018-04-19 15:09 ` Sasha Levin 1 sibling, 1 reply; 113+ messages in thread From: Thomas Backlund @ 2018-04-19 15:04 UTC (permalink / raw) To: Greg KH, Thomas Backlund Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek Den 19.04.2018 kl. 16:59, skrev Greg KH: > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: >> Den 16-04-2018 kl. 19:19, skrev Sasha Levin: >>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: >>>> On Mon, 16 Apr 2018 16:02:03 +0000 >>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote: >>>> >>>>> One of the things Greg is pushing strongly for is "bug compatibility": >>>>> we want the kernel to behave the same way between mainline and stable. >>>>> If the code is broken, it should be broken in the same way. >>>> >>>> Wait! What does that mean? What's the purpose of stable if it is as >>>> broken as mainline? >>> >>> This just means that if there is a fix that went in mainline, and the >>> fix is broken somehow, we'd rather take the broken fix than not. >>> >>> In this scenario, *something* will be broken, it's just a matter of >>> what. We'd rather have the same thing broken between mainline and >>> stable. >>> >> >> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" >> _is_ a _regression_ you _really_ _dont_ want in a stable >> supported distro. Because end-users dont care about upstream breaking >> stuff... its the distro that takes the heat for that... 
>> >> Something "already broken" is not a regression... >> >> As distro maintainer that means one now have to review _every_ patch that >> carries "AUTOSEL", follow all the mail threads that comes up about it, then >> track if it landed in -stable queue, and read every response and possible >> objection to all patches in the -stable queue a second time around... then >> check if it still got included in final stable point relase and then either >> revert them in distro kernel or go track down all the follow-up fixes >> needed... >> >> Just to avoid being "bug compatible with master" > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > has in the past, so you had better also be reviewing all of my normal > commits as well :) > Yeah, I do... and same goes there ... if there is a known issue, then same procedure... Either revert, or try to track down fixes... > Anyway, we are trying not to do this, but it does, and will, > occasionally happen. Look, we just did that for one platform for > 4.9.94! And the key to all of this is good testing, which we are now > doing, and hopefully you are also doing as well. Yeah, but having to test stuff with known breakages is no fun, so we try to avoid that -- Thomas ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes 2018-04-19 15:04 ` Thomas Backlund @ 2018-04-19 15:09 ` Sasha Levin 2018-04-19 16:20 ` Thomas Backlund 0 siblings, 1 reply; 113+ messages in thread From: Sasha Levin @ 2018-04-19 15:09 UTC (permalink / raw) To: Thomas Backlund Cc: Greg KH, Steven Rostedt, Linus Torvalds, Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote: >Den 19.04.2018 kl. 16:59, skrev Greg KH: >>Anyway, we are trying not to do this, but it does, and will, >>occasionally happen. Look, we just did that for one platform for >>4.9.94! And the key to all of this is good testing, which we are now >>doing, and hopefully you are also doing as well. > >Yeah, but having to test stuff with known breakages is no fun, so we >try to avoid that Known breakages are easier to deal with than unknown ones :) I think that "bug compatibility" is basically a policy on *which* regressions you'll see vs *if* you'll see a regression. We'll never pull in a commit that introduces a bug but doesn't fix another one, right? So if you have to deal with a regression anyway, might as well deal with a regression that is also seen on mainline, so that when you upgrade your stable kernel you'll keep the same set of regressions to deal with. ^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 15:09 ` Sasha Levin
@ 2018-04-19 16:20 ` Thomas Backlund
  0 siblings, 0 replies; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 16:20 UTC (permalink / raw)
  To: Sasha Levin, Thomas Backlund
  Cc: Greg KH, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On 19.04.2018 at 18:09, Sasha Levin wrote:
> On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>> On 19.04.2018 at 16:59, Greg KH wrote:
>>> Anyway, we are trying not to do this, but it does, and will,
>>> occasionally happen. Look, we just did that for one platform for
>>> 4.9.94! And the key to all of this is good testing, which we are now
>>> doing, and hopefully you are also doing as well.
>>
>> Yeah, but having to test stuff with known breakages is no fun, so we
>> try to avoid that
>
> Known breakages are easier to deal with than unknown ones :)

Well, if a system worked before the update, but not after...
Guess which one we want...

>
> I think that "bug compatibility" is basically a policy on *which*
> regressions you'll see vs *if* you'll see a regression.
>

No. Intentionally breaking known working code in a stable branch is
never ok.

As I said before... something that never worked is not a regression,
but breaking a previously working setup is...

That goes for security fixes too... there is not much point in a
security fix if it basically turns into a local DoS when the system
stops working...

People will just boot older code and there you have it...

> We'll never pull in a commit that introduces a bug but doesn't fix
> another one, right? So if you have to deal with a regression anyway,
> might as well deal with a regression that is also seen on mainline, so
> that when you upgrade your stable kernel you'll keep the same set of
> regressions to deal with.
>

Here I actually like the comment Linus posted about API breakage
earlier in this thread...

<quote>
If you break user workflows, NOTHING ELSE MATTERS.

Even security is secondary to "people don't use the end result, because
it doesn't work for them any more".
</quote>

_This_ same statement should be acknowledged / enforced in stable trees
too IMHO...

Because this is what will happen... simple logic... if it does not
work, the end user will boot an earlier kernel... missing "all the good
fixes" (including security ones) just because one fix is bad.

For example, in this AUTOSEL round there are 161 fixes, of which the
end user never gets the 160 "supposedly good ones" when one is "bad"...

How is that a "good thing"?

And trying to tell those that get hit "this will force upstream to fix
it faster, so you get a working setup in some days/weeks/months..." is
not going to work...

Heh, this even reminds me that this is just as annoying as when MS
started to "bundle monthly security updates" and you get 95% installed
just to realize that the last 5% does not work (or install at all) and
you have to roll back to something working, thus missing the needed
security fixes...

Same flawed logic...

Thankfully we as distro maintainers can avoid some of the breakage for
our end users...

--
Thomas

^ permalink raw reply [flat|nested] 113+ messages in thread
* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:18 ` Linus Torvalds
  2018-04-16 15:30 ` Pavel Machek
  2018-04-16 15:36 ` Steven Rostedt
@ 2018-04-16 15:39 ` Sasha Levin
  2 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 15:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 08:18:09AM -0700, Linus Torvalds wrote:
>On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> someone before they are pulled in. Otherwise there may be some subtle
>> issues that can find their way into stable releases.
>
>I don't know about anybody else, but I get so many of the patch-bot
>patches for stable etc that I will *not* reply to normal cases. Only
>if there's some issue with a patch will I reply.
>
>I probably do get more than most, but still - requiring active
>participation for the steady flow of normal stable patches is almost
>pointless.
>
>Just look at the subject line of this thread. The numbers are so big
>that you almost need exponential notation for them.
>
>           Linus

I would be more than happy to make this an opt-in process on my end,
but given the responses I've been seeing from folks so far I doubt
it'll work for many people. Humans don't scale :)

There are a few statistics that suggest that the current workflow is
"good enough":

 1. The rejection rate (commits fixed or reverted) for AUTOSEL commits
is similar to (actually lower than) that of commits tagged for -stable.

 2. The human response rate on review requests is higher than the rate
Greg is getting with his review mails. This is somewhat expected, but
it shows that people do what Linus does and reply just when they see
something wrong.

I also think that using mailing lists for these is bringing up the
limitations of mailing lists. It's hard to go through the number of
patches AUTOSEL is generating this way, but right now we don't have a
better alternative.

^ permalink raw reply [flat|nested] 113+ messages in thread