[PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

* [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
       [not found] <20180409001936.162706-1-alexander.levin@microsoft.com>
@ 2018-04-09  0:19 ` Sasha Levin
  2018-04-09  8:22   ` Petr Mladek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-09  0:19 UTC (permalink / raw)
  To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: Steven Rostedt (VMware), akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek, Petr Mladek, Sasha Levin

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

[ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]

This patch implements what I discussed at Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
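
A rough userspace sketch of the hand-off described above, for
illustration only. It is a single-threaded state model, not kernel
code: struct console_model and the model_*() helpers are hypothetical
names, tasks are plain integers, and the sem_holder and waiter_task
fields are bookkeeping added for the model (the real console_sem has no
owner). Spinning and locking are left out; each call stands for one
step a task would take under console_owner_lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TASK (-1)

/* Hypothetical model state; field names loosely follow the patch. */
struct console_model {
	int sem_holder;		/* model-only: task holding console_sem */
	int owner;		/* task currently calling the console drivers */
	bool waiter;		/* true while one task spins for a hand-off */
	int waiter_task;	/* model-only: which task is spinning */
};

static void model_init(struct console_model *m)
{
	m->sem_holder = NO_TASK;
	m->owner = NO_TASK;
	m->waiter = false;
	m->waiter_task = NO_TASK;
}

/* console_trylock(): succeeds only if nobody holds the semaphore. */
static bool model_trylock(struct console_model *m, int task)
{
	if (m->sem_holder != NO_TASK)
		return false;
	m->sem_holder = task;
	return true;
}

/* The holder is about to call the console drivers for one message. */
static void model_begin_write(struct console_model *m, int task)
{
	assert(m->sem_holder == task);
	m->owner = task;
}

/*
 * The new else branch in printk(): become the waiter only if there is
 * an active owner, it is not us, and nobody else is waiting already.
 * Returns true if the caller would now spin.
 */
static bool model_try_wait(struct console_model *m, int task)
{
	if (m->waiter || m->owner == NO_TASK || m->owner == task)
		return false;
	m->waiter = true;
	m->waiter_task = task;
	return true;
}

/*
 * The owner finished one message. If a waiter is spinning, release it
 * and let it inherit the semaphore without an up(). Returns true when
 * a hand-off happened.
 */
static bool model_end_write(struct console_model *m, int task)
{
	assert(m->owner == task);
	m->owner = NO_TASK;
	if (!m->waiter)
		return false;
	m->waiter = false;
	m->sem_holder = m->waiter_task;
	m->waiter_task = NO_TASK;
	return true;
}
```

With tasks 0, 1 and 2 in this model: task 0 takes the lock and starts
writing, task 1 fails the trylock and becomes the waiter, task 2 is
turned away because a waiter already exists, and when task 0 finishes
its message the model reports a hand-off with task 1 as the new holder.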

By Petr Mladek about possible new deadlocks:

The thing is that we move console_sem only to a printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring a new type of dependency.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

				spin_lockA()
				  printk()
				    spin = true;
				    while (...)

    call_console_drivers()
      spin_lockA()

This would be a deadlock. CPU0 would wait for lock A, while
CPU1 would own lock A and would wait for CPU0 to finish
calling the console drivers and pass on ownership of
console_sem.

But if the above is true, then the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
	spin_lockA()

In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by lock A.
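
The single-CPU scenario above can be sketched as a small userspace
model. This is an illustration under stated assumptions, not kernel
code: struct model_lock and the model_*() helpers are hypothetical
names, lock A is reduced to a "who holds it" field, and deferral is
modeled as nothing more than queuing the message until the locked
section has ended. The model only covers the case where the holder of
lock A cannot make progress, so a held lock is treated as a hang.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Model of a non-recursive spinlock: records which CPU holds it. */
struct model_lock {
	int holder;
};

/* Messages queued by model_printk_deferred() and not yet written. */
static int pending;
/* Messages that reached the model console driver. */
static int written;

/*
 * Model console driver that needs lock A. Returns false where real
 * code would spin forever, i.e. when lock A is already held (by this
 * CPU, or by a CPU that is itself stuck waiting).
 */
static bool model_console_write(struct model_lock *a, int cpu)
{
	if (a->holder != NO_CPU)
		return false;
	a->holder = cpu;
	written++;
	a->holder = NO_CPU;
	return true;
}

/* Model of printk(): calls the console driver from the caller's context. */
static bool model_printk(struct model_lock *a, int cpu)
{
	return model_console_write(a, cpu);
}

/* Model of printk_deferred(): only queue the message. */
static void model_printk_deferred(void)
{
	pending++;
}

/* Later, outside the section guarded by lock A, write what was queued. */
static bool model_flush(struct model_lock *a, int cpu)
{
	while (pending > 0) {
		if (!model_console_write(a, cpu))
			return false;
		pending--;
	}
	return true;
}
```

In this model, calling model_printk() while CPU 0 holds lock A reports
the hang, whereas model_printk_deferred() followed by model_flush()
after the lock is dropped writes the message normally.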

By Steven Rostedt:

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 #include <linux/hrtimer.h>

 static bool stop_testing;
 static unsigned int loops = 1;

 static void preempt_printk_workfn(struct work_struct *work)
 {
 	int i;

 	while (!READ_ONCE(stop_testing)) {
 		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
 			preempt_disable();
 			pr_emerg("%5d%-75s\n", smp_processor_id(),
 				 " XXX NOPREEMPT");
 			preempt_enable();
 		}
 		msleep(1);
 	}
 }

 static struct work_struct __percpu *works;

 static void finish(void)
 {
 	int cpu;

 	WRITE_ONCE(stop_testing, true);
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
 	free_percpu(works);
 }

 static int __init test_init(void)
 {
 	int cpu;

 	works = alloc_percpu(struct work_struct);
 	if (!works)
 		return -ENOMEM;

 	/*
 	 * This is just a test module. This will break if you
 	 * do any CPU hot plugging between loading and
 	 * unloading the module.
 	 */

 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);

 		INIT_WORK(work, &preempt_printk_workfn);
 		schedule_work_on(cpu, work);
 	}

 	return 0;
 }

 static void __exit test_exit(void)
 {
 	finish();
 }

 module_param(loops, uint, 0);
 module_init(test_init);
 module_exit(test_exit);
 MODULE_LICENSE("GPL");

Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 512f7c2baedd..89c3496975cc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
 #endif
 
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore.  The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock())
+		if (console_trylock()) {
 			console_unlock();
+		} else {
+			struct task_struct *owner = NULL;
+			bool waiter;
+			bool spin = false;
+
+			printk_safe_enter_irqsave(flags);
+
+			raw_spin_lock(&console_owner_lock);
+			owner = READ_ONCE(console_owner);
+			waiter = READ_ONCE(console_waiter);
+			if (!waiter && owner && owner != current) {
+				WRITE_ONCE(console_waiter, true);
+				spin = true;
+			}
+			raw_spin_unlock(&console_owner_lock);
+
+			/*
+			 * If there is an active printk() writing to the
+			 * consoles, instead of having it write our data too,
+			 * see if we can offload that load from the active
+			 * printer, and do some printing ourselves.
+			 * Go into a spin only if there isn't already a waiter
+			 * spinning, and there is an active printer, and
+			 * that active printer isn't us (recursive printk?).
+			 */
+			if (spin) {
+				/* We spin waiting for the owner to release us */
+				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+				/* Owner will clear console_waiter on hand off */
+				while (READ_ONCE(console_waiter))
+					cpu_relax();
+
+				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+				printk_safe_exit_irqrestore(flags);
+
+				/*
+				 * The owner passed the console lock to us.
+				 * Since we did not spin on console lock, annotate
+				 * this as a trylock. Otherwise lockdep will
+				 * complain.
+				 */
+				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+				console_unlock();
+				printk_safe_enter_irqsave(flags);
+			}
+			printk_safe_exit_irqrestore(flags);
+
+		}
 	}
 
 	return printed_len;
@@ -2141,6 +2196,7 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
+	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2229,14 +2285,64 @@ skip:
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
 
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 */
+		raw_spin_lock(&console_owner_lock);
+		console_owner = current;
+		raw_spin_unlock(&console_owner_lock);
+
+		/* The waiter may spin on us after setting console_owner */
+		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		raw_spin_lock(&console_owner_lock);
+		waiter = READ_ONCE(console_waiter);
+		console_owner = NULL;
+		raw_spin_unlock(&console_owner_lock);
+
+		/*
+		 * If there is a waiter waiting for us, then pass the
+		 * rest of the work load over to that waiter.
+		 */
+		if (waiter)
+			break;
+
+		/* There was no waiter, and nothing will spin on us here */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
 		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
 	}
+
+	/*
+	 * If there is an active waiter waiting on the console_lock.
+	 * Pass off the printing to the waiter, and the waiter
+	 * will continue printing on its CPU, and when all writing
+	 * has finished, the last printer will wake up klogd.
+	 */
+	if (waiter) {
+		WRITE_ONCE(console_waiter, false);
+		/* The waiter is now free to continue */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		/*
+		 * Hand off console_lock to waiter. The waiter will perform
+		 * the up(). After this, the waiter is the console_lock owner.
+		 */
+		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+		printk_safe_exit_irqrestore(flags);
+		/* Note, if waiter is set, logbuf_lock is not held */
+		return;
+	}
+
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.15.1

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-09  0:19 ` [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes Sasha Levin
@ 2018-04-09  8:22   ` Petr Mladek
  2018-04-15 14:42     ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Petr Mladek @ 2018-04-09  8:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steven Rostedt (VMware), akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

On Mon 2018-04-09 00:19:53, Sasha Levin wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> [ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]
> 
> This patch implements what I discussed at Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.

Same here. I do not think that this is material for a stable backport.
More details can be found in my reply to the patch for 4.15, see
https://lkml.kernel.org/r/20180409081535.dq7p5bfnpvd3xk3t@pathway.suse.cz

Best Regards,
Petr

PS: I wonder how much time you give people to react before releasing
this. The number of autosel mails is increasing and I am involved
in only a very small number of them. I wonder if some other people
get overwhelmed by this.

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-09  8:22   ` Petr Mladek
@ 2018-04-15 14:42     ` Sasha Levin
  2018-04-16 13:30       ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-15 14:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steven Rostedt (VMware), akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

On Mon, Apr 09, 2018 at 10:22:46AM +0200, Petr Mladek wrote:
>PS: I wonder how much time you give people to react before releasing
>this. The number of autosel mails is increasing and I am involved
>in only a very small number of them. I wonder if some other people
>get overwhelmed by this.

My review cycle gives at least a week, and there's usually another week
until Greg releases them.

I know it's a lot of mails, but in reality it's a lot of commits that
should go in -stable.

Would a different format for review make it easier?

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-15 14:42     ` Sasha Levin
@ 2018-04-16 13:30       ` Steven Rostedt
  2018-04-16 15:18         ` Linus Torvalds
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 13:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo,
	Pavel Machek

On Sun, 15 Apr 2018 14:42:51 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> On Mon, Apr 09, 2018 at 10:22:46AM +0200, Petr Mladek wrote:
> >PS: I wonder how much time you give people to react before releasing
> >this. The number of autosel mails is increasing and I am involved
> >in only a very small number of them. I wonder if some other people
> >get overwhelmed by this.
> 
> My review cycle gives at least a week, and there's usually another week
> until Greg releases them.
> 
> I know it's a lot of mails, but in reality it's a lot of commits that
> should go in -stable.
> 
> Would a different format for review make it easier?

I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
someone before they are pulled in. Otherwise there may be some subtle
issues that can find their way into stable releases.

-- Steve

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 13:30       ` Steven Rostedt
@ 2018-04-16 15:18         ` Linus Torvalds
  2018-04-16 15:30           ` Pavel Machek
                             ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 15:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> someone before they are pulled in. Otherwise there may be some subtle
> issues that can find their way into stable releases.

I don't know about anybody else, but I get so many of the patch-bot
patches for stable etc that I will *not* reply to normal cases. Only
if there's some issue with a patch will I reply.

I probably do get more than most, but still - requiring active
participation for the steady flow of normal stable patches is almost
pointless.

Just look at the subject line of this thread. The numbers are so big
that you almost need exponential notation for them.

           Linus

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:18         ` Linus Torvalds
@ 2018-04-16 15:30           ` Pavel Machek
  2018-04-16 15:50             ` Sasha Levin
  2018-04-16 15:36           ` Steven Rostedt
  2018-04-16 15:39           ` Sasha Levin
  2 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 15:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Sasha Levin, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> > someone before they are pulled in. Otherwise there may be some subtle
> > issues that can find their way into stable releases.
> 
> I don't know about anybody else, but I get so many of the patch-bot
> patches for stable etc that I will *not* reply to normal cases. Only
> if there's some issue with a patch will I reply.
> 
> I probably do get more than most, but still - requiring active
> participation for the steady flow of normal stable patches is almost
> pointless.
> 
> Just look at the subject line of this thread. The numbers are so big
> that you almost need exponential notation for them.

Question is if we need that many stable patches? Autosel seems to be
picking up race conditions in LED state and W+X page fixes... I'd
really like to see fewer stable patches.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:30           ` Pavel Machek
@ 2018-04-16 15:50             ` Sasha Levin
  2018-04-16 16:06               ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 15:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
>> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> > someone before they are pulled in. Otherwise there may be some subtle
>> > issues that can find their way into stable releases.
>>
>> I don't know about anybody else, but I get so many of the patch-bot
>> patches for stable etc that I will *not* reply to normal cases. Only
>> if there's some issue with a patch will I reply.
>>
>> I probably do get more than most, but still - requiring active
>> participation for the steady flow of normal stable patches is almost
>> pointless.
>>
>> Just look at the subject line of this thread. The numbers are so big
>> that you almost need exponential notation for them.
>
>Question is if we need that many stable patches? Autosel seems to be
>picking up race conditions in LED state and W+X page fixes... I'd
>really like to see fewer stable patches.

Why? Given that the kernel keeps seeing more and more lines of code in
each new release, tools around the kernel keep evolving (new fuzzers,
testing suites, etc), and code gets more eyes, this guarantees that
you'll see more and more stable patches for each release as well.

Is there a reason not to take LED fixes if they fix a bug and don't
cause a regression? Sure, we can draw some arbitrary line, maybe
designate some subsystems that are more "important" than others, but
what's the point?

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:50             ` Sasha Levin
@ 2018-04-16 16:06               ` Pavel Machek
  2018-04-16 16:14                 ` Sasha Levin
  2018-04-16 16:20                 ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:06 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon 2018-04-16 15:50:34, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
> >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> >
> >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> >> > someone before they are pulled in. Otherwise there may be some subtle
> >> > issues that can find their way into stable releases.
> >>
> >> I don't know about anybody else, but I get so many of the patch-bot
> >> patches for stable etc that I will *not* reply to normal cases. Only
> >> if there's some issue with a patch will I reply.
> >>
> >> I probably do get more than most, but still - requiring active
> >> participation for the steady flow of normal stable patches is almost
> >> pointless.
> >>
> >> Just look at the subject line of this thread. The numbers are so big
> >> that you almost need exponential notation for them.
> >
> >Question is if we need that many stable patches? Autosel seems to be
> >picking up race conditions in LED state and W+X page fixes... I'd
> >really like to see fewer stable patches.
> 
> Why? Given that the kernel keeps seeing more and more lines of code in
> each new release, tools around the kernel keep evolving (new fuzzers,
> testing suites, etc), and code gets more eyes, this guarantees that
> you'll see more and more stable patches for each release as well.
> 
> Is there a reason not to take LED fixes if they fix a bug and don't
> cause a regression? Sure, we can draw some arbitrary line, maybe
> designate some subsystems that are more "important" than others, but
> what's the point?

There's a tradeoff.

You want to fix serious bugs in stable, and you really don't want
regressions in stable. And ... stable not having 1000s of patches
would be nice, too.

That means you want to ignore not-so-serious bugs, because the benefit
of fixing them is lower than the risk of regressions. I believe bugs
that do not bother anyone should _not_ be fixed in stable.

That was the case with the LED patch. Yes, the commit fixed a bug, but
it introduced regressions that were fixed by subsequent patches.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:06               ` Pavel Machek
@ 2018-04-16 16:14                 ` Sasha Levin
  2018-04-16 16:22                   ` Steven Rostedt
                                     ` (2 more replies)
  2018-04-16 16:20                 ` Steven Rostedt
  1 sibling, 3 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 06:06:08PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 15:50:34, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 05:30:31PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 08:18:09, Linus Torvalds wrote:
>> >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >> >
>> >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> >> > someone before they are pulled in. Otherwise there may be some subtle
>> >> > issues that can find their way into stable releases.
>> >>
>> >> I don't know about anybody else, but I get so many of the patch-bot
>> >> patches for stable etc that I will *not* reply to normal cases. Only
>> >> if there's some issue with a patch will I reply.
>> >>
>> >> I probably do get more than most, but still - requiring active
>> >> participation for the steady flow of normal stable patches is almost
>> >> pointless.
>> >>
>> >> Just look at the subject line of this thread. The numbers are so big
>> >> that you almost need exponential notation for them.
>> >
>> >Question is if we need that many stable patches? Autosel seems to be
>> >picking up race conditions in LED state and W+X page fixes... I'd
>> >really like to see fewer stable patches.
>>
>> Why? Given that the kernel keeps seeing more and more lines of code in
>> each new release, tools around the kernel keep evolving (new fuzzers,
>> testing suites, etc), and code gets more eyes, this guarantees that
>> you'll see more and more stable patches for each release as well.
>>
>> Is there a reason not to take LED fixes if they fix a bug and don't
>> cause a regression? Sure, we can draw some arbitrary line, maybe
>> designate some subsystems that are more "important" than others, but
>> what's the point?
>
>There's a tradeoff.
>
>You want to fix serious bugs in stable, and you really don't want
>regressions in stable. And ... stable not having 1000s of patches
>would be nice, too.

I don't think we should use a number cap here, but rather look at the
regression rate: how many patches broke something?

Since the rate we're seeing now with AUTOSEL is similar to what we were
seeing before AUTOSEL, what's the problem it's causing?

>That means you want to ignore not-so-serious bugs, because the benefit
>of fixing them is lower than the risk of regressions. I believe bugs
>that do not bother anyone should _not_ be fixed in stable.
>
>That was the case with the LED patch. Yes, the commit fixed a bug, but
>it introduced regressions that were fixed by subsequent patches.

How do you know if a bug bothers someone?

If a user is annoyed by a LED issue, is he expected to triage the bug,
report it on LKML and patiently wait for the appropriate patch to be
backported?

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:14                 ` Sasha Levin
@ 2018-04-16 16:22                   ` Steven Rostedt
  2018-04-16 16:31                     ` Sasha Levin
  2018-04-16 16:28                   ` Pavel Machek
  2018-04-16 17:05                   ` Pavel Machek
  2 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, 16 Apr 2018 16:14:15 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> Since the rate we're seeing now with AUTOSEL is similar to what we were
> seeing before AUTOSEL, what's the problem it's causing?

Does that mean we just doubled the rate of regressions? That's the
problem.

> 
> How do you know if a bug bothers someone?
> 
> If a user is annoyed by a LED issue, is he expected to triage the bug,
> report it on LKML and patiently wait for the appropriate patch to be
> backported?

Yes.

-- Steve

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:22                   ` Steven Rostedt
@ 2018-04-16 16:31                     ` Sasha Levin
  2018-04-16 16:47                       ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:14:15 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> Since the rate we're seeing now with AUTOSEL is similar to what we were
>> seeing before AUTOSEL, what's the problem it's causing?
>
>Does that mean we just doubled the rate of regressions? That's the
>problem.

No, the rate stayed the same :)

If before ~2% of stable commits were buggy, this is still the case with
AUTOSEL.

>>
>> How do you know if a bug bothers someone?
>>
>> If a user is annoyed by a LED issue, is he expected to triage the bug,
>> report it on LKML and patiently wait for the appropriate patch to be
>> backported?
>
>Yes.

I'm honestly not sure how to respond.

Let me ask my wife (who is happy using Linux as a regular desktop user)
how comfortable she would be with triaging kernel bugs...


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:31                     ` Sasha Levin
@ 2018-04-16 16:47                       ` Steven Rostedt
  2018-04-16 16:53                         ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:47 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, 16 Apr 2018 16:31:09 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote:
> >On Mon, 16 Apr 2018 16:14:15 +0000
> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> >  
> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
> >> seeing before AUTOSEL, what's the problem it's causing?  
> >
> >Does that mean we just doubled the rate of regressions? That's the
> >problem.  
> 
> No, the rate stayed the same :)
> 
> If before ~2% of stable commits were buggy, this is still the case with
> AUTOSEL.

Sorry, I didn't mean "rate" I meant "number". If the rate stayed the
same, that means the number increased.

> 
> >>
> >> How do you know if a bug bothers someone?
> >>
> >> If a user is annoyed by a LED issue, is he expected to triage the bug,
> >> report it on LKML and patiently wait for the appropriate patch to be
> >> backported?  
> >
> >Yes.  
> 
> I'm honestly not sure how to respond.
> 
> Let me ask my wife (who is happy using Linux as a regular desktop user)
> how comfortable she would be with triaging kernel bugs...

That's really up to the distribution, not the main kernel stable. Does
she download and compile the kernels herself? Does she use LEDs?

The point is, stable exists to keep what was working still working.
If we don't care about introducing a regression, and just want to keep
regressions the same as mainline, why not just go to mainline? That way
you can also get the new features? Mainline already has the mantra to
not break user space. When I work on new features, I sometimes stumble
on bugs with the current features. And some of those fixes require a
rewrite. It was "good enough" before, but every so often could cause a
bug that the new feature would trigger more often. Do we back port that
rewrite? Do we backport fixes to old code that are more likely to be
triggered by new features?

Ideally, we should be working on getting to no regressions to stable.

-- Steve


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:47                       ` Steven Rostedt
@ 2018-04-16 16:53                         ` Sasha Levin
  2018-04-16 17:00                           ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, Apr 16, 2018 at 12:47:11PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:31:09 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> On Mon, Apr 16, 2018 at 12:22:44PM -0400, Steven Rostedt wrote:
>> >On Mon, 16 Apr 2018 16:14:15 +0000
>> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>> >
>> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
>> >> seeing before AUTOSEL, what's the problem it's causing?
>> >
>> >Does that mean we just doubled the rate of regressions? That's the
>> >problem.
>>
>> No, the rate stayed the same :)
>>
>> If before ~2% of stable commits were buggy, this is still the case with
>> AUTOSEL.
>
>Sorry, I didn't mean "rate" I meant "number". If the rate stayed the
>same, that means the number increased.

Indeed, just like the number of regressions in mainline has increased
over time.

>>
>> >>
>> >> How do you know if a bug bothers someone?
>> >>
>> >> If a user is annoyed by a LED issue, is he expected to triage the bug,
>> >> report it on LKML and patiently wait for the appropriate patch to be
>> >> backported?
>> >
>> >Yes.
>>
>> I'm honestly not sure how to respond.
>>
>> Let me ask my wife (who is happy using Linux as a regular desktop user)
>> how comfortable she would be with triaging kernel bugs...
>
>That's really up to the distribution, not the main kernel stable. Does
>she download and compile the kernels herself? Does she use LEDs?
>
>The point is, stable exists to keep what was working still working.
>If we don't care about introducing a regression, and just want to keep
>regressions the same as mainline, why not just go to mainline? That way
>you can also get the new features? Mainline already has the mantra to
>not break user space. When I work on new features, I sometimes stumble
>on bugs with the current features. And some of those fixes require a
>rewrite. It was "good enough" before, but every so often could cause a
>bug that the new feature would trigger more often. Do we back port that
>rewrite? Do we backport fixes to old code that are more likely to be
>triggered by new features?
>
>Ideally, we should be working on getting to no regressions to stable.

This is exactly what we're doing.

If a fix for a bug in -stable introduces a different regression,
should we take it or not?


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:53                         ` Sasha Levin
@ 2018-04-16 17:00                           ` Pavel Machek
  2018-04-17 10:46                             ` Greg KH
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 17:00 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

Hi!

> >> Let me ask my wife (who is happy using Linux as a regular desktop user)
> >> how comfortable she would be with triaging kernel bugs...
> >
> >That's really up to the distribution, not the main kernel stable. Does
> >she download and compile the kernels herself? Does she use LEDs?
> >
> >The point is, stable exists to keep what was working still working.
> >If we don't care about introducing a regression, and just want to keep
> >regressions the same as mainline, why not just go to mainline? That way
> >you can also get the new features? Mainline already has the mantra to
> >not break user space. When I work on new features, I sometimes stumble
> >on bugs with the current features. And some of those fixes require a
> >rewrite. It was "good enough" before, but every so often could cause a
> >bug that the new feature would trigger more often. Do we back port that
> >rewrite? Do we backport fixes to old code that are more likely to be
> >triggered by new features?
> >
> >Ideally, we should be working on getting to no regressions to stable.
> 
> This is exactly what we're doing.
> 
> If a fix for a bug in -stable introduces a different regression,
> should we take it or not?

If a fix for a bug introduces a regression, would you call it "obviously
correct"?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:00                           ` Pavel Machek
@ 2018-04-17 10:46                             ` Greg KH
  2018-04-17 12:24                               ` Petr Mladek
  0 siblings, 1 reply; 113+ messages in thread
From: Greg KH @ 2018-04-17 10:46 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote:
> Hi!
> 
> > >> Let me ask my wife (who is happy using Linux as a regular desktop user)
> > >> how comfortable she would be with triaging kernel bugs...
> > >
> > >That's really up to the distribution, not the main kernel stable. Does
> > >she download and compile the kernels herself? Does she use LEDs?
> > >
> > >The point is, stable exists to keep what was working still working.
> > >If we don't care about introducing a regression, and just want to keep
> > >regressions the same as mainline, why not just go to mainline? That way
> > >you can also get the new features? Mainline already has the mantra to
> > >not break user space. When I work on new features, I sometimes stumble
> > >on bugs with the current features. And some of those fixes require a
> > >rewrite. It was "good enough" before, but every so often could cause a
> > >bug that the new feature would trigger more often. Do we back port that
> > >rewrite? Do we backport fixes to old code that are more likely to be
> > >triggered by new features?
> > >
> > >Ideally, we should be working on getting to no regressions to stable.
> > 
> > This is exactly what we're doing.
> > 
> > If a fix for a bug in -stable introduces a different regression,
> > should we take it or not?
> 
> If a fix for a bug introduces a regression, would you call it "obviously
> correct"?

I honestly can't believe you all are arguing about this.  We backport
bugfixes to the stable tree.  If those fixes also are buggy we either
apply the fix for that problem that ended up in Linus's tree, or we
revert the patch.  If the fix is not in Linus's tree, sometimes we leave
the "bug" in stable for a bit to apply some pressure on the
developer/maintainer to get it fixed in Linus's tree (that's what I mean
by being "bug compatible".)

This is exactly what we have been doing for over a decade now, why are
people suddenly getting upset?

Oh, I know why, suddenly subsystems that never were taking the time to
mark patches for stable are getting patches backported and are getting
nervous.  The simple way to stop that from happening is to PROPERLY MARK
PATCHES FOR STABLE IN THE FIRST PLACE!

If you do that, then, no "automated" patches will get selected as you
already handled them all.  Or if there are some automated patches
picked, you can easily NAK them (like xfs does as they know better than
everyone else, and honestly, I trust them, and don't run xfs myself), or
do like what I do when it happens to me and go "hey, nice, I missed that
one!"

There, problem solved, if you do that, no more worrying by you at all,
and this thread can properly die.

ugh,

greg k-h


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 10:46                             ` Greg KH
@ 2018-04-17 12:24                               ` Petr Mladek
  2018-04-17 12:49                                 ` Michal Hocko
  2018-04-17 13:45                                 ` Sasha Levin
  0 siblings, 2 replies; 113+ messages in thread
From: Petr Mladek @ 2018-04-17 12:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Pavel Machek, Sasha Levin, Steven Rostedt, Linus Torvalds,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 2018-04-17 12:46:37, Greg KH wrote:
> Oh, I know why, suddenly subsystems that never were taking the time to
> mark patches for stable are getting patches backported and are getting
> nervous.

Yes, I am getting nervous because of this. The number of printk fixes
nominated for stable has been increasing exponentially (just my feeling)
during the last few months.

The problem is that I want to be responsible and think about possible
regressions. Sometimes it requires checking the state of the
particular kernel release. The older the code base, the more complicated
the decision is.

You might argue that backporting the fixes helps to get the same code
in all supported code bases. But it is not true. It never will be
the same.

Anyway, in the past the "automatically" nominated printk fixes
were trivial. They did not cause harm. But they also were not
worth it, IMHO. They fixed corner cases that were there for ages.
Most of these fixes were found by code review when working on
a feature. They were not backed by bug reports.

Last week, autosel nominated a pretty non-trivial patch (the one that
started this thread). It partly solved a problem we have been trying to
fix for the last few years.

On one side, this was an annoying problem that motivated several
people to spend a lot of time on it. This might be a motivation
for a backport.

On the other hand, it took many years to get somewhere. The main
problem was the fear of regressions. We fixed/improved many things
in the meantime. It shows that the problem really is not trivial.
The same is true for the fix. We did our best to avoid regressions.
But it does not mean that there are none. Also it does not mean
that it will really give better results in all situations.

I really do not see a reason to hurry and backport this to
the older kernel releases. It means spreading the fix but also
any eventual problems. It is easy to miss a dependent patch.
The less trivial the fix, the more possible problems there are.

Back to the trend. Last week I got autosel mails even for
patches that were still being discussed, had issues, and
were far from upstream:

https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com

It might be a good idea if the mail asked to add Fixes: tag
or stable mailing list. But the mail suggested to add the
unfinished patch into stable branch directly (even before
upstreaming?).

Now, there are only a handful of printk patches in each
release, so it is still doable. I just do not understand
how other maintainers, from much busier subsystems,
could cope with this trend.

In other words, if you want to automate patch nomination,
you might also need to automate patch review. Or you need
to keep the patch rate low. This might mean nominating
only important and rather trivial fixes.

Best Regards,
Petr


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 12:24                               ` Petr Mladek
@ 2018-04-17 12:49                                 ` Michal Hocko
  2018-04-17 13:39                                   ` Sasha Levin
  2018-04-17 13:45                                 ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Michal Hocko @ 2018-04-17 12:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Greg KH, Pavel Machek, Sasha Levin, Steven Rostedt,
	Linus Torvalds, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 17-04-18 14:24:54, Petr Mladek wrote:
[...]
> Back to the trend. Last week I got autosel mails even for
> patches that were still being discussed, had issues, and
> were far from upstream:
> 
> https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
> 
> It might be a good idea if the mail asked to add Fixes: tag
> or stable mailing list. But the mail suggested to add the
> unfinished patch into stable branch directly (even before
> upstreaming?).

Well, I think that poking subsystems which ignore stable trees with such
emails early during review might be quite helpful. Maybe people start
marking for stable and we do not need the guessing later. I wouldn't
bother poking those who are known to mark stable patches though.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 12:49                                 ` Michal Hocko
@ 2018-04-17 13:39                                   ` Sasha Levin
  2018-04-17 14:22                                     ` Michal Hocko
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 13:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt,
	Linus Torvalds, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:24:54, Petr Mladek wrote:
>[...]
>> Back to the trend. Last week I got autosel mails even for
>> patches that were still being discussed, had issues, and
>> were far from upstream:
>>
>> https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
>> https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
>>
>> It might be a good idea if the mail asked to add Fixes: tag
>> or stable mailing list. But the mail suggested to add the
>> unfinished patch into stable branch directly (even before
>> upstreaming?).
>
>Well, I think that poking subsystems which ignore stable trees with such
>emails early during review might be quite helpful. Maybe people start
>marking for stable and we do not need the guessing later. I wouldn't
>bother poking those who are known to mark stable patches though.

Yup, mm/ needs far less poking than XFS (for example).

What makes mm/ so good about this is that it's a rather small set of
devs who are good at marking things for stable. As long as the commit
came from one of these "core" mm/ folks it's almost guaranteed to have
proper stable tags.

But mm/ commits don't come only from these people. Here's a concrete
example we can discuss:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

This was merged in a few days ago, and seems relevant for older kernel
trees as well. Should it not have a stable tag?


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 13:39                                   ` Sasha Levin
@ 2018-04-17 14:22                                     ` Michal Hocko
  2018-04-17 14:36                                       ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Michal Hocko @ 2018-04-17 14:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt,
	Linus Torvalds, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 17-04-18 13:39:33, Sasha Levin wrote:
[...]
> But mm/ commits don't come only from these people. Here's a concrete
> example we can discuss:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

I would be really careful, because that requires auditing all callers to
be compliant with the change. This is just _too_ easy to backport
without noticing a failure. Now consider the other side. Is there any
real bug report backing this? This behavior was like that for quite some
time but I do not remember any actual bug report and the changelog
doesn't mention one either. It is about a theoretical problem.

So if this was to be merged to stable then the changelog should contain
a big fat warning about the existing users and how they should be
checked.

Besides that I can see Reviewed-by: akpm and Andrew is usually very
careful about stable backports so there probably _was_ a reason to
exclude stable.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:22                                     ` Michal Hocko
@ 2018-04-17 14:36                                       ` Sasha Levin
  2018-04-17 18:10                                         ` Michal Hocko
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 14:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt,
	Linus Torvalds, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 13:39:33, Sasha Levin wrote:
>[...]
>> But mm/ commits don't come only from these people. Here's a concrete
>> example we can discuss:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
>
>I would be really careful, because that requires auditing all callers to
>be compliant with the change. This is just _too_ easy to backport
>without noticing a failure. Now consider the other side. Is there any
>real bug report backing this? This behavior was like that for quite some
>time but I do not remember any actual bug report and the changelog
>doesn't mention one either. It is about a theoretical problem.

https://lkml.org/lkml/2018/3/19/430

There's even a fun little reproducer that allowed me to confirm it's an
issue (at least) on 4.15.

Heck, it might even qualify as a CVE.

>So if this was to be merged to stable then the changelog should contain
>a big fat warning about the existing users and how they should be
>checked.

So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
additional work backporting this, but what I'm saying is that this
didn't happen at all.

>Besides that I can see Reviewed-by: akpm and Andrew is usually very
>careful about stable backports so there probably _was_ a reason to
>exclude stable.
>-- 
>Michal Hocko
>SUSE Labs


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:36                                       ` Sasha Levin
@ 2018-04-17 18:10                                         ` Michal Hocko
  0 siblings, 0 replies; 113+ messages in thread
From: Michal Hocko @ 2018-04-17 18:10 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Petr Mladek, Greg KH, Pavel Machek, Steven Rostedt,
	Linus Torvalds, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 17-04-18 14:36:44, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 13:39:33, Sasha Levin wrote:
> >[...]
> >> But mm/ commits don't come only from these people. Here's a concrete
> >> example we can discuss:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
> >
> >I would be really careful, because that requires auditing all callers to
> >be compliant with the change. This is just _too_ easy to backport
> >without noticing a failure. Now consider the other side. Is there any
> >real bug report backing this? This behavior was like that for quite some
> >time but I do not remember any actual bug report and the changelog
> >doesn't mention one either. It is about a theoretical problem.
> 
> https://lkml.org/lkml/2018/3/19/430
> 
> There's even a fun little reproducer that allowed me to confirm it's an
> issue (at least) on 4.15.
> 
> Heck, it might even qualify as a CVE.
> 
> >So if this was to be merged to stable then the changelog should contain
> >a big fat warning about the existing users and how they should be
> >checked.
> 
> So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
> additional work backporting this, but what I'm saying is that this
> didn't happen at all.

Do not ask me. I wasn't involved. But I would _guess_ that the original
bug is not all that serious because it requires some specific privileges
and it is quite unlikely that somebody privileged would want to shoot
themselves in the foot. But this is just my wild guess.

Anyway, I am pretty sure that if the triggering BUG was serious enough
then it would be much safer to remove it for stable backports.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 12:24                               ` Petr Mladek
  2018-04-17 12:49                                 ` Michal Hocko
@ 2018-04-17 13:45                                 ` Sasha Levin
  2018-04-18  8:33                                   ` Petr Mladek
  1 sibling, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 13:45 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
>Back to the trend. Last week I got autosel mails even for
>patches that were still being discussed, had issues, and
>were far from upstream:
>
> https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
>
>It might be a good idea if the mail asked to add Fixes: tag
>or stable mailing list. But the mail suggested to add the
>unfinished patch into stable branch directly (even before
>upstreaming?).

I obviously didn't suggest that this patch will go in -stable before
it's upstream.

I've started doing those because some folks can't be arsed to reply to a
review request for a patch that is months old. I found that if I send
these mails while the discussion is still going on I'd get a much better
response rate from people.

If you think any of these patches should go in stable there were two
ways about it:

 - You end up adding the -stable tag yourself, and it would follow the
   usual route where Greg picks it up.
 - You reply to that mail, and the patch would wait in a list until my
   script notices it made it upstream, at which point it would get
   queued for stable.

>Now, there are only a handful of printk patches in each
>release, so it is still doable. I just do not understand
>how other maintainers, from much busier subsystems,
>could cope with this trend.
>
>In other words, if you want to automate patch nomination,
>you might also need to automate patch review. Or you need
>to keep the patch rate low. This might mean nominating
>only important and rather trivial fixes.

I also have an effort to help review the patches. See what I'm working
on for the xfs folks:

	https://lkml.org/lkml/2018/3/29/1113

Where in addition to build tests I'd also run each commit, for each
stable kernel through a set of xfstests and provide them along with the
mail.

So yes, I'm aware that the volume of patches is huge, but there's not
much I can do about it because it's just a subset of the kernel's patch
volume and since the kernel gets more and more patches each release, the
volume of stable commits is bound to grow as well.


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 13:45                                 ` Sasha Levin
@ 2018-04-18  8:33                                   ` Petr Mladek
  0 siblings, 0 replies; 113+ messages in thread
From: Petr Mladek @ 2018-04-18  8:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg KH, Pavel Machek, Steven Rostedt, Linus Torvalds,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 2018-04-17 13:45:59, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
> >Back to the trend. Last week I got autosel mails even for
> >patches that were still being discussed, had issues, and
> >were far from upstream:
> >
> > https://lkml.kernel.org/r/DM5PR2101MB1032AB19B489D46B717B50D4FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
> > https://lkml.kernel.org/r/DM5PR2101MB10327FA0A7E0D2C901E33B79FBBB0@DM5PR2101MB1032.namprd21.prod.outlook.com
> >
> >It might be a good idea if the mail asked to add Fixes: tag
> >or stable mailing list. But the mail suggested to add the
> >unfinished patch into stable branch directly (even before
> >upstreaming?).
> 
> I obviously didn't suggest that this patch will go in -stable before
> it's upstream.
> 
> I've started doing those because some folks can't be arsed to reply to a
> review request for a patch that is months old. I found that if I send
> these mails while the discussion is still going on I'd get a much better
> response rate from people.

I see. It makes sense.

> If you think any of these patches should go in stable there were two
> ways about it:
>
>  - You end up adding the -stable tag yourself, and it would follow the
>    usual route where Greg picks it up.
>  - You reply to that mail, and the patch would wait in a list until my
>    script notices it made it upstream, at which point it would get
>    queued for stable.

It would be great if these options were described in the mail.

I wonder if it would also make sense to add a tag that would
say that the commit is not suitable for stable. It might
help both sides. The maintainers would be able to share
their opinion and eventually reduce mails from autosel.
You would get feedback that maintainers considered
the patch for stable. It might even be useful for
teaching the AI.

> >Now, there are only a handful of printk patches in each
> >release, so it is still doable. I just do not understand
> >how other maintainers, from much more busy subsystems,
> >could cope with this trend.
> 
> So yes, I'm aware that the volume of patches is huge, but there's not
> much I can do about it because it's just a subset of the kernel's patch
> volume and since the kernel gets more and more patches each release, the
> volume of stable commits is bound to grow as well.

Yes, but the growth in stable is much faster than the growth
in mainline at the moment. It might be fine if it was caused
just by engaging subsystems that ignored stable so far. But
I am not sure if that is the case. Also I am not sure about
your plans.

Anyway, I am surprised that patches might go into stable
so easily (no response -> accepted), while it is pretty
hard to get through the review process for mainline.

Of course, many patches go into mainline without review
as well. But the difference is that they are pushed by
people who are familiar with and responsible for the affected
area.

I can understand the pain. There are surely people who
do not care about stable, because it takes time, it is hard
to make decisions, flashbacks to the old code are painful,
etc. Well, this is the reason why the maintenance support
is and should be limited.

Anyway, I think that it cannot be done reasonably without
maintainers. You should be careful that even the currently
cooperating maintainers do not start considering autosel
mails as spam. (That is not my case. printk is a small thing.
But I could imagine that it might stop being bearable
in bigger subsystems, as is already the case with xfs.)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:14                 ` Sasha Levin
  2018-04-16 16:22                   ` Steven Rostedt
@ 2018-04-16 16:28                   ` Pavel Machek
  2018-04-16 16:39                     ` Sasha Levin
  2018-04-16 17:05                   ` Pavel Machek
  2 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:28 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1036 bytes --]


> >> Is there a reason not to take LED fixes if they fix a bug and don't
> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> >> designate some subsystems that are more "important" than others, but
> >> what's the point?
> >
> >There's a tradeoff.
> >
> >You want to fix serious bugs in stable, and you really don't want
> >regressions in stable. And ... stable not having 1000s of patches
> >would be nice, too.
> 
> I don't think we should use a number cap here, but rather look at the
> regression rate: how many patches broke something?
> 
> Since the rate we're seeing now with AUTOSEL is similar to what we were
> seeing before AUTOSEL, what's the problem it's causing?

Regression rate should not be the only criteria.

More patches mean bigger chance customer's patches will have a
conflict with something in -stable, for example.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:28                   ` Pavel Machek
@ 2018-04-16 16:39                     ` Sasha Levin
  2018-04-16 16:42                       ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
>
>> >> Is there a reason not to take LED fixes if they fix a bug and don't
>> >> cause a regression? Sure, we can draw some arbitrary line, maybe
>> >> designate some subsystems that are more "important" than others, but
>> >> what's the point?
>> >
>> >There's a tradeoff.
>> >
>> >You want to fix serious bugs in stable, and you really don't want
>> >regressions in stable. And ... stable not having 1000s of patches
>> >would be nice, too.
>>
>> I don't think we should use a number cap here, but rather look at the
>> regression rate: how many patches broke something?
>>
>> Since the rate we're seeing now with AUTOSEL is similar to what we were
>> seeing before AUTOSEL, what's the problem it's causing?
>
>Regression rate should not be the only criteria.
>
>More patches mean bigger chance customer's patches will have a
>conflict with something in -stable, for example.

Out of tree patches can't be a consideration here. There are no
guarantees for out of tree code, ever.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:39                     ` Sasha Levin
@ 2018-04-16 16:42                       ` Pavel Machek
  2018-04-16 16:45                         ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:42 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]

On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
> >
> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> >> >> designate some subsystems that are more "important" than others, but
> >> >> what's the point?
> >> >
> >> >There's a tradeoff.
> >> >
> >> >You want to fix serious bugs in stable, and you really don't want
> >> >regressions in stable. And ... stable not having 1000s of patches
> >> >would be nice, too.
> >>
> >> I don't think we should use a number cap here, but rather look at the
> >> regression rate: how many patches broke something?
> >>
> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
> >> seeing before AUTOSEL, what's the problem it's causing?
> >
> >Regression rate should not be the only criteria.
> >
> >More patches mean bigger chance customer's patches will have a
> >conflict with something in -stable, for example.
> 
> Out of tree patches can't be a consideration here. There are no
> guarantees for out of tree code, ever.

Out of tree code is not a consideration for mainline, agreed. Stable
should be different.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:42                       ` Pavel Machek
@ 2018-04-16 16:45                         ` Sasha Levin
  2018-04-16 16:54                           ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:45 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
>> >
>> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
>> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
>> >> >> designate some subsystems that are more "important" than others, but
>> >> >> what's the point?
>> >> >
>> >> >There's a tradeoff.
>> >> >
>> >> >You want to fix serious bugs in stable, and you really don't want
>> >> >regressions in stable. And ... stable not having 1000s of patches
>> >> >would be nice, too.
>> >>
>> >> I don't think we should use a number cap here, but rather look at the
>> >> regression rate: how many patches broke something?
>> >>
>> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
>> >> seeing before AUTOSEL, what's the problem it's causing?
>> >
>> >Regression rate should not be the only criteria.
>> >
>> >More patches mean bigger chance customer's patches will have a
>> >conflict with something in -stable, for example.
>>
>> Out of tree patches can't be a consideration here. There are no
>> guarantees for out of tree code, ever.
>
>Out of tree code is not a consideration for mainline, agreed. Stable
>should be different.

This is a discussion we could have in the right forum, but FYI stable
doesn't even guarantee KABI compatibility between minor versions at this
point.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:45                         ` Sasha Levin
@ 2018-04-16 16:54                           ` Pavel Machek
  2018-04-17 10:50                             ` Greg KH
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:54 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1920 bytes --]

On Mon 2018-04-16 16:45:16, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
> >> >
> >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
> >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> >> >> >> designate some subsystems that are more "important" than others, but
> >> >> >> what's the point?
> >> >> >
> >> >> >There's a tradeoff.
> >> >> >
> >> >> >You want to fix serious bugs in stable, and you really don't want
> >> >> >regressions in stable. And ... stable not having 1000s of patches
> >> >> >would be nice, too.
> >> >>
> >> >> I don't think we should use a number cap here, but rather look at the
> >> >> regression rate: how many patches broke something?
> >> >>
> >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
> >> >> seeing before AUTOSEL, what's the problem it's causing?
> >> >
> >> >Regression rate should not be the only criteria.
> >> >
> >> >More patches mean bigger chance customer's patches will have a
> >> >conflict with something in -stable, for example.
> >>
> >> Out of tree patches can't be a consideration here. There are no
> >> guarantees for out of tree code, ever.
> >
> >Out of tree code is not a consideration for mainline, agreed. Stable
> >should be different.
> 
> This is a discussion we could have in the right forum, but FYI stable
> doesn't even guarantee KABI compatibility between minor versions at this
> point.

Stable should be a useful base for distributions. They carry out of tree
patches, and yes, you should try to make their lives easy.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:54                           ` Pavel Machek
@ 2018-04-17 10:50                             ` Greg KH
  0 siblings, 0 replies; 113+ messages in thread
From: Greg KH @ 2018-04-17 10:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sasha Levin, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote:
> On Mon 2018-04-16 16:45:16, Sasha Levin wrote:
> > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote:
> > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
> > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
> > >> >
> > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
> > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> > >> >> >> designate some subsystems that are more "important" than others, but
> > >> >> >> what's the point?
> > >> >> >
> > >> >> >There's a tradeoff.
> > >> >> >
> > >> >> >You want to fix serious bugs in stable, and you really don't want
> > >> >> >regressions in stable. And ... stable not having 1000s of patches
> > >> >> >would be nice, too.
> > >> >>
> > >> >> I don't think we should use a number cap here, but rather look at the
> > >> >> regression rate: how many patches broke something?
> > >> >>
> > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we were
> > >> >> seeing before AUTOSEL, what's the problem it's causing?
> > >> >
> > >> >Regression rate should not be the only criteria.
> > >> >
> > >> >More patches mean bigger chance customer's patches will have a
> > >> >conflict with something in -stable, for example.
> > >>
> > >> Out of tree patches can't be a consideration here. There are no
> > >> guarantees for out of tree code, ever.
> > >
> > >Out of tree code is not a consideration for mainline, agreed. Stable
> > >should be different.
> > 
> > This is a discussion we could have in the right forum, but FYI stable
> > doesn't even guarantee KABI compatibility between minor versions at this
> > point.
> 
> Stable should be a useful base for distributions. They carry out of tree
> patches, and yes, you should try to make their lives easy.

How do you know I already don't do that?

But no, in the end, it's not my job to make their life easier if they
are off in their own corner never providing me feedback or help.  For
those companies/distros/SoCs that do provide that feedback, I am glad to
work with them.

As proof of that, there are at least 3 "major" SoC vendors that have
been merging every one of the stable releases into their internal trees
for the past 6+ months now.  I get reports when they do the merge and
test, and so far, we have only had 1 regression.  And that regression
was because that SoC vendor forgot to upstream a patch that they had in
their internal tree (i.e. they fixed it a while ago but forgot to tell
anyone else, nothing we can do about that.)

So if you are a distro/company/whatever that takes stable releases, and
have run into problems, please let me know, and I will be glad to work
with you.

If you are not that, then please don't attempt to speak for them...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:14                 ` Sasha Levin
  2018-04-16 16:22                   ` Steven Rostedt
  2018-04-16 16:28                   ` Pavel Machek
@ 2018-04-16 17:05                   ` Pavel Machek
  2018-04-16 17:16                     ` Sasha Levin
  2 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 17:05 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

Hi!

> How do you know if a bug bothers someone?
> 
> If a user is annoyed by a LED issue, is he expected to triage the bug,
> report it on LKML and patiently wait for the appropriate patch to be
> backported?

If the user is annoyed by a LED issue, you are actually expected to
tell him that it is not going to be fixed, because it is not on the list:

 - It must fix a problem that causes a build error (but not for things
    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
    security issue, or some "oh, that's not good" issue.  In short,
    something critical.
       
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:05                   ` Pavel Machek
@ 2018-04-16 17:16                     ` Sasha Levin
  2018-04-16 17:44                       ` Steven Rostedt
  2018-04-16 20:17                       ` Jiri Kosina
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 17:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 07:05:01PM +0200, Pavel Machek wrote:
>Hi!
>
>> How do you know if a bug bothers someone?
>>
>> If a user is annoyed by a LED issue, is he expected to triage the bug,
>> report it on LKML and patiently wait for the appropriate patch to be
>> backported?
>
>If the user is annoyed by a LED issue, you are actually expected to
>tell him that it is not going to be fixed, because it is not on the list:
>
> - It must fix a problem that causes a build error (but not for things
>    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
>    security issue, or some "oh, that's not good" issue.  In short,
>    something critical.

So if a user is operating a nuclear power plant, and has 2 leds: green
one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and
once in a blue moon a race condition is causing the red one to go on and
cause panic in the little province he lives in, we should tell that user
to fuck off?

LEDs may not be critical for you, but they can be critical for someone
else. Think of all the different users we have and the wildly different
ways they use the kernel.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:16                     ` Sasha Levin
@ 2018-04-16 17:44                       ` Steven Rostedt
  2018-04-16 18:17                         ` Sasha Levin
  2018-04-16 20:17                       ` Jiri Kosina
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 17:44 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, 16 Apr 2018 17:16:10 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> So if a user is operating a nuclear power plant, and has 2 leds: green
> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and
> once in a blue moon a race condition is causing the red one to go on and
> cause panic in the little province he lives in, we should tell that user
> to fuck off?
> 
> LEDs may not be critical for you, but they can be critical for someone
> else. Think of all the different users we have and the wildly different
> ways they use the kernel.

We can point them to the fix and have them backport it. Or they should
ask their distribution to backport it.

Hopefully they tested the kernel they are using for something like
that, and only want critical fixes. What happens if they take the next
stable assuming that it has critical fixes only, and this fix causes a
regression that creates the "ALL OK!" when it wasn't?

Basically, I'd rather have stable be more bug compatible with the version
it is based on with only critical fixes (things that will cause an
oops) than to try to be bug compatible with mainline, as then we get
into a state where things are a frankenstein of the stable base version
and mainline. I could say, "Yeah this feature works better on this
4.x version of the kernel" and not worry about "4.x.y" versions having
it better.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:44                       ` Steven Rostedt
@ 2018-04-16 18:17                         ` Sasha Levin
  2018-04-16 18:35                           ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 18:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, Apr 16, 2018 at 01:44:23PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 17:16:10 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>
>> So if a user is operating a nuclear power plant, and has 2 leds: green
>> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and
>> once in a blue moon a race condition is causing the red one to go on and
>> cause panic in the little province he lives in, we should tell that user
>> to fuck off?
>>
>> LEDs may not be critical for you, but they can be critical for someone
>> else. Think of all the different users we have and the wildly different
>> ways they use the kernel.
>
>We can point them to the fix and have them backport it. Or they should
>ask their distribution to backport it.

It may work in your subsystem, but it really doesn't work this way with
the kernel.

Let me share a concrete example with you: there's a vfs bug that's a
pain to reproduce going around. It was originally reported on
CoreOS/AWS:

	https://github.com/coreos/bugs/issues/2356

But our customers reported to us that they're hitting this issue too.

We couldn't reproduce it, and the call trace indicated it may be a
memory corruption. We could however confirm with the customers that the
latest mainline fixes the issue.

Given that we couldn't reproduce it, and neither of us is a fs/ expert,
we sent a mail to LKML, just like you suggested doing:

	https://lkml.org/lkml/2018/3/2/1038

But unlike what you said, no one pointed us to the fix, even though the
issue was fixed on mainline. Heck, no one engaged in any meaningful
conversation about the bug.

I really think that we have different views as to how well the whole
"let me shoot a mail to LKML" process works, which leads to different
views on -stable.

>Hopefully they tested the kernel they are using for something like
>that, and only want critical fixes. What happens if they take the next
>stable assuming that it has critical fixes only, and this fix causes a
>regression that creates the "ALL OK!" when it wasn't?
>
>Basically, I'd rather have stable be more bug compatible with the version
>it is based on with only critical fixes (things that will cause an
>oops) than to try to be bug compatible with mainline, as then we get
>into a state where things are a frankenstein of the stable base version
>and mainline. I could say, "Yeah this feature works better on this
>4.x version of the kernel" and not worry about "4.x.y" versions having
>it better.

This is how things used to work, right? Look at redhat kernels for
example, they'd stick with a kernel for tens of years, doing the tiniest
fixes, only when customers complained, and encouraging users to upgrade
only when the kernel would go EoL, and when customers couldn't do that
because they were too locked on that kernel version.

redhat still supports 2.6.9.

I thought we agreed that this is bad? We wanted users to be closer to
mainline, and we can't do it without bringing -stable closer to mainline
as well.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:17                         ` Sasha Levin
@ 2018-04-16 18:35                           ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 18:35 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, 16 Apr 2018 18:17:17 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> I thought we agreed that this is bad? We wanted users to be closer to
> mainline, and we can't do it without bringing -stable closer to mainline
> as well.

I guess the question comes down to, what do the users of stable kernels
want? For my machines, I always stay one or two releases behind
mainline. Right now my kernels are on 4.15.x, and will probably jump to
4.16.x the next time I upgrade my machines. I'm fine with something
breaking every so often as long as it's not data corruption (although I
have lots of backups of my systems in case that happens, just a PITA to
fix it). I only hit bugs on these boxes probably once a year at most in
doing so. But I mostly do what other kernel developers do and that
means the bugs I would mostly hit, other developers hit before their
code is released.

Thus, if stable users are fine with being regression compatible with
mainline, then I'm fine with it too.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:16                     ` Sasha Levin
  2018-04-16 17:44                       ` Steven Rostedt
@ 2018-04-16 20:17                       ` Jiri Kosina
  2018-04-16 20:36                         ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Jiri Kosina @ 2018-04-16 20:17 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, 16 Apr 2018, Sasha Levin wrote:

> So if a user is operating a nuclear power plant, and has 2 leds: green 
> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and 
> once in a blue moon a race condition is causing the red one to go on and 
> cause panic in the little province he lives in, we should tell that user 
> to fuck off?
> 
> LEDs may not be critical for you, but they can be critical for someone
> else. Think of all the different users we have and the wildly different
> ways they use the kernel.

I am pretty sure that for almost every fix there is a person on the planet
that'd rate it "critical". We can't really use this as an argument for 
inclusion of code into -stable, as that'd mean that -stable and Linus' 
tree would have to be basically the same.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:17                       ` Jiri Kosina
@ 2018-04-16 20:36                         ` Sasha Levin
  2018-04-16 20:43                           ` Jiri Kosina
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 20:36 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 10:17:17PM +0200, Jiri Kosina wrote:
>On Mon, 16 Apr 2018, Sasha Levin wrote:
>
>> So if a user is operating a nuclear power plant, and has 2 leds: green
>> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and
>> once in a blue moon a race condition is causing the red one to go on and
>> cause panic in the little province he lives in, we should tell that user
>> to fuck off?
>>
>> LEDs may not be critical for you, but they can be critical for someone
>> else. Think of all the different users we have and the wildly different
>> ways they use the kernel.
>
>I am pretty sure that for almost every fix there is a person on the planet
>that'd rate it "critical". We can't really use this as an argument for
>inclusion of code into -stable, as that'd mean that -stable and Linus'

So I think that Linus's claim that users come first applies here as
well. If there's a user that cares about a particular feature being
broken, then we go ahead and fix his bug rather than ignoring him.

>tree would have to be basically the same.

Basically the same minus all new features/subsystems/arch/etc. But yes,
ideally we'd want all bugfixes that go in mainline. Why not?

Instead of keeping bug fixes out, we need to work on improving our
testing story. Instead of ignoring that "person that'd rate it critical"
we should add his usecase into our testing matrix.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:36                         ` Sasha Levin
@ 2018-04-16 20:43                           ` Jiri Kosina
  2018-04-16 21:18                             ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Jiri Kosina @ 2018-04-16 20:43 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, 16 Apr 2018, Sasha Levin wrote:

> So I think that Linus's claim that users come first applies here as
> well. If there's a user that cares about a particular feature being
> broken, then we go ahead and fix his bug rather than ignoring him.

So one extreme is fixing -stable *iff* users actually do report an issue.

The other extreme is backporting everything that potentially looks like a 
potential fix of "something" (according to some arbitrary metric), 
pro-actively.

The former violates the "users first" rule, the latter has a very, very 
high risk of regressions.

So this whole debate is about finding a compromise.

My gut feeling always was that the statement in

	Documentation/process/stable-kernel-rules.rst

is very reasonable, but making the process way more "aggressive" when 
backporting patches is breaking much of its original spirit for me.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:43                           ` Jiri Kosina
@ 2018-04-16 21:18                             ` Sasha Levin
  2018-04-16 21:28                               ` Jiri Kosina
  2018-05-03  9:47                               ` Pavel Machek
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 21:18 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>On Mon, 16 Apr 2018, Sasha Levin wrote:
>
>> So I think that Linus's claim that users come first applies here as
>> well. If there's a user that cares about a particular feature being
>> broken, then we go ahead and fix his bug rather than ignoring him.
>
>So one extreme is fixing -stable *iff* users actually do report an issue.
>
>The other extreme is backporting everything that potentially looks like a
>potential fix of "something" (according to some arbitrary metric),
>pro-actively.
>
>The former violates the "users first" rule, the latter has a very, very
>high risk of regressions.
>
>So this whole debate is about finding a compromise.
>
>My gut feeling always was that the statement in
>
>	Documentation/process/stable-kernel-rules.rst
>
>is very reasonable, but making the process way more "aggressive" when
>backporting patches is breaking much of its original spirit for me.

I agree that as an enterprise distro taking everything from -stable
isn't the best idea. Ideally you'd want to be close to the first
extreme you've mentioned and only take commits if customers are asking
you to do so.

I think that the rule we're trying to agree upon is the "It must fix
a real bug that bothers people".

I think that we can agree that it's impossible to expect every single
Linux user to go on LKML and complain about a bug he encountered, so the
rule quickly becomes "It must fix a real bug that can bother people".

My "aggressiveness" comes from the whole "bother" part: it doesn't have
to be critical, it doesn't have to cause data corruption, it doesn't
have to be a security issue. It's enough that the bug actually affects a
user in a way he didn't expect it to (if a user doesn't have
expectations, it would fall under the "This could be a problem..."
exception).

We can go into a discussion about what exactly "bothering" is, but on
the flip side, the whole -stable tag is just a way for folks to indicate
they want a given patch reviewed for stable, it's not actually a
guarantee of whether the patch will go in to -stable or not.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 21:18                             ` Sasha Levin
@ 2018-04-16 21:28                               ` Jiri Kosina
  2018-04-17 10:39                                 ` Greg KH
  2018-05-03  9:47                               ` Pavel Machek
  1 sibling, 1 reply; 113+ messages in thread
From: Jiri Kosina @ 2018-04-16 21:28 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, 16 Apr 2018, Sasha Levin wrote:

> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first
> extreme you've mentioned and only take commits if customers are asking
> you to do so.
> 
> I think that the rule we're trying to agree upon is the "It must fix
> a real bug that bothers people".
> 
> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother people".

So is there a reason why stable couldn't become some hybrid-form union of

- really critical issues (data corruption, boot issues, severe security 
  issues) taken from bleeding edge upstream
- [reviewed] cherry-picks of functional fixes from major distro kernels 
  (based on that very -stable release), as that's apparently what people 
  are hitting in the real world with that particular kernel

?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 21:28                               ` Jiri Kosina
@ 2018-04-17 10:39                                 ` Greg KH
  2018-04-17 11:07                                   ` Michal Hocko
  2018-04-17 11:21                                   ` Jiri Kosina
  0 siblings, 2 replies; 113+ messages in thread
From: Greg KH @ 2018-04-17 10:39 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Sasha Levin, Pavel Machek, Linus Torvalds, Steven Rostedt,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> On Mon, 16 Apr 2018, Sasha Levin wrote:
> 
> > I agree that as an enterprise distro taking everything from -stable
> > isn't the best idea. Ideally you'd want to be close to the first
> > extreme you've mentioned and only take commits if customers are asking
> > you to do so.
> > 
> > I think that the rule we're trying to agree upon is the "It must fix
> > a real bug that bothers people".
> > 
> > I think that we can agree that it's impossible to expect every single
> > Linux user to go on LKML and complain about a bug he encountered, so the
> > rule quickly becomes "It must fix a real bug that can bother people".
> 
> So is there a reason why stable couldn't become some hybrid-form union of
> 
> - really critical issues (data corruption, boot issues, severe security 
>   issues) taken from bleeding edge upstream
> - [reviewed] cherry-picks of functional fixes from major distro kernels 
>   (based on that very -stable release), as that's apparently what people 
>   are hitting in the real world with that particular kernel

It already is that :)

The problem Sasha is trying to solve here is that for many subsystems,
maintainers do not mark patches for stable at all.  So real bugfixes
that do hit people are not getting to those kernels, which force the
distros to do extra work to triage a bug, dig through upstream kernels,
find and apply the patch.

By identifying the patches that should have been marked for stable,
based on the ways that the changelog text is written and the logic in
the patch itself, we circumvent that extra annoyance of users hitting
problems and complaining, or ignoring them and hoping they go away if
they reboot.

I've been doing this "by hand" for many years now, with no complaints so
far.  Sasha has taken it to the next level as I don't scale and has
started to automate it using some really nice tools.  That's all, this
isn't crazy new features being backported, it's just patches that are
obviously fixes being added to the stable tree.

Yes, sometimes those fixes need additional fixes, and that's fine,
normal stable-marked patches need that all the time.  I don't see anyone
complaining about that, right?

So nothing "new" is happening here, EXCEPT we are actually starting to
get a better kernel-wide coverage for stable fixes, which we have not
had in the past.  That's a good thing!  The number of patches applied to
stable is still a very very very tiny % compared to mainline, so nothing
new is happening here.

Oh, and if you do want to complain about huge new features being
backported, look at the mess that Spectre and Meltdown has caused in the
stable trees.  I don't see anyone complaining about those massive
changes :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 10:39                                 ` Greg KH
@ 2018-04-17 11:07                                   ` Michal Hocko
  2018-04-17 14:04                                     ` Sasha Levin
  2018-04-17 11:21                                   ` Jiri Kosina
  1 sibling, 1 reply; 113+ messages in thread
From: Michal Hocko @ 2018-04-17 11:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Jiri Kosina, Sasha Levin, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 17-04-18 12:39:36, Greg KH wrote:
> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> > 
> > > I agree that as an enterprise distro taking everything from -stable
> > > isn't the best idea. Ideally you'd want to be close to the first
> > > extreme you've mentioned and only take commits if customers are asking
> > > you to do so.
> > > 
> > > I think that the rule we're trying to agree upon is the "It must fix
> > > a real bug that bothers people".
> > > 
> > > I think that we can agree that it's impossible to expect every single
> > > Linux user to go on LKML and complain about a bug he encountered, so the
> > > rule quickly becomes "It must fix a real bug that can bother people".
> > 
> > So is there a reason why stable couldn't become some hybrid-form union of
> > 
> > - really critical issues (data corruption, boot issues, severe security 
> >   issues) taken from bleeding edge upstream
> > - [reviewed] cherry-picks of functional fixes from major distro kernels 
> >   (based on that very -stable release), as that's apparently what people 
> >   are hitting in the real world with that particular kernel
> 
> It already is that :)
> 
> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The way he is trying to do that is just wrong. Generate a pressure on
those subsystems by referring to bug reports and unhappy users and I am
pretty sure they will try harder... You cannot solve the problem by
bypassing them without having deep understanding of the specific
subsystem. Once you have it, just make sure you are part of the review
process and make sure to mark patches before they are merged.

> So real bugfixes
> that do hit people are not getting to those kernels, which force the
> distros to do extra work to triage a bug, dig through upstream kernels,
> find and apply the patch.

I would say that this is the primary role of the distro. To hide the
jungle of the upstream work and to provide additional bug filtering,
forwarding the bugs in the right direction.

> By identifying the patches that should have been marked for stable,
> based on the ways that the changelog text is written and the logic in
> the patch itself, we circumvent that extra annoyance of users hitting
> problems and complaining, or ignoring them and hoping they go away if
> they reboot.

Well, but this is a double-edged sword. You are not only backporting obvious
bug fixes but also pulling many patches out of the context they were
merged to and double checking all the assumptions are still true is a
non-trivial task to do. I am still not convinced any script or AI can do
that right now.

> I've been doing this "by hand" for many years now, with no complaints so
> far.

Really? I remember quite some complaints about broken stable releases and
also many discussions on KS how the current workflow doesn't really work
for some users (e.g. distributions).

> Sasha has taken it to the next level as I don't scale and has
> started to automate it using some really nice tools.  That's all, this
> isn't crazy new features being backported, it's just patches that are
> obviously fixes being added to the stable tree.

I have yet to see a tool which can recognize an "obvious fix".
Seriously! Matching keywords in the changelog and some pattern
recognition in the diff to do some pre-filtering _can_ help a
lot, but there is still human interaction needed to do sanity checking.
And that really requires deep subsystem knowledge. I really fail to see
how that can work without relevant people involvement. Pretending that
you can do stable without maintainers will simply not work IMNHO.

> Yes, sometimes those fixes need additional fixes, and that's fine,
> normal stable-marked patches need that all the time.  I don't see anyone
> complaining about that, right?
> 
> So nothing "new" is happening here, EXCEPT we are actually starting to
> get a better kernel-wide coverage for stable fixes, which we have not
> had in the past.  That's a good thing!  The number of patches applied to
> stable is still a very very very tiny % compared to mainline, so nothing
> new is happening here.

yes I do agree, the stable process is not very much different from the
past and I would tend to consider both processes broken because they explicitly try
to avoid maintainers which is just wrong.

> Oh, and if you do want to complain about huge new features being
> backported, look at the mess that Spectre and Meltdown has caused in the
> stable trees.  I don't see anyone complaining about those massive
> changes :)

Are you serious? Are you going to compare the biggest PITA that the
community had to undergo because of HW issues with random pattern
matching in changelog/diffs? Come on!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 11:07                                   ` Michal Hocko
@ 2018-04-17 14:04                                     ` Sasha Levin
  2018-04-17 14:15                                       ` Steven Rostedt
  2018-04-17 14:36                                       ` Michal Hocko
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 14:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 12:39:36, Greg KH wrote:
>> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> > > I agree that as an enterprise distro taking everything from -stable
>> > > isn't the best idea. Ideally you'd want to be close to the first
>> > > extreme you've mentioned and only take commits if customers are asking
>> > > you to do so.
>> > >
>> > > I think that the rule we're trying to agree upon is the "It must fix
>> > > a real bug that bothers people".
>> > >
>> > > I think that we can agree that it's impossible to expect every single
>> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >
>> > So is there a reason why stable couldn't become some hybrid-form union of
>> >
>> > - really critical issues (data corruption, boot issues, severe security
>> >   issues) taken from bleeding edge upstream
>> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >   (based on that very -stable release), as that's apparently what people
>> >   are hitting in the real world with that particular kernel
>>
>> It already is that :)
>>
>> The problem Sasha is trying to solve here is that for many subsystems,
>> maintainers do not mark patches for stable at all.
>
>The way he is trying to do that is just wrong. Generate a pressure on
>those subsystems by referring to bug reports and unhappy users and I am
>pretty sure they will try harder... You cannot solve the problem by
>bypassing them without having deep understanding of the specific
>subsystem. Once you have it, just make sure you are part of the review
>process and make sure to mark patches before they are merged.

I think we just don't agree on how we should "pressure".

Look at the discussion I had with the XFS folks who just don't want to
deal with this -stable thing because they have too much work upstream.

There wasn't a single patch in -stable coming from XFS for the past 6+
months. I'm aware of more than one way to corrupt an XFS volume for any
distro that uses a kernel older than 4.15.

Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
better about it, but I don't see this changing.

The solution to this, in my opinion, is to automate the whole selection
and review process. We do selection using AI, and we run every possible
test that's relevant to that subsystem.

At which point, the amount of work a human needs to do to review a patch
shrinks into something far more manageable for some maintainers.

>> So real bugfixes
>> that do hit people are not getting to those kernels, which force the
>> distros to do extra work to triage a bug, dig through upstream kernels,
>> find and apply the patch.
>
>I would say that this is the primary role of the distro. To hide the
>jungle of the upstream work and provide the additional of bug filtering
>and forwarding them the right direction.

More often than triaging, you'll just be asked to upgrade to the latest
version. What sort of user experience does that provide?

[snip]

>> So nothing "new" is happening here, EXCEPT we are actually starting to
>> get a better kernel-wide coverage for stable fixes, which we have not
>> had in the past.  That's a good thing!  The number of patches applied to
>> stable is still a very very very tiny % compared to mainline, so nothing
>> new is happening here.
>
>yes I do agree, the stable process is not very much different from the
>past and I would tend both processes broken because they explicitly try
>to avoid maintainers which is just wrong.

Avoid maintainers?! We send so much "spam" trying to get maintainers
more involved in the process. How is that avoiding them?

If you're a maintainer who has specific requirements for the -stable
flow, or you have any automated testing you'd like to be run on these
commits, or you want these mails to come in a different format, or
pretty much anything else at all just shoot me a mail!

It's been almost impossible to get maintainers involved in this process.

We don't sneak anything past maintainers, there are multiple mails over
multiple weeks for each commit that would go in. You don't have to
review it right away either, just reply with "please don't merge until
I'm done reviewing" and it'll get removed from the queue.

>> Oh, and if you do want to complain about huge new features being
>> backported, look at the mess that Spectre and Meltdown has caused in the
>> stable trees.  I don't see anyone complaining about those massive
>> changes :)
>
>Are you serious? Are you going to compare the biggest PITA that the
>community had to undergo because of HW issues with random pattern
>matching in changelog/diffs? Come on!

HW Issues are irrelevant here. You had a bug that allowed arbitrary
kernel memory access. I can easily list quite a few commits, that are
not tagged for stable, that fix exactly the same thing.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:04                                     ` Sasha Levin
@ 2018-04-17 14:15                                       ` Steven Rostedt
  2018-04-17 14:36                                         ` Greg KH
  2018-04-17 14:36                                       ` Michal Hocko
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-17 14:15 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Michal Hocko, Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Vlastimil Babka,
	Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue, 17 Apr 2018 14:04:36 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more manageable for some maintainers.

I guess the real question is, who are the stable kernels for? Is it just
a place to look at to see what distros should think about. A superset
of what distros would take. Then distros would have a nice place to
look to find what patches they should look at. But the stable tree
itself won't be used. But it's not being used today by major distros
(Red Hat and SuSE). Debian may be using it, but that's because the
stable maintainer for its kernels is also the Debian maintainer.

Who are the customers of the stable trees? They are the ones that
should be determining the "equation" for what goes into it.

Personally, I use stable as a one off from mainline. Like I mentioned
in another email. I'm currently on 4.15.x and will probably move to
4.16.x next. Unless there's some critical bug announcement, I update my
machines once a month. I originally just used mainline, but that was a
bit too unstable for my work machines ;-)

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:15                                       ` Steven Rostedt
@ 2018-04-17 14:36                                         ` Greg KH
  0 siblings, 0 replies; 113+ messages in thread
From: Greg KH @ 2018-04-17 14:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Michal Hocko, Jiri Kosina, Pavel Machek,
	Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote:
> On Tue, 17 Apr 2018 14:04:36 +0000
> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> 
> > The solution to this, in my opinion, is to automate the whole selection
> > and review process. We do selection using AI, and we run every possible
> > test that's relevant to that subsystem.
> > 
> > At which point, the amount of work a human needs to do to review a patch
> > shrinks into something far more manageable for some maintainers.
> 
> I guess the real question is, who are the stable kernels for? Is it just
> a place to look at to see what distros should think about. A superset
> of what distros would take. Then distros would have a nice place to
> look to find what patches they should look at. But the stable tree
> itself won't be used. But it's not being used today by major distros
> (Red Hat and SuSE). Debian may be using it, but that's because the
> stable maintainer for its kernels is also the Debian maintainer.
> 
> Who are the customers of the stable trees? They are the ones that
> should be determining the "equation" for what goes into it.

The "customers" of the stable trees are anyone who uses Linux.

Right now, it's estimated that only about 1/3 of the kernels running out
there, at the best, are an "enterprise" kernel/distro.  2/3 of the world
either run a kernel.org-based release + their own patches, or Debian.
And Debian piggy-backs on the stable kernel releases pretty regularly.

So the majority of the Linux users out there are what we are doing this
for.  Those that do not pay for a company to sift through things and
only cherry-pick what they want to pick out (hint, they almost always
miss things, some do this better than others...)

That's who this is all for, which is why we are doing our best to keep
on top of the avalanche of patches going into upstream to get the needed
fixes (both security and "normal" fixes) out to users as soon as
possible.

So again, if you are a subsystem maintainer, tag your patches for
stable.  If you do not, you will get automated emails asking you about
patches that should be applied (like the one that started this thread).
If you want to just have us ignore your subsystem entirely, we will be
glad to do so, and we will tell the world to not use your subsystem if
at all possible (see previous comments about xfs, and I would argue IB
right now...)

> Personally, I use stable as a one off from mainline. Like I mentioned
> in another email. I'm currently on 4.15.x and will probably move to
> 4.16.x next. Unless there's some critical bug announcement, I update my
> machines once a month. I originally just used mainline, but that was a
> bit too unstable for my work machines ;-)

That's great, you are a user of these trees then.  So you benefit
directly, along with everyone else who relies on them.

And again, I'm working with the SoC vendors to directly incorporate
these trees into their device trees, and I've already seen some devices
in the wild push out updated 4.4.y kernels a few weeks after they are
released.  That's the end goal here, to have the world's devices in a
much more secure and stable shape by relying on these kernels.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:04                                     ` Sasha Levin
  2018-04-17 14:15                                       ` Steven Rostedt
@ 2018-04-17 14:36                                       ` Michal Hocko
  2018-04-17 14:55                                         ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Michal Hocko @ 2018-04-17 14:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 17-04-18 14:04:36, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 12:39:36, Greg KH wrote:
> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> >> >
> >> > > I agree that as an enterprise distro taking everything from -stable
> >> > > isn't the best idea. Ideally you'd want to be close to the first
> >> > > extreme you've mentioned and only take commits if customers are asking
> >> > > you to do so.
> >> > >
> >> > > I think that the rule we're trying to agree upon is the "It must fix
> >> > > a real bug that bothers people".
> >> > >
> >> > > I think that we can agree that it's impossible to expect every single
> >> > > Linux user to go on LKML and complain about a bug he encountered, so the
> >> > > rule quickly becomes "It must fix a real bug that can bother people".
> >> >
> >> > So is there a reason why stable couldn't become some hybrid-form union of
> >> >
> >> > - really critical issues (data corruption, boot issues, severe security
> >> >   issues) taken from bleeding edge upstream
> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
> >> >   (based on that very -stable release), as that's apparently what people
> >> >   are hitting in the real world with that particular kernel
> >>
> >> It already is that :)
> >>
> >> The problem Sasha is trying to solve here is that for many subsystems,
> >> maintainers do not mark patches for stable at all.
> >
> >The way he is trying to do that is just wrong. Generate a pressure on
> >those subsystems by referring to bug reports and unhappy users and I am
> >pretty sure they will try harder... You cannot solve the problem by
> >bypassing them without having deep understanding of the specific
> >subsystem. Once you have it, just make sure you are part of the review
> >process and make sure to mark patches before they are merged.
> 
> I think we just don't agree on how we should "pressure".
> 
> Look at the discussion I had with the XFS folks who just don't want to
> deal with this -stable thing because they have too much work upstream.

So do you really think that you or any script can decide without them? My
recollection from that discussion was quite opposite. Dave was quite
clear that most of fixes are quite hard to evaluate and most of them
are simply not worth risking the backport.

> There wasn't a single patch in -stable coming from XFS for the past 6+
> months. I'm aware of more than one way to corrupt an XFS volume for any
> distro that uses a kernel older than 4.15.

Then try to poke/bribe somebody to have it fixed. But applying
_something_ is just not a solution. You should also evaluate whether "I
am able to corrupt" is something that "people see in the wild". Sure
there are zillions of bugs hidden in the large code base like the
kernel. People just do not tend to hit them and this will likely not
change very much in the future.

> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
> better about it, but I don't see this changing.

I can surely have one or two and discuss this. I am pretty sure xfs guys
are not going to pretend older kernels do not exist.

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more manageable for some maintainers.

I really disagree. I am pretty sure maintainers are very well aware of
how important the patch is. Some do not care about stable and I agree you
should poke those. But some have really good reasons to not throw many
patches that direction because they do not feel the patch is important
enough.

Remember this is not about numbers. The more is not always better.

> >> So real bugfixes
> >> that do hit people are not getting to those kernels, which force the
> >> distros to do extra work to triage a bug, dig through upstream kernels,
> >> find and apply the patch.
> >
> >I would say that this is the primary role of the distro. To hide the
> >jungle of the upstream work and provide the additional service of bug filtering
> >and forwarding them in the right direction.
> 
> More often than triaging, you'll just be asked to upgrade to the latest
> version. What sort of user experience does that provide?
> 
> [snip]
> 
> >> So nothing "new" is happening here, EXCEPT we are actually starting to
> >> get a better kernel-wide coverage for stable fixes, which we have not
> >> had in the past.  That's a good thing!  The number of patches applied to
> >> stable is still a very very very tiny % compared to mainline, so nothing
> >> new is happening here.
> >
> >yes I do agree, the stable process is not very much different from the
> >past and I would tend to call both processes broken because they explicitly try
> >to avoid maintainers which is just wrong.
> 
> Avoid maintainers?! We send so much "spam" trying to get maintainers
> more involved in the process. How is that avoiding them?

Just read what you wrote again. I am pretty sure AUTOSEL is on the filter
list of many people. We have a good volume of email traffic already and
seeing more automatic one just doesn't help. At all!

> If you're a maintainer who has specific requirements for the -stable
> flow, or you have any automated testing you'd like to be run on these
> commits, or you want these mails to come in a different format, or
> pretty much anything else at all just shoot me a mail!
> 
> It's been almost impossible to get maintainers involved in this process.

The whole stable history was about not bothering maintainers and
here is the result.

> We don't sneak anything past maintainers, there are multiple mails over
> multiple weeks for each commit that would go in. You don't have to
> review it right away either, just reply with "please don't merge until
> I'm done reviewing" and it'll get removed from the queue.

I am not talking about sneaking or pushing behind the backs. I am just
saying that you cannot do this without direct involvement of
maintainers. If they do not respond to bug reports, shout at them and I
am pretty sure that those subsystems will get a bigger pressure to find
their way to select _important_ fixes to users who are not running the
bleeding edge because those users _matter_ as well (maybe even more
because they are a much larger group).

> >> Oh, and if you do want to complain about huge new features being
> >> backported, look at the mess that Spectre and Meltdown has caused in the
> >> stable trees.  I don't see anyone complaining about those massive
> >> changes :)
> >
> >Are you serious? Are you going to compare the biggest PITA that the
> >community had to undergo because of HW issues with random pattern
> >matching in changelog/diffs? Come on!
> 
> HW Issues are irrelevant here. You had a bug that allowed arbitrary
> kernel memory access. I can easily list quite a few commits, that are
> not tagged for stable, that fix exactly the same thing.

Those are important fixes and if you are aware of them then you should
be involving the respective maintainer. I haven't heard about _any_
maintainer who would refuse to help.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:36                                       ` Michal Hocko
@ 2018-04-17 14:55                                         ` Sasha Levin
  2018-04-17 15:52                                           ` Jiri Kosina
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 14:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg KH, Jiri Kosina, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:04:36, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>> >On Tue 17-04-18 12:39:36, Greg KH wrote:
>> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >> >
>> >> > > I agree that as an enterprise distro taking everything from -stable
>> >> > > isn't the best idea. Ideally you'd want to be close to the first
>> >> > > extreme you've mentioned and only take commits if customers are asking
>> >> > > you to do so.
>> >> > >
>> >> > > I think that the rule we're trying to agree upon is the "It must fix
>> >> > > a real bug that bothers people".
>> >> > >
>> >> > > I think that we can agree that it's impossible to expect every single
>> >> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> >> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >> >
>> >> > So is there a reason why stable couldn't become some hybrid-form union of
>> >> >
>> >> > - really critical issues (data corruption, boot issues, severe security
>> >> >   issues) taken from bleeding edge upstream
>> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >> >   (based on that very -stable release), as that's apparently what people
>> >> >   are hitting in the real world with that particular kernel
>> >>
>> >> It already is that :)
>> >>
>> >> The problem Sasha is trying to solve here is that for many subsystems,
>> >> maintainers do not mark patches for stable at all.
>> >
>> >The way he is trying to do that is just wrong. Generate a pressure on
>> >those subsystems by referring to bug reports and unhappy users and I am
>> >pretty sure they will try harder... You cannot solve the problem by
>> >bypassing them without having deep understanding of the specific
>> >subsystem. Once you have it, just make sure you are part of the review
>> >process and make sure to mark patches before they are merged.
>>
>> I think we just don't agree on how we should "pressure".
>>
>> Look at the discussion I had with the XFS folks who just don't want to
>> deal with this -stable thing because they have too much work upstream.
>
>So do you really think that you or any script can decide without them? My
>recollection from that discussion was quite opposite. Dave was quite
>clear that most of fixes are quite hard to evaluate and most of them
>are simply not worth risking the backport.

No, *some* fixes are hard, not most.

I'm not trying to decide for them, I'm trying to help them decide.

>> There wasn't a single patch in -stable coming from XFS for the past 6+
>> months. I'm aware of more than one way to corrupt an XFS volume for any
>> distro that uses a kernel older than 4.15.
>
>Then try to poke/bribe somebody to have it fixed. But applying
>_something_ is just not a solution. You should also evaluate whether "I
>am able to corrupt" is something that "people see in the wild". Sure
>there are zillions of bugs hidden in the large code base like the
>kernel. People just do not tend to hit them and this will likely not
>change very much in the future.

We can't ignore bugs just because people don't notice.

Data corruption bugs in particular are a pain to report as well, the
corruption might have happened months before and there's not much to
report at that point.

There's quite a few bug classes like that.

>> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
>> better about it, but I don't see this changing.
>
>I can surely have one or two and discuss this. I am pretty sure xfs guys
>are not going to pretend older kernels do not exist.
>
>> The solution to this, in my opinion, is to automate the whole selection
>> and review process. We do selection using AI, and we run every possible
>> test that's relevant to that subsystem.
>>
>> At which point, the amount of work a human needs to do to review a patch
>> shrinks into something far more manageable for some maintainers.
>
>I really disagree. I am pretty sure maintainers are very well aware of
>how important the patch is. Some do not care about stable and I agree you
>should poke those. But some have really good reasons to not throw many
>patches that direction because they do not feel the patch is important
>enough.
>
>Remember this is not about numbers. The more is not always better.

So what is "important"? Look at the XFS issues, they were important
enough to get fixed upstream, and have an appropriate test added to
xfstests.

Why didn't they go back to -stable?

>> >> So real bugfixes
>> >> that do hit people are not getting to those kernels, which force the
>> >> distros to do extra work to triage a bug, dig through upstream kernels,
>> >> find and apply the patch.
>> >
>> >I would say that this is the primary role of the distro. To hide the
>> >jungle of the upstream work and provide the additional service of bug filtering
>> >and forwarding them in the right direction.
>>
>> More often than triaging, you'll just be asked to upgrade to the latest
>> version. What sort of user experience does that provide?
>>
>> [snip]
>>
>> >> So nothing "new" is happening here, EXCEPT we are actually starting to
>> >> get a better kernel-wide coverage for stable fixes, which we have not
>> >> had in the past.  That's a good thing!  The number of patches applied to
>> >> stable is still a very very very tiny % compared to mainline, so nothing
>> >> new is happening here.
>> >
>> >yes I do agree, the stable process is not very much different from the
>> >past and I would tend to call both processes broken because they explicitly try
>> >to avoid maintainers which is just wrong.
>>
>> Avoid maintainers?! We send so much "spam" trying to get maintainers
>> more involved in the process. How is that avoiding them?
>
>Just read what you wrote again. I am pretty sure AUTOSEL is on the filter
>list of many people. We have a good volume of email traffic already and
>seeing more automatic one just doesn't help. At all!
>
>> If you're a maintainer who has specific requirements for the -stable
>> flow, or you have any automated testing you'd like to be run on these
>> commits, or you want these mails to come in a different format, or
>> pretty much anything else at all just shoot me a mail!
>>
>> It's been almost impossible to get maintainers involved in this process.
>
>The whole stable history was about not bothering maintainers and
>here is the result.
>
>> We don't sneak anything past maintainers, there are multiple mails over
>> multiple weeks for each commit that would go in. You don't have to
>> review it right away either, just reply with "please don't merge until
>> I'm done reviewing" and it'll get removed from the queue.
>
>I am not talking about sneaking or pushing behind the backs. I am just
>saying that you cannot do this without direct involvement of
>maintainers. If they do not respond to bug reports, shout at them and I
>am pretty sure that those subsystems will get a bigger pressure to find
>their way to select _important_ fixes to users who are not running the
>bleeding edge because those users _matter_ as well (maybe even more
>because they are a much larger group).
>
>> >> Oh, and if you do want to complain about huge new features being
>> >> backported, look at the mess that Spectre and Meltdown has caused in the
>> >> stable trees.  I don't see anyone complaining about those massive
>> >> changes :)
>> >
>> >Are you serious? Are you going to compare the biggest PITA that the
>> >community had to undergo because of HW issues with random pattern
>> >matching in changelog/diffs? Come on!
>>
>> HW Issues are irrelevant here. You had a bug that allowed arbitrary
>> kernel memory access. I can easily list quite a few commits, that are
>> not tagged for stable, that fix exactly the same thing.
>
>Those are important fixes and if you are aware of them then you should
>be involving the respective maintainer. I haven't heard about _any_
>maintainer who would refuse to help.

Let's do it this way: let's assume my AUTOSEL project is bad and I'll
get rid of it tomorrow.

How do I get the XFS folks to send their stuff to -stable? (we have
quite a few customers who use XFS)

How do I get the KVM folks to be more consistent about tagging patches
for -stable? (we support nested KVM!)

How do I get people who are not aware of how the -stable project works to tag
their commits properly? (there's quite a long tail of authors sending 1
important bugfix and disappearing forever)

We can agree that just asking them nicely doesn't work: Greg has been
poking maintainers for years, the -stable project got a bunch of
publicity, and the instructions for including a patch in -stable are
pretty straightforward.

You're saying that AUTOSEL doesn't work, so let's ignore that too.

How should we proceed?

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 14:55                                         ` Sasha Levin
@ 2018-04-17 15:52                                           ` Jiri Kosina
  2018-04-17 16:06                                             ` Sasha Levin
  2018-04-17 16:25                                             ` Mike Galbraith
  0 siblings, 2 replies; 113+ messages in thread
From: Jiri Kosina @ 2018-04-17 15:52 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, 17 Apr 2018, Sasha Levin wrote:

> How do I get the XFS folks to send their stuff to -stable? (we have
> quite a few customers who use XFS)

If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
maintainers to deal with stable, we just have to accept that and find an 
answer to that.

If XFS folks claim that they don't have enough mental capacity to 
create/verify XFS backports, I totally don't see how any kind of AI would 
have.

If your business relies on XFS (and so does ours, BTW) or any other 
subsystem that doesn't have enough manpower to care for stable, the proper 
solution (and contribution) would be just bringing more people into the 
XFS community.

To put it simply -- I don't think the simple lack of actual human 
brainpower can be reasonably resolved in any other way than bringing more of
it in.

Thanks,

-- 
Jiri Kosina
SUSE Labs

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 15:52                                           ` Jiri Kosina
@ 2018-04-17 16:06                                             ` Sasha Levin
  2018-05-03 10:04                                               ` Pavel Machek
  2018-04-17 16:25                                             ` Mike Galbraith
  1 sibling, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 16:06 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>On Tue, 17 Apr 2018, Sasha Levin wrote:
>
>> How do I get the XFS folks to send their stuff to -stable? (we have
>> quite a few customers who use XFS)
>
>If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>maintainers to deal with stable, we just have to accept that and find an
>answer to that.

This is exactly what I'm doing. Many subsystems don't have enough
manpower to deal with -stable, so I'm trying to help.

>If XFS folks claim that they don't have enough mental capacity to
>create/verify XFS backports, I totally don't see how any kind of AI would
>have.

Because creating backports is not all about mental capacity!

A lot of time gets wasted on going through the list of commits,
backporting each of those commits into every -stable tree we have,
building it, running tests, etc.

So it's not all about pure mental capacity, but more about the time
per-patch it takes to get -stable done.

If I can cut down on that, by suggesting a list of commits, doing builds
and tests, what's the problem?

>If your business relies on XFS (and so does ours, BTW) or any other
>subsystem that doesn't have enough manpower to care for stable, the proper
>solution (and contribution) would be just bringing more people into the
>XFS community.

Microsoft's business relies on quite a few kernel subsystems. While we
try to bring more people in the kernel (we're hiring!), as you might
know it's not easy getting kernel folks.

So just "get more people" isn't a good solution. It doesn't scale
either.

>To put it simply -- I don't think the simple lack of actual human
>brainpower can be reasonably resolved in any other way than bringing more of
>it in.
>
>Thanks,
>
>-- 
>Jiri Kosina
>SUSE Labs
>

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 16:06                                             ` Sasha Levin
@ 2018-05-03 10:04                                               ` Pavel Machek
  2018-05-03 13:02                                                 ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-05-03 10:04 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jiri Kosina, Michal Hocko, Greg KH, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
> >On Tue, 17 Apr 2018, Sasha Levin wrote:
> >
> >> How do I get the XFS folks to send their stuff to -stable? (we have
> >> quite a few customers who use XFS)
> >
> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
> >maintainers to deal with stable, we just have to accept that and find an
> >answer to that.
> 
> This is exactly what I'm doing. Many subsystems don't have enough
> manpower to deal with -stable, so I'm trying to help.

...and the torrent of spams from the AUTOSEL subsystem actually makes
that worse.

And when you are told a particular fix to LEDs is not that important
after all, you start arguing about nuclear power plants (without
really knowing how critical subsystems work).

If you want cooperation with maintainers to work, the rules need to be
clear, first. They are documented, so follow them. If you think rules
are wrong, let's talk about changing the rules; but arguing "every bug
is important because someone may be hitting it" is not ok.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-05-03 10:04                                               ` Pavel Machek
@ 2018-05-03 13:02                                                 ` Sasha Levin
  0 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-05-03 13:02 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Michal Hocko, Greg KH, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>> >On Tue, 17 Apr 2018, Sasha Levin wrote:
>> >
>> >> How do I get the XFS folks to send their stuff to -stable? (we have
>> >> quite a few customers who use XFS)
>> >
>> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>> >maintainers to deal with stable, we just have to accept that and find an
>> >answer to that.
>>
>> This is exactly what I'm doing. Many subsystems don't have enough
>> manpower to deal with -stable, so I'm trying to help.
>
>...and the torrent of spams from the AUTOSEL subsystem actually makes
>that worse.
>
>And when you are told a particular fix to LEDs is not that important
>after all, you start arguing about nuclear power plants (without
>really knowing how critical subsystems work).

Obviously your knowledge far surpasses mine.

>If you want cooperation with maintainers to work, the rules need to be
>clear, first. They are documented, so follow them. If you think rules
>are wrong, let's talk about changing the rules; but arguing "every bug
>is important because someone may be hitting it" is not ok.

I'm sorry but you're just unfamiliar with the process. I'd point out
that all my AUTOSEL commits go through Greg, who wrote the rules, and
accepts my patches.

The rules are there as a guideline to allow us to not take certain
patches, they're not there as a strict set of rules we must follow at
all times.

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 15:52                                           ` Jiri Kosina
  2018-04-17 16:06                                             ` Sasha Levin
@ 2018-04-17 16:25                                             ` Mike Galbraith
  1 sibling, 0 replies; 113+ messages in thread
From: Mike Galbraith @ 2018-04-17 16:25 UTC (permalink / raw)
  To: Jiri Kosina, Sasha Levin
  Cc: Michal Hocko, Greg KH, Pavel Machek, Linus Torvalds,
	Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote:
> On Tue, 17 Apr 2018, Sasha Levin wrote:
> 
> > How do I get the XFS folks to send their stuff to -stable? (we have
> > quite a few customers who use XFS)
> 
> If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
> maintainers to deal with stable, we just have to accept that and find an 
> answer to that.
> 
> If XFS folks claim that they don't have enough mental capacity to 
> create/verify XFS backports, I totally don't see how any kind of AI would 
> have.
> 
> If your business relies on XFS (and so does ours, BTW) or any other 
> subsystem that doesn't have enough manpower to care for stable, the proper 
> solution (and contribution) would be just bringing more people into the 
> XFS community.
> 
> To put it simply -- I don't think the simple lack of actual human 
> brainpower can be reasonably resolved in any other way than bringing more of
> it in.

Not to worry... soon enough it'll be submitting properly massaged
backports of the stuff it submitted upstream :)

	-Mike

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 10:39                                 ` Greg KH
  2018-04-17 11:07                                   ` Michal Hocko
@ 2018-04-17 11:21                                   ` Jiri Kosina
  1 sibling, 0 replies; 113+ messages in thread
From: Jiri Kosina @ 2018-04-17 11:21 UTC (permalink / raw)
  To: Greg KH
  Cc: Sasha Levin, Pavel Machek, Linus Torvalds, Steven Rostedt,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Tue, 17 Apr 2018, Greg KH wrote:

> It already is that :)

I have a question: I guess a stable team has an idea who they are 
preparing the tree for, IOW who is the target consumer. Who is it?

Certainly it's not major distros, as both RH and SUSE already stated that 
they are either not basing off the stable kernel (only cherry-pick fixes 
from it) (RH), or are quite far in the process of moving away from stable 
tree towards combination of what RH is doing + semi-automated evaluation 
of Fixes: tag (SUSE).

If the target audience is somewhere else, that's perfectly fine, but then 
it'd have to be stated explicitly I guess.

I can't speak for RH, but for us (at least me personally), the pace of 
patches flowing into -stable is way too high for us to keep control of 
what is landing in our tree.

In some sense, stability should be equivalent to "minimal necessary amount
of super-critical changes". That's not what this "let's backport 
proactively almost everything that has word 'fixes' in changelog" (I'm a 
bit exaggerating of course) seems to be about.

Again, the rules laid out in

	Documentation/process/stable-kernel-rules.rst

are very nice, and are exactly something at least we would be very happy 
about. They have the nice hidden assumption in them, that someone actually
has to actively invest human brain power to think about the fix, its
consequences, prerequisites, etc. Not just doing a big dump of all
commits that "might fix something".

How many of the actual patches flowing into -stable would satisfy those 
criteria these days?

IOW, I'm pretty sure our users are much happier with us supplying them 
reactive fixes than pro-active uncertainty.

> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The pressure on those subsystems should be coming from unhappy users 
(be it end-users or vendors redistributing the tree) of the stable
tree, who would be complaining about missing fixes for those subsystems. 
Is this actually happening? Where?

> Oh, and if you do want to complain about huge new features being 
> backported, look at the mess that Spectre and Meltdown has caused in the 
> stable trees.  I don't see anyone complaining about those massive 
> changes :)

Umm, sorry, how is this related?

There simply was no other way, and I took it as given that this is seen
by everybody involved as an absolute exception, due to the nature of the 
issue and of the massive changes that were needed.

Thanks,

-- 
Jiri Kosina
SUSE Labs

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 21:18                             ` Sasha Levin
  2018-04-16 21:28                               ` Jiri Kosina
@ 2018-05-03  9:47                               ` Pavel Machek
  2018-05-03 13:06                                 ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-05-03  9:47 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jiri Kosina, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
> >On Mon, 16 Apr 2018, Sasha Levin wrote:
> >
> >> So I think that Linus's claim that users come first applies here as
> >> well. If there's a user that cares about a particular feature being
> >> broken, then we go ahead and fix his bug rather than ignoring him.
> >
> >So one extreme is fixing -stable *iff* users actually do report an issue.
> >
> >The other extreme is backporting everything that potentially looks like a
> >potential fix of "something" (according to some arbitrary metric),
> >pro-actively.
> >
> >The former violates the "users first" rule, the latter has a very, very
> >high risk of regressions.
> >
> >So this whole debate is about finding a compromise.
> >
> >My gut feeling always was that the statement in
> >
> >	Documentation/process/stable-kernel-rules.rst
> >
> >is very reasonable, but making the process way more "aggressive" when
> >backporting patches is breaking much of its original spirit for me.
> 
> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first

Original purpose of -stable was "to be common base of enterprise
distros" and our documentation still says it is.

> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother
> people".

I think you are playing dangerous word games.

> My "aggressiveness" comes from the whole "bother" part: it doesn't have
> to be critical, it doesn't have to cause data corruption, it doesn't
> have to be a security issue. It's enough that the bug actually affects a
> user in a way he didn't expect it to (if a user doesn't have
> expectations, it would fall under the "This could be a problem..."
> exception).

And it seems the documentation says you should be less aggressive and
the world tells you it expects you to be less aggressive. So maybe that's
what you should do?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-05-03  9:47                               ` Pavel Machek
@ 2018-05-03 13:06                                 ` Sasha Levin
  0 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-05-03 13:06 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Linus Torvalds, Steven Rostedt, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>> >On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> >> So I think that Linus's claim that users come first applies here as
>> >> well. If there's a user that cares about a particular feature being
>> >> broken, then we go ahead and fix his bug rather than ignoring him.
>> >
>> >So one extreme is fixing -stable *iff* users actually do report an issue.
>> >
>> >The other extreme is backporting everything that potentially looks like a
>> >potential fix of "something" (according to some arbitrary metric),
>> >pro-actively.
>> >
>> >The former violates the "users first" rule, the latter has a very, very
>> >high risk of regressions.
>> >
>> >So this whole debate is about finding a compromise.
>> >
>> >My gut feeling always was that the statement in
>> >
>> >	Documentation/process/stable-kernel-rules.rst
>> >
>> >is very reasonable, but making the process way more "aggressive" when
>> >backporting patches is breaking much of its original spirit for me.
>>
>> I agree that as an enterprise distro taking everything from -stable
>> isn't the best idea. Ideally you'd want to be close to the first
>
>Original purpose of -stable was "to be common base of enterprise
>distros" and our documentation still says it is.

I guess that the world changes?

At this point calling enterprise distros a niche wouldn't be too far
from the truth. Furthermore, some enterprise distros (as stated
earlier in this thread) don't even follow -stable anymore and cherry
pick their own commits.

So no, the main driving force behind -stable is not traditional
enterprise distributions.

>> I think that we can agree that it's impossible to expect every single
>> Linux user to go on LKML and complain about a bug he encountered, so the
>> rule quickly becomes "It must fix a real bug that can bother
>> people".
>
>I think you are playing dangerous word games.
>
>> My "aggressiveness" comes from the whole "bother" part: it doesn't have
>> to be critical, it doesn't have to cause data corruption, it doesn't
>> have to be a security issue. It's enough that the bug actually affects a
>> user in a way he didn't expect it to (if a user doesn't have
>> expectations, it would fall under the "This could be a problem..."
>> exception).
>
>And it seems the documentation says you should be less aggressive and
>the world tells you it expects you to be less aggressive. So maybe that's
>what you should do?

Who is this "world" you're referring to?

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:06               ` Pavel Machek
  2018-04-16 16:14                 ` Sasha Levin
@ 2018-04-16 16:20                 ` Steven Rostedt
  2018-04-16 16:28                   ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:20 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sasha Levin, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, 16 Apr 2018 18:06:08 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> That means you want to ignore not-so-serious bugs, because benefit of
> fixing them is lower than risk of the regressions. I believe bugs that
> do not bother anyone should _not_ be fixed in stable.
> 
> That was case of the LED patch. Yes, the commit fixed bug, but it
> introduced regressions that were fixed by subsequent patches.

I agree. I would disagree that the patch this thread is on should go to
stable. What's the point of stable if it introduces regressions by
backporting bug fixes for non-major bugs?

Every fix I make I consider labeling it for stable. The ones I don't, I
feel the bug fix is not worth the risk of added regressions.

I worry that people will get lazy and stop marking commits for stable
(or even thinking about it) because they know that there's a bot that
will pull it for them. That thought crossed my mind. Why do I want to
label anything stable if a bot will probably catch it? Then I could
just wait till the bot posts it before I even think about stable.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:20                 ` Steven Rostedt
@ 2018-04-16 16:28                   ` Sasha Levin
  2018-04-16 16:39                     ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo

On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 18:06:08 +0200
>Pavel Machek <pavel@ucw.cz> wrote:
>
>> That means you want to ignore not-so-serious bugs, because benefit of
>> fixing them is lower than risk of the regressions. I believe bugs that
>> do not bother anyone should _not_ be fixed in stable.
>>
>> That was case of the LED patch. Yes, the commit fixed bug, but it
>> introduced regressions that were fixed by subsequent patches.
>
>I agree. I would disagree that the patch this thread is on should go to
>stable. What's the point of stable if it introduces regressions by
>backporting bug fixes for non-major bugs?

One such reason is that users will then hit the regression when they
upgrade to the next -stable version anyways.

>Every fix I make I consider labeling it for stable. The ones I don't, I
>feel the bug fix is not worth the risk of added regressions.
>
>I worry that people will get lazy and stop marking commits for stable
>(or even thinking about it) because they know that there's a bot that
>will pull it for them. That thought crossed my mind. Why do I want to
>label anything stable if a bot will probably catch it? Then I could
>just wait till the bot posts it before I even think about stable.

People are already "lazy". You are actually an exception for marking your
commits.

Yes, folks will chime in with "sure, I mark my patches too!", but if you
look at the entire committer pool in the kernel you'll see that most
don't bother with this to begin with.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:28                   ` Sasha Levin
@ 2018-04-16 16:39                     ` Pavel Machek
  2018-04-16 16:43                       ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:39 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon 2018-04-16 16:28:00, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote:
> >On Mon, 16 Apr 2018 18:06:08 +0200
> >Pavel Machek <pavel@ucw.cz> wrote:
> >
> >> That means you want to ignore not-so-serious bugs, because benefit of
> >> fixing them is lower than risk of the regressions. I believe bugs that
> >> do not bother anyone should _not_ be fixed in stable.
> >>
> >> That was case of the LED patch. Yes, the commit fixed bug, but it
> >> introduced regressions that were fixed by subsequent patches.
> >
> >I agree. I would disagree that the patch this thread is on should go to
> >stable. What's the point of stable if it introduces regressions by
> >backporting bug fixes for non-major bugs?
> 
> One such reason is that users will then hit the regression when they
> upgrade to the next -stable version anyways.

Well, yes, testing is required when moving from 4.14 to 4.15. But
testing should not be required when moving from 4.14.5 to 4.14.6.

> >Every fix I make I consider labeling it for stable. The ones I don't, I
> >feel the bug fix is not worth the risk of added regressions.
> >
> >I worry that people will get lazy and stop marking commits for stable
> >(or even thinking about it) because they know that there's a bot that
> >will pull it for them. That thought crossed my mind. Why do I want to
> >label anything stable if a bot will probably catch it? Then I could
> >just wait till the bot posts it before I even think about stable.
> 
> People are already "lazy". You are actually an exception for marking your
> commits.
> 
> Yes, folks will chime in with "sure, I mark my patches too!", but if you
> look at the entire committer pool in the kernel you'll see that most
> don't bother with this to begin with.

So you take everything and put it into stable? I don't think that's a
solution.

If you are worried about people not putting enough "Stable: " tags in
their commits, perhaps you can write them emails "hey, I think this
should go to stable, do you agree"? You should get people marking
their commits themselves pretty quickly...
									
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:39                     ` Pavel Machek
@ 2018-04-16 16:43                       ` Sasha Levin
  2018-04-16 16:53                         ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:43 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 06:39:53PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 16:28:00, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 12:20:19PM -0400, Steven Rostedt wrote:
>> >On Mon, 16 Apr 2018 18:06:08 +0200
>> >Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> >> That means you want to ignore not-so-serious bugs, because benefit of
>> >> fixing them is lower than risk of the regressions. I believe bugs that
>> >> do not bother anyone should _not_ be fixed in stable.
>> >>
>> >> That was case of the LED patch. Yes, the commit fixed bug, but it
>> >> introduced regressions that were fixed by subsequent patches.
>> >
>> >I agree. I would disagree that the patch this thread is on should go to
>> >stable. What's the point of stable if it introduces regressions by
>> >backporting bug fixes for non-major bugs?
>>
>> One such reason is that users will then hit the regression when they
>> upgrade to the next -stable version anyways.
>
>Well, yes, testing is required when moving from 4.14 to 4.15. But
>testing should not be required when moving from 4.14.5 to 4.14.6.

You always have to test, even without the AUTOSEL stuff. The rejection
rate was 2% even before AUTOSEL, so there was always a chance of
regression when upgrading minor stable versions.

>> >Every fix I make I consider labeling it for stable. The ones I don't, I
>> >feel the bug fix is not worth the risk of added regressions.
>> >
>> >I worry that people will get lazy and stop marking commits for stable
>> >(or even thinking about it) because they know that there's a bot that
>> >will pull it for them. That thought crossed my mind. Why do I want to
>> >label anything stable if a bot will probably catch it? Then I could
>> >just wait till the bot posts it before I even think about stable.
>>
>> People are already "lazy". You are actually an exception for marking your
>> commits.
>>
>> Yes, folks will chime in with "sure, I mark my patches too!", but if you
>> look at the entire committer pool in the kernel you'll see that most
>> don't bother with this to begin with.
>
>So you take everything and put it into stable? I don't think that's a
>solution.

I don't think I ever said that I want to put *everything* in stable.

>If you are worried about people not putting enough "Stable: " tags in
>their commits, perhaps you can write them emails "hey, I think this
>should go to stable, do you agree"? You should get people marking
>their commits themselves pretty quickly...

Greg has been doing this for years, ask him how that worked out for him.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:43                       ` Sasha Levin
@ 2018-04-16 16:53                         ` Steven Rostedt
  2018-04-16 16:58                           ` Pavel Machek
  2018-04-16 17:09                           ` Sasha Levin
  0 siblings, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:53 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 16:43:13 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> >If you are worried about people not putting enough "Stable: " tags in
> >their commits, perhaps you can write them emails "hey, I think this
> >should go to stable, do you agree"? You should get people marking
> >their commits themselves pretty quickly...  
> 
> Greg has been doing this for years, ask him how that worked out for him.

Then he shouldn't pull in the fix. Let it be broken. As soon as someone
complains about it being broken, then bug the maintainer again. "Hey,
this is broken in 4.x, and this looks like the fix for it. Do you
agree?"

I agree that some patches don't need this discussion. Things that are
obvious. Off-by-one and stack-overflow and other bugs like that. Or
another common bug is error paths that don't release locks. These
should just be backported. But subtle fixes like this thread should
default to (not backport unless someone complains or the
author/maintainer acks it).

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:53                         ` Steven Rostedt
@ 2018-04-16 16:58                           ` Pavel Machek
  2018-04-16 17:09                           ` Sasha Levin
  1 sibling, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon 2018-04-16 12:53:07, Steven Rostedt wrote:
> On Mon, 16 Apr 2018 16:43:13 +0000
> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> 
> > >If you are worried about people not putting enough "Stable: " tags in
> > >their commits, perhaps you can write them emails "hey, I think this
> > >should go to stable, do you agree"? You should get people marking
> > >their commits themselves pretty quickly...  
> > 
> > Greg has been doing this for years, ask him how that worked out for him.
> 
> Then he shouldn't pull in the fix. Let it be broken. As soon as someone
> complains about it being broken, then bug the maintainer again. "Hey,
> this is broken in 4.x, and this looks like the fix for it. Do you
> agree?"
> 
> I agree that some patches don't need this discussion. Things that are
> obvious. Off-by-one and stack-overflow and other bugs like that. Or
> another common bug is error paths that don't release locks. These
> should just be backported. But subtle fixes like this thread should
> default to (not backport unless someone complains or the
> author/maintainer acks it).

Agreed. And it scares me we are even discussing this.


									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:53                         ` Steven Rostedt
  2018-04-16 16:58                           ` Pavel Machek
@ 2018-04-16 17:09                           ` Sasha Levin
  2018-04-16 17:33                             ` Steven Rostedt
  1 sibling, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 17:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 12:53:07PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:43:13 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> >If you are worried about people not putting enough "Stable: " tags in
>> >their commits, perhaps you can write them emails "hey, I think this
>> >should go to stable, do you agree"? You should get people marking
>> >their commits themselves pretty quickly...
>>
>> Greg has been doing this for years, ask him how that worked out for him.
>
>Then he shouldn't pull in the fix. Let it be broken. As soon as someone
>complains about it being broken, then bug the maintainer again. "Hey,
>this is broken in 4.x, and this looks like the fix for it. Do you
>agree?"

If that process would work, I would also get ACK/NACK on every AUTOSEL
request I'm sending.

What usually happens with customer reported issues is that either
they're just told to upgrade to the latest kernel (where the bug is
fixed), or if the distro team can't get them to do that and go hunting
for that bug, they'll just pick it for their kernel tree without ever
telling -stable.

I had a project to get all the fixes Canonical had in their trees that
weren't in -stable. We're talking hundreds of patches here.

>I agree that some patches don't need this discussion. Things that are
>obvious. Off-by-one and stack-overflow and other bugs like that. Or
>another common bug is error paths that don't release locks. These
>should just be backported. But subtle fixes like this thread should
>default to (not backport unless someone complains or the
>author/maintainer acks it).

Let's play a "be the -stable maintainer" game. Would you take any
of the following commits?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:09                           ` Sasha Levin
@ 2018-04-16 17:33                             ` Steven Rostedt
  2018-04-16 17:42                               ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 17:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 17:09:38 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> Let's play a "be the -stable maintainer" game. Would you take any
> of the following commits?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb

No, not automatically, or without someone from KVM letting me know what
side-effects that may have. Not stopping on a breakpoint is not that
critical, it may be a bit annoying. I would ask the KVM maintainers if
they feel it's critical enough for backporting, but without hearing
from them, I would leave it be.

> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1

Sure. Even if it has a subtle regression, that's a critical bug being
fixed.

> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6

I would consider unlocking a mutex that one didn't lock a critical bug,
so yes.

Again, things that deal with locking or buffer overflows, I would take
the fix, as those are critical. But other behavior issues where it's
not critical, I would leave be unless told further by someone else.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:33                             ` Steven Rostedt
@ 2018-04-16 17:42                               ` Sasha Levin
  2018-04-16 18:26                                 ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 17:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 01:33:21PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 17:09:38 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> Let's play a "be the -stable maintainer" game. Would you take any
>> of the following commits?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=fc90441e728aa461a8ed1cfede08b0b9efef43fb
>
>No, not automatically, or without someone from KVM letting me know what
>side-effects that may have. Not stopping on a breakpoint is not that
>critical, it may be a bit annoying. I would ask the KVM maintainers if
>they feel it's critical enough for backporting, but without hearing
>from them, I would leave it be.

Fair enough.

>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1
>
>Sure. Even if it has a subtle regression, that's a critical bug being
>fixed.

This was later reverted, in -stable:

"""
Commit d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") removed
the end of line handling when storing the update_fw sysfs attribute.
This changed the userspace API because it started refusing writes
terminated by a line feed, which broke the update tools we already have.
"""

>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6
>
>I would consider unlocking a mutex that one didn't lock a critical bug,
>so yes.
>
>Again, things that deal with locking or buffer overflows, I would take
>the fix, as those are critical. But other behavior issues where it's
>not critical, I would leave be unless told further by someone else.

This too, was reverted:

"""
It causes run-time breakage in the 4.4-stable tree and more patches are
needed to be applied first before this one in order to resolve the
issue.
"""

This is how fun it is reviewing AUTOSEL commits :)

Even the small "trivial", "obviously correct" patches have room for
errors for various reasons.

Also note that all of these patches were tagged for stable and actually
ended up in at least one tree.

This is why I'm basing a lot of my decision making on the rejection rate.
If the AUTOSEL process does the job as well as the "regular"
process did before, why push it back?

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:42                               ` Sasha Levin
@ 2018-04-16 18:26                                 ` Steven Rostedt
  2018-04-16 18:30                                   ` Linus Torvalds
  2018-04-16 18:35                                   ` Sasha Levin
  0 siblings, 2 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 18:26 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 17:42:38 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=a918d2bcea6aab6e671bfb0901cbecc3cf68fca1  
> >
> >Sure. Even if it has a subtle regression, that's a critical bug being
> >fixed.  
> 
> This was later reverted, in -stable:
> 
> """
> Commit d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") removed
> the end of line handling when storing the update_fw sysfs attribute.
> This changed the userspace API because it started refusing writes
> terminated by a line feed, which broke the update tools we already have.
> """

I hope it wasn't reverted. It did fix a critical bug.

The problem is that it only fixed a critical bug, but didn't go far
enough to keep the bug fix from breaking API. I see this as two bugs
being fixed. Even though the second bug was "caused" by the first fix,
the first fix was still necessary. The second bug was relying on broken
code. This hasn't changed my position on that patch being
backported. I would not even mark this as a regression. I would say the
original code was broken too much, and fixing part of it just
revealed another broken part.


> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=b1999fa6e8145305a6c8bda30ea20783717708e6  
> >
> >I would consider unlocking a mutex that one didn't lock a critical bug,
> >so yes.
> >
> >Again, things that deal with locking or buffer overflows, I would take
> >the fix, as those are critical. But other behavior issues where it's
> >not critical, I would leave be unless told further by someone else.  
> 
> This too, was reverted:
> 
> """
> It causes run-time breakage in the 4.4-stable tree and more patches are
> needed to be applied first before this one in order to resolve the
> issue.
> """

It wasn't reverted in mainline. Looks like there were some subtle issues
with the different stable versions. Perhaps the "fixes" was wrong.

> 
> This is how fun it is reviewing AUTOSEL commits :)
> 
> Even the small "trivial", "obviously correct" patches have room for
> errors for various reasons.

And that's fine. Any code written can have bugs in it. That's just a
given. Which is why we should be extremely picky about what we
backport.

> 
> Also note that all of these patches were tagged for stable and actually
> ended up in at least one tree.
> 
> This is why I'm basing a lot of my decision making on the rejection rate.
> If the AUTOSEL process does the job as well as the "regular"
> process did before, why push it back?

Because I think we are adding too many patches to stable. And
automating it may just make things worse. Your examples above back my
argument more than they refute it. If people can't determine what is
"obviously correct" how is automation going to do any better?

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:26                                 ` Steven Rostedt
@ 2018-04-16 18:30                                   ` Linus Torvalds
  2018-04-16 18:41                                     ` Steven Rostedt
  2018-04-16 18:35                                   ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 18:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 11:26 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The problem is that it only fixed a critical bug, but didn't go far
> enough to keep the bug fix from breaking API.

An API breakage that gets noticed *is* a critical bug.

You can't call something else critical and then say "but it broke API".

Seriously. Why do I even have to mention this?

If you break user workflows, NOTHING ELSE MATTERS.

Even security is secondary to "people don't use the end result,
because it doesn't work for them any more".

Really.

Stop with this idiotic "only API". Breaking user space is just about
the only thing that really matters. The rest is "small matter of
implementation".

              Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:30                                   ` Linus Torvalds
@ 2018-04-16 18:41                                     ` Steven Rostedt
  2018-04-16 18:52                                       ` Linus Torvalds
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 18:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 11:30:06 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 11:26 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > The problem is that it only fixed a critical bug, but didn't go far
> > enough to keep the bug fix from breaking API.  
> 
> An API breakage that gets noticed *is* a critical bug.

I totally agree with you. You misunderstood what I wrote.

I said there were two bugs here. The first bug was a possible bad
memory access. That needed to be fixed. The problem was that fixing
it broke API. But that's because the original code was broken: it
relied on the buggy behavior to get the right result. I never said the second
bug fix should not have been backported. I even said that the first bug
"didn't go far enough".

I hope the answer was not to revert the bug and put back the possible
bad memory access in to keep API. But it was to backport the second bug
fix that still has the first fix, but fixes the API breakage.

Yes, an API breakage is something I would label as critical to be
backported.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:41                                     ` Steven Rostedt
@ 2018-04-16 18:52                                       ` Linus Torvalds
  2018-04-16 19:00                                         ` Linus Torvalds
                                                           ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I never said the second
> bug fix should not have been backported. I even said that the first bug
> "didn't go far enough".

You're still not getting it.

The "didn't go far enough" means that the bug fix is *BUGGY*. It needs
to be reverted.

> I hope the answer was not to revert the bug and put back the possible
> bad memory access in to keep API.

But that very much *IS* the answer. If there isn't a fix for the ABI
breakage, then the first bugfix needs to be reverted.

Really. There is no such thing as "but the fix was more important than
the bug it introduced".

This is why we started with the whole "actively revert things that
introduce regressions". Because people always kept claiming that "but
but I fixed a worse bug, and it's better to fix the worse bug even if
it then introduces another problem, because the other problem is
lesser".

NO.

We're better off making *no* progress, than making "unsteady progress".

Really. Seriously.

If you cannot fix a bug without introducing another one, don't do it.
Don't do kernel development.

The whole mentality you show is NOT ACCEPTABLE.

So the *only* answer is: "fix the bug _and_ keep the API".  There is
no other choice.

The whole "I fixed one problem but introduced another" is not how we
work. You should damn well know that. There are no excuses.

And yes, sometimes that means jumping through hoops. But that's what
it takes to keep users happy.

                 Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:52                                       ` Linus Torvalds
@ 2018-04-16 19:00                                         ` Linus Torvalds
  2018-04-16 19:30                                           ` Steven Rostedt
  2018-04-16 19:19                                         ` Linus Torvalds
  2018-04-16 19:24                                         ` Steven Rostedt
  2 siblings, 1 reply; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 19:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We're better off making *no* progress, than making "unsteady progress".
>
> Really. Seriously.

Side note: the original impetus for this was our suspend/resume mess.
It went on for *YEARS*, and it was absolutely chock-full of exactly
this "I fixed the worse problem, and introduced another one".

There's a reason I'm a hardliner on the regression issue.  We've been
there, we've done that.

The whole "two steps forwards, one step back" mentality may work
really well if you're doing line dancing.

BUT WE ARE NOT LINE DANCING. We do kernel development.

Absolutely NOTHING else is more important than the "no regressions"
rule. NOTHING.

And just since everybody always tries to weasel about this: the only
regressions that matter are the ones that people notice in real loads.

So if you write a test-case that tests that "system call number 345
returns -ENOSYS", and we add a new system call, and you say "hey, you
regressed my system call test", that's not a regression. That's just a
"change in behavior".

It becomes a regression only if there are people using tools or
workflows that actually depend on it. So if it turns out (for example)
that Firefox had some really odd bug, and the intent was to do system
call 123, but a typo had caused it to do system call 345 instead, and
another bug meant that the end result worked fine as long as system
call 345 returned ENOSYS, then the addition of that system call
actually does turn into a regression.

See? Even adding a system call can be a regression, because what
matters is not behavior per se, but users _depending_ on some specific
behavior.

               Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:00                                         ` Linus Torvalds
@ 2018-04-16 19:30                                           ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 12:00:08 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:



> > On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > We're better off making *no* progress, than making "unsteady progress".
> > >
> > > Really. Seriously.  

[ me inserted: ]

> On Mon, 16 Apr 2018 3:24:29 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I'm talking about the given example of a simple memory bug that caused
> > a very subtle breakage of API, which had another trivial fix that
> > should be backported. I'm not sure that's what you were talking about.

> 
> Side note: the original impetus for this was our suspend/resume mess.
> It went on for *YEARS*, and it was absolutely chock-full of exactly
> this "I fixed the worse problem, and introduced another one".

What you are talking about here isn't what I was talking about above ;-)

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:52                                       ` Linus Torvalds
  2018-04-16 19:00                                         ` Linus Torvalds
@ 2018-04-16 19:19                                         ` Linus Torvalds
  2018-04-16 19:24                                         ` Steven Rostedt
  2 siblings, 0 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 19:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 11:52 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And yes, sometimes that means jumping through hoops. But that's what
> it takes to keep users happy.

The example of "jumping through hoops" I tend to give is the pipe "packet mode".

The kernel actually has a magic pipe mode for "packet buffers", so
that if you do two small writes, the other side of the pipe can
actually say "I want to read not the pipe buffers, but the individual
packets".

Why would we do that? That's not how pipes work! If you want to send
and receive messages, use a socket, for chrissake! A pipe is just a
stream of bytes - as the elder Gods of Unix decreed!

But it turns out that we added the notion of a packetized pipe writer,
and you can actually access it in user space by setting the O_DIRECT
flag (so after you do the "pipe()" system call, do a fcntl(SETFL,
O_DIRECT) on it).
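A minimal userspace sketch of the packet mode described above, assuming Linux 3.4 or later. It creates the pipe with pipe2() and O_DIRECT instead of the fcntl() route mentioned in the mail, and the helper name packet_pipe_demo is made up for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write two small chunks into a pipe, then read
 * twice with a buffer large enough to hold both.  The two read() return
 * values are stored in sizes[0] and sizes[1].
 * Returns 0 on success, -1 if the pipe could not be created or written.
 */
static int packet_pipe_demo(int packetized, long sizes[2])
{
	int fds[2];
	char buf[64];
	int ok;

	assert(sizes != NULL);

	/* O_DIRECT on a pipe asks for "packet mode"; 0 gives a byte stream. */
	if (pipe2(fds, packetized ? O_DIRECT : 0) != 0)
		return -1;

	ok = write(fds[1], "ab", 2) == 2 && write(fds[1], "cde", 3) == 3;

	/* Close the write end so a stream-mode reader sees EOF, not a block. */
	close(fds[1]);

	if (ok) {
		sizes[0] = (long)read(fds[0], buf, sizeof(buf));
		sizes[1] = (long)read(fds[0], buf, sizeof(buf));
	}
	close(fds[0]);
	return ok ? 0 : -1;
}
```

In packet mode the two reads should come back as 2 and 3 bytes, one per write. In plain stream mode the first read drains all 5 bytes and the second sees end-of-file.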

Absolutely nobody uses it, afaik, because you'd be crazy to do it.
What would be the point? sockets work, and are portable.

So why do we have it?

We have it for one single user: autofs. The way you talk to the kernel
side of things is with a magic pipe that you get at mount time, and
the user-space autofs daemon might be 32-bit even if the kernel is
64-bit, and we had a horrible ABI mistake which meant that sending the
binary data over that pipe had different format for a 32-bit autofsd.

And systemd and automount had different workarounds (or lack
thereof) for the ABI issue.

So the whole "ok, allow people to send packets, and read them as
packets" came about purely because the ABI was broken, and there was
no other way to fix things.

See 64f371bc3107 ("autofs: make the autofsv5 packet file descriptor
use a packetized pipe") for (some) of the sad details.

That commit kind of makes it sound like it's a nice solution that just
takes advantage of the packetized pipes. Nice and clean fix, right?

No. The packetized pipes exist in the first place _purely_ to make
that nice solution possible. It's literally just a "this allows us to
be ABI compatible with two different users that were confused about
the compatibility issue we had due to a broken binary structure format
across x86-32 and x86-64".

See commit 9883035ae7ed ("pipes: add a "packetized pipe" mode for
writing") for the other side of that.

All this just because _we_ made a mistake in our ABI, and then real
life users started using that mistake, including one user that
literally *knew* about the mistake and worked around it and started
depending on the fact that our compatibility mode was buggy because
of it.

So it was a bug in our ABI. But since people depended on the bug, the
bug was a feature, and needed to be kept around. In this case by
adding a totally new and unrelated feature, and using that new feature
to make those old users happy. The whole "set packetized mode on the
autofs pipe" is all done transparently inside the kernel, and
automount never knew or needed to care that we started to use a
packetized pipe to get the data it sent us in the chunks it intended.

             Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:52                                       ` Linus Torvalds
  2018-04-16 19:00                                         ` Linus Torvalds
  2018-04-16 19:19                                         ` Linus Torvalds
@ 2018-04-16 19:24                                         ` Steven Rostedt
  2018-04-16 19:28                                           ` Linus Torvalds
  2 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 11:52:48 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I never said the second
> > bug fix should not have been backported. I even said that the first bug
> > "didn't go far enough".  
> 
> You're still not getting it.
> 
> The "didn't go far enough" means that the bug fix is *BUGGY*. It needs
> to be reverted.

It wasn't reverted. Look at the code in question.

Commit d63c7dd5bcb

+++ b/drivers/scsi/ipr.c
@@ -4003,13 +4003,12 @@ static ssize_t ipr_store_update_fw(struct device *dev,
 	struct ipr_sglist *sglist;
 	char fname[100];
 	char *src;
-	int len, result, dnld_size;
+	int result, dnld_size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	len = snprintf(fname, 99, "%s", buf);
-	fname[len-1] = '\0';
+	snprintf(fname, sizeof(fname), "%s", buf);
 
 	if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
 		dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);


The bug is that len returned by snprintf() can be much larger than 100.
That fname[len-1] = '\0' can allow a user to decide where to write
zeros.
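A small userspace sketch of why that index is dangerous. snprintf() returns the length the output would have had without truncation, not the number of bytes actually stored. The helper names here are invented for illustration, and the sketch only computes the index; it deliberately does not perform the out-of-bounds store:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FNAME_SIZE 100

/*
 * Hypothetical helper mirroring the old pattern up to the store.
 * Returns the index the old code would have written '\0' to.
 */
static long old_terminator_index(const char *buf)
{
	char fname[FNAME_SIZE];
	int len = snprintf(fname, FNAME_SIZE - 1, "%s", buf);

	/* The copy itself is safely truncated and NUL-terminated. */
	assert(strlen(fname) < FNAME_SIZE);

	/* ...but len is the untruncated length, so len - 1 may be anywhere. */
	return (long)len - 1;
}

/* True if idx is a valid index into a FNAME_SIZE-byte array. */
static int index_in_bounds(long idx)
{
	return idx >= 0 && idx < FNAME_SIZE;
}
```

With a 500-byte input the computed index is 499, far past the end of the 100-byte array. With an empty input it is -1, one byte before the array.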

That patch never got reverted in mainline. It was fixed with this:

Commit 21b81716c6bf

--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -4002,6 +4002,7 @@ static ssize_t ipr_store_update_fw(struct device *dev,
        struct ipr_sglist *sglist;
        char fname[100];
        char *src;
+       char *endline;
        int result, dnld_size;
 
        if (!capable(CAP_SYS_ADMIN))
@@ -4009,6 +4010,10 @@ static ssize_t ipr_store_update_fw(struct device *dev,
 
        snprintf(fname, sizeof(fname), "%s", buf);
 
+       endline = strchr(fname, '\n');
+       if (endline)
+               *endline = '\0';
+
        if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
                dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);
                return -EIO;

> 
> > I hope the answer was not to revert the bug and put back the possible
> > bad memory access in to keep API.  
> 
> But that very much *IS* the answer. If there isn't a fix for the ABI
> breakage, then the first bugfix needs to be reverted.

It wasn't reverted and that was my point. It just wasn't a complete
fix. And I'm saying that once the API breakage became apparent, the
second fix should have been backported as well.

I'm not saying that we should allow API breakage to fix a critical bug.
I'm saying that the API breakage was really a secondary bug that needed
to be addressed. My point is the first fix was NOT reverted!


> 
> Really. There is no such thing as "but the fix was more important than
> the bug it introduced".

I'm not saying that.

> 
> This is why we started with the whole "actively revert things that
> introduce regressions". Because people always kept claiming that "but
> but I fixed a worse bug, and it's better to fix the worse bug even if
> it then introduces another problem, because the other problem is
> lesser".
> 
> NO.

Right, but the fix to the API was also trivial. I don't understand why
you are arguing with me. I agree with you. I'm talking about this
specific instance. Where a bug was fixed, and the API breakage was
another fix that needed to be backported.

Are you saying if code could allow userspace to write zeros anywhere in
memory, that we should keep it to allow API compatibility?

> 
> We're better off making *no* progress, than making "unsteady progress".
> 
> Really. Seriously.
> 
> If you cannot fix a bug without introducing another one, don't do it.
> Don't do kernel development.

Um, I think that's impossible. As the example shows. Not many people
would have caught that the original fix would cause another bug. That
requirement would pretty much keep everyone from ever doing any kernel
development.

> 
> The whole mentality you show is NOT ACCEPTABLE.
> 
> So the *only* answer is: "fix the bug _and_ keep the API".  There is
> no other choice.

I agree. But that wasn't the question.

> 
> The whole "I fixed one problem but introduced another" is not how we
> work. You should damn well know that. There are no excuses.
> 
> And yes, sometimes that means jumping through hoops. But that's what
> it takes to keep users happy.


I'm talking about the given example of a simple memory bug that caused
a very subtle breakage of API, which had another trivial fix that
should be backported. I'm not sure that's what you were talking about.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:24                                         ` Steven Rostedt
@ 2018-04-16 19:28                                           ` Linus Torvalds
  2018-04-16 19:31                                             ` Linus Torvalds
  2018-04-16 19:38                                             ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 19:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 12:24 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Right, but the fix to the API was also trivial. I don't understand why
> you are arguing with me. I agree with you. I'm talking about this
> specific instance. Where a bug was fixed, and the API breakage was
> another fix that needed to be backported.

Fair enough. Were you there when the report of breakage came in?

Because *my* argument is that reverting something that causes problems
is simply *never* the wrong answer.

If you know of the fix, fine. But clearly people DID NOT KNOW. So
reverting was the right choice.

                  Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:28                                           ` Linus Torvalds
@ 2018-04-16 19:31                                             ` Linus Torvalds
  2018-04-16 19:58                                               ` Steven Rostedt
  2018-04-16 19:38                                             ` Steven Rostedt
  1 sibling, 1 reply; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 19:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 12:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you know of the fix, fine. But clearly people DID NOT KNOW. So
> reverting was the right choice.

.. and this is obviously different in stable and in mainline.

For example, I start reverting very aggressively only at the end of a
release. If I get a bisected bug report in the last week, I generally
revert without much argument, unless the author of the patch has an
immediate fix.

In contrast, during the merge window and the early rc's, I'm perfectly
happy to say "ok, let's see if somebody can fix this" and not really
consider a revert.

But the -stable tree?

Seriously, what do you expect them to do if they get a report that a
commit they added to the stable tree regresses?

"Revert first, ask questions later" is definitely a very sane model there.

                  Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:31                                             ` Linus Torvalds
@ 2018-04-16 19:58                                               ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 19:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 12:31:09 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> But the -stable tree?
> 
> Seriously, what do you expect them to do if they get a report that a
> commit they added to the stable tree regresses?
> 
> "Revert first, ask questions later" is definitely a very sane model there.

The topic of our discussion is on what to backport, and how likely is
it to cause regressions. I'm arguing that the bar for backporting
should be raised, and that only "critical" fixes should be backported.
Sasha pointed to this bug fix as an example, and asked me if I would
backport it under my conditions. I said yes. He then said "it was
reverted", pointing me to the commit that fixed it. That confused
me. When I looked further, I noticed that it wasn't reverted, and since
he pointed me to the API fix, I said "I hope it wasn't reverted"
meaning I hope they backported the obvious API fix and didn't just
revert the original fix.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:28                                           ` Linus Torvalds
  2018-04-16 19:31                                             ` Linus Torvalds
@ 2018-04-16 19:38                                             ` Steven Rostedt
  2018-04-16 19:55                                               ` Linus Torvalds
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 19:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 12:28:21 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 12:24 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Right, but the fix to the API was also trivial. I don't understand why
> > you are arguing with me. I agree with you. I'm talking about this
> > specific instance. Where a bug was fixed, and the API breakage was
> > another fix that needed to be backported.  
> 
> Fair enough. Were you there when the report of breakage came in?

No I wasn't.

> 
> Because *my* argument is that reverting something that causes problems
> is simply *never* the wrong answer.
> 
> If you know of the fix, fine. But clearly people DID NOT KNOW. So
> reverting was the right choice.

But I don't see in the git history that this was ever reverted. My reply
saying that "I hope it wasn't reverted", was a response for it being
reverted in stable, not mainline.  Considering that the original bug
would allow userspace to write zeros anywhere in memory, I would have
definitely worked on finding why the API breakage happened and fixing
it properly before putting such a large hole back into the kernel.

I'm assuming that may have been what happened because the commit was
never reverted in your tree, and if I was responsible for that code, I
would be up all night looking for an API fix to make sure the original
fix isn't reverted.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:38                                             ` Steven Rostedt
@ 2018-04-16 19:55                                               ` Linus Torvalds
  2018-04-16 20:02                                                 ` Steven Rostedt
  0 siblings, 1 reply; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 19:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 12:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> But I don't see in the git history that this was ever reverted. My reply
> saying that "I hope it wasn't reverted", was a response for it being
> reverted in stable, not mainline.

See my other email.

If you're a stable maintainer, and you get a report of a commit that
causes problems, your first reaction probably really should just be
"revert it".

You can always re-apply it later, but a patch that causes problems is
absolutely very much suspect, and automatically should make any stable
maintainer go "that needs much more analysis".

Sure, hopefully automation finds the fix too (ie commit 21b81716c6bf
"ipr: Fix regression when loading firmware") in mainline.

It did have the proper "fixes" tag, so it should hopefully have been
easy to find by the automation that stable people use.

But at the same time, I still maintain that "just revert it" is
rather likely the right solution for stable. If it had a bug once,
maybe it shouldn't have been applied in the first place.

The author can then get notified by the other stable automation, and
at that point argue for "yeah, it was buggy, but together with this
other fix it's really important".

But even when that is the case, I really don't see that the author
should complain about it being reverted. Because it's *such* a
no-brainer in stable.

               Linus

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 19:55                                               ` Linus Torvalds
@ 2018-04-16 20:02                                                 ` Steven Rostedt
  2018-04-16 20:17                                                   ` Linus Torvalds
  0 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 20:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 12:55:46 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 12:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > But I don't see in the git history that this was ever reverted. My reply
> > saying that "I hope it wasn't reverted", was a response for it being
> > reverted in stable, not mainline.  
> 
> See my other email.

Already replied.

> 
> If you're a stable maintainer, and you get a report of a commit that
> causes problems, your first reaction probably really should just be
> "revert it".
> 
> You can always re-apply it later, but a patch that causes problems is
> absolutely very much suspect, and automatically should make any stable
> maintainer go "that needs much more analysis".
> 
> Sure, hopefully automation finds the fix too (ie commit 21b81716c6bf
> "ipr: Fix regression when loading firmware") in mainline.
> 
> It did have the proper "fixes" tag, so it should hopefully have been
> easy to find by the automation that stable people use.
> 
> But at the same time, I still maintain that "just revert it" is
> rather likely the right solution for stable. If it had a bug once,
> maybe it shouldn't have been applied in the first place.
> 
> The author can then get notified by the other stable automation, and
> at that point argue for "yeah, it was buggy, but together with this
> other fix it's really important".
> 
> But even when that is the case, I really don't see that the author
> should complain about it being reverted. Because it's *such* a
> no-brainer in stable.

But this is going way off topic to what we were discussing. The
discussion is about what gets backported. Is automating the process
going to make stable better? Or is it likely to add more regressions?

Sasha's response has been that his automated process has the same rate
of regressions as what gets tagged by authors. My argument is that
perhaps authors should tag less to stable.

-- Steve

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:02                                                 ` Steven Rostedt
@ 2018-04-16 20:17                                                   ` Linus Torvalds
  2018-04-16 20:33                                                     ` Jiri Kosina
  2018-04-16 21:27                                                     ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Linus Torvalds @ 2018-04-16 20:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> But this is going way off topic to what we were discussing. The
> discussion is about what gets backported. Is automating the process
> going to make stable better? Or is it likely to add more regressions.
>
> Sasha's response has been that his automated process has the same rate
> of regressions as what gets tagged by authors. My argument is that
> perhaps authors should tag less to stable.

The ones who should matter most for that discussion are the distros,
since they are the actual users of stable (as well as the people doing
the work, of course - ie Sasha and Greg and the rest of the stable
gang).

And I suspect that they actually do want all the noise, and all the
stuff that isn't "critical". That's often the _easy_ choice. It's the
stuff that I suspect the stable maintainers go "this I don't even have
to think about", because it's a new driver ID or something.

Because the bulk of stable tends to be driver updates, afaik. Which
distros very much tend to want.

Will developers think that their patches matter so much that they
should go to stable? Yes they will. Will they overtag as a result?
Probably. But the reverse likely also happens, where people simply
don't think about stable at all, and just want to fix a bug.

In many ways "Fixes" is likely a better thing to check for in stable
backports, but that doesn't always exist either.

And just judging by the amount of stable email I get - and by how
excited _I_ would be about stable work, I think "automated process" is
simply not an option. It's a _requirement_. You'd go completely crazy
if you didn't automate 99% of all the stable work.

So can you trust the "Cc: stable" as being perfect? Hell no. But
what's your alternative? Manually selecting things for stable? Asking
the developers separately?

Because "criticality" definitely isn't what determines it. If it was,
we'd never add driver ID's etc to stable - they're clearly not
"critical".

Yet it feels like sometimes those driver things are the _bulk_
of it, and it is usually fairly safe (not quite as obviously safe as
you'd think, because a driver ID addition has occasionally meant not
just "now it's supported", but instead "now the generic driver doesn't
trigger for it any more", so it can actually break things).

So I think - and _hope_ - that 99% of stable should be the
non-critical stuff that people don't even need to think about.

The critical stuff is hopefully a tiny tiny percentage.

                           Linus


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:17                                                   ` Linus Torvalds
@ 2018-04-16 20:33                                                     ` Jiri Kosina
  2018-04-16 21:27                                                     ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Jiri Kosina @ 2018-04-16 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Sasha Levin, Pavel Machek, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Greg KH

On Mon, 16 Apr 2018, Linus Torvalds wrote:

> The ones who should matter most for that discussion is the distros,
> since they are the actual users of stable (as well as the people doing
> the work, of course - ie Sasha and Greg and the rest of the stable
> gang).
> 
> And I suspect that they actually do want all the noise, and all the
> stuff that isn't "critical". That's often the _easy_ choice. It's the
> stuff that I suspect the stable maintainers go "this I don't even have
> to think about", because it's a new driver ID or something.

So I am a maintainer of SUSE enterprise kernel, and I can tell you I 
personally really don't want all the noise, simply because

	(a) no one asked us to distribute it (if they did, we would do so)
	(b) the risk of regressions

We've always been very cautious about what is coming from stable, and 
actually filtering out patches we actively don't want for one reason or 
another.

And yes, there is also a history of regressions caused by stable updates 
that were painful for us ... I brought this up multiple times at
ksummit-discuss@ over past years.

So with the upcoming release, we've actually abandoned stable and are
relying more on auto-processing the Fixes: tag.

That is not perfect in both ways (it doesn't cover everything, and we can 
miss complex semantical dependencies between patches even though they 
"apply"), but we didn't base our decision this time on aligning our 
schedule with stable, and so far we don't seem to be suffering. And we 
have much better overview / control over what is landing in our enterprise 
tree (of course this all is shepherded by machinery around processing 
Fixes: tag, which then though has to be *actively* acted upon, depending 
on a case-by-case human assessment of how critical it actually is).
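
[Illustration: the kind of Fixes:-tag processing described above could be
sketched roughly as follows. This is a minimal sketch and not SUSE's actual
machinery; the names extract_fixes and review_queue, the sample SHAs, and
the idea of a "shipped" set of commit IDs are all assumptions made for the
example. It only builds a queue for a human to look at, since the text
above says each hit still has to be assessed case by case.]

```python
import re

# Matches a conventional kernel trailer line such as:
#   Fixes: 0123456789ab ("subsys: add thing")
# The quoted subject is optional; the SHA may be abbreviated.
FIXES_RE = re.compile(
    r'^Fixes:[ \t]*([0-9a-f]{8,40})[ \t]*(?:\("(.*)"\))?[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)


def extract_fixes(message):
    """Return (sha, subject) pairs for every Fixes: trailer in a message."""
    return [(m.group(1).lower(), m.group(2))
            for m in FIXES_RE.finditer(message)]


def review_queue(commits, shipped_shas):
    """Pick commits that claim to fix something already in our tree.

    commits:      iterable of (sha, commit_message) from mainline
    shipped_shas: full SHAs of upstream commits present in our tree
                  (hypothetical bookkeeping, assumed to exist)
    The result is only a list of candidates for human assessment.
    """
    queue = []
    for sha, message in commits:
        for fixed_sha, _subject in extract_fixes(message):
            # Fixes: tags usually carry an abbreviated SHA, so prefix-match.
            if any(full.startswith(fixed_sha) for full in shipped_shas):
                queue.append(sha)
                break
    return queue
```

[A commit without a Fixes: trailer is never queued by this sketch, which is
the "doesn't cover everything" limitation mentioned above, and a purely
textual match cannot see semantic dependencies between patches.]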

> Because the bulk of stable tends to be driver updates, afaik. Which 
> distros very much tend to want.

For "community" distros (like Fedora, openSUSE), perhaps, yeah.

For "enterprise" kernels, quite frankly, we much rather get the driver 
updates/backports from the respective HW vendors we're cooperating with,
as they have actually tested and verified the backport on the metal.

> The critical stuff is hopefully a tiny tiny percentage.

But quite frankly, that's the only thing we as distro *really* want -- to 
save our users from hitting the critical issues with all the consequences 
(data loss, unbootable systems, etc). All the rest we can easily handle on 
a reactive basis, which heavily depends on the userbase spectrum (and 
that's probably completely different for each -stable tree consumer 
anyway).

This is a POV of me as a distro kernel maintainer, but mileage of others
definitely can / will vary of course.

Thanks,

-- 
Jiri Kosina
SUSE Labs


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 20:17                                                   ` Linus Torvalds
  2018-04-16 20:33                                                     ` Jiri Kosina
@ 2018-04-16 21:27                                                     ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 21:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Pavel Machek, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 13:17:24 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > But this is going way off topic to what we were discussing. The
> > discussion is about what gets backported. Is automating the process
> > going to make stable better? Or is it likely to add more regressions.
> >
> > Sasha's response has been that his automated process has the same rate
> > of regressions as what gets tagged by authors. My argument is that
> > perhaps authors should tag less to stable.  
> 
> The ones who should matter most for that discussion is the distros,
> since they are the actual users of stable (as well as the people doing
> the work, of course - ie Sasha and Greg and the rest of the stable
> gang).

That was actually my final conclusion before we started our
discussion ;-)

http://lkml.kernel.org/r/20180416143510.79ba5c63@gandalf.local.home

> 
> And I suspect that they actually do want all the noise, and all the
> stuff that isn't "critical". That's often the _easy_ choice. It's the
> stuff that I suspect the stable maintainers go "this I don't even have
> to think about", because it's a new driver ID or something.

Although Red Hat doesn't base off of the stable kernel. At least it
didn't when I was there. They may look at the stable kernel, but they
make their own decisions.

If we want the distros to use stable as the base, it should be the
least common factor among them. Otherwise, if stable includes commits
that a distro would rather not backport, then they won't use stable.

> 
> Because the bulk of stable tends to be driver updates, afaik. Which
> distros very much tend to want.
> 
> Will developers think that their patches matter so much that they
> should go to stable? Yes they will. Will they overtag as a result?
> Probably. But the reverse likely also happens, where people simply
> don't think about stable at all, and just want to fix a bug.
> 
> In many ways "Fixes" is likely a better thing to check for in stable
> backports, but that doesn't always exist either.
> 
> And just judging by the amount of stable email I get - and by how
> excited _I_ would be about stable work, I think "automated process" is
> simply not an option. It's a _requirement_. You'd go completely crazy
> if you didn't automate 99% of all the stable work.
> 
> So can you trust the "Cc: stable" as being perfect? Hell no. But
> what's your alternative? Manually selecting things for stable? Asking
> the developers separately?
> 
> Because "criticality" definitely isn't what determines it. If it was,
> we'd never add driver ID's etc to stable - they're clearly not
> "critical".

True. But I believe the driver ID's were given the "exception".


> 
> Yet it feels like that's sometimes those driver things are the _bulk_
> of it, and it is usually fairly safe (not quite as obviously safe as
> you'd think, because a driver ID addition has occasionally meant not
> just "now it's supported", but instead "now the generic driver doesn't
> trigger for it any more", so it can actually break things).
> 
> So I think - and _hope_ - that 99% of stable should be the
> non-critical stuff that people don't even need to think about.
> 
> The critical stuff is hopefully a tiny tiny percentage.

Well, I'm not sure that's really the case.

$ git log --oneline v4.14.33..v4.14.34 | head -20
ffebeb0d7c37 Linux 4.14.34
fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources
9fdeb33e1913 vhost_net: add missing lock nesting notation
8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add
233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash
2f8aa659d4c0 vhost: validate log when IOTLB is enabled
72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor
9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests
477c73abf26a strparser: Fix sign of err codes
1c71bfe84deb net/sched: fix NULL dereference on the error path of tcf_skbmod_init()
a19024a3f343 net/sched: fix NULL dereference in the error path of tunnel_key_init()
e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open
baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode
1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled
e52a45bb392f nfp: use full 40 bits of the NSP buffer address
ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows
9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path
b9c6ddda3805 vti6: better validate user provided tunnel names
109dce20c6ed ip6_tunnel: better validate user provided tunnel names
72363c63b070 ip6_gre: better validate user provided tunnel names

The majority of those appear to be on the critical side.

-- Steve


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:26                                 ` Steven Rostedt
  2018-04-16 18:30                                   ` Linus Torvalds
@ 2018-04-16 18:35                                   ` Sasha Levin
  2018-04-16 18:57                                     ` Steven Rostedt
  1 sibling, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 18:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, Apr 16, 2018 at 02:26:53PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 17:42:38 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>> Also note that all of these patches were tagged for stable and actually
>> ended up in at least one tree.
>>
>> This is why I'm basing a lot of my decision making on the rejection rate.
>> If the AUTOSEL process does the job well enough as the "regular"
>> process did before, why push it back?
>
>Because I think we are adding too many patches to stable. And
>automating it may just make things worse. Your examples above back my
>argument more than they refute it. If people can't determine what is
>"obviously correct" how is automation going to do any better?

I don't understand that statement, it sounds illogical to me.

If I were to tell you that I have a crack team of 10 kernel hackers who
dig through all mainline commits to find commits that should be
backported to stable, and they do it with fewer mistakes than
authors/maintainers make when they tag their own commits, would I get the
same level of objection?

On the correctness side, I have another effort to improve the quality of
testing -stable commits get, but this is somewhat unrelated to the whole
automatic selection process.


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 18:35                                   ` Sasha Levin
@ 2018-04-16 18:57                                     ` Steven Rostedt
  0 siblings, 0 replies; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 18:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Greg KH

On Mon, 16 Apr 2018 18:35:44 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> If I were to tell you that I have a crack team of 10 kernel hackers who
> dig through all mainline commits to find commits that should be
> backported to stable, and they do it with less mistakes than
> authors/maintainers make when they tag their own commits, would I get the
> same level of objection?

Probably ;-)

I've been struggling with my own stable tags, and been thinking that I
too suffer from tagging too much for stable, because there's code I
fix, and think "hmm, this could have some unwanted side effects". I'm
actually worried that my own fixes could cause an API breakage that I'm
unaware of.

What I'm saying is, I think we should start looking at fixes that fix
bugs we consider critical. Those being:

 * off-by-one
 * memory overflow
 * locking mismatch
 * API regressions

For my sub-system

 * wrong data coming out

Which can be a critical issue. Wrong data is worse than no data. But
then, there's the times a bug will produce no data, and considering
what it is, and how much of an effort it takes to fix it, I may or may
not label "no data" issues for stable. The cases where I enable
something with a bunch of parameters, and because of some mishandling
of the parameter it just screws up totally (where it's obvious that it
screwed up), I only mark those for stable if it doesn't require a
redesign of the code to fix it. There's been some cases where a
redesign was required, and I didn't mark it for stable.

The fixes for tracing that I don't usually tag for stable are for when
complex tracing simply doesn't work and produces no data or errors
incorrectly. Depending on how complex the fix is, I mark it for stable;
otherwise, I think the fix is more likely to break something else that
is more common than this hardly ever used feature.

The fact that nobody noticed, or hasn't complained about it usually
plays a lot in that decision. If someone complained to me about
breakage, I'm more likely to label it for stable. But if I discover it
myself, as I probably use the tracing system differently than others as
I wrote the code, then I don't usually mark it.

-- Steve


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:18         ` Linus Torvalds
  2018-04-16 15:30           ` Pavel Machek
@ 2018-04-16 15:36           ` Steven Rostedt
  2018-04-16 16:02             ` Sasha Levin
  2018-04-16 15:39           ` Sasha Levin
  2 siblings, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 15:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sasha Levin, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, 16 Apr 2018 08:18:09 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> > someone before they are pulled in. Otherwise there may be some subtle
> > issues that can find their way into stable releases.  
> 
> I don't know about anybody else, but I get so many of the patch-bot
> patches for stable etc that I will *not* reply to normal cases. Only
> if there's some issue with a patch will I reply.
> 
> I probably do get more than most, but still - requiring active
> participation for the steady flow of normal stable patches is almost
> pointless.
> 
> Just look at the subject line of this thread. The numbers are so big
> that you almost need exponential notation for them.
> 

I'm worried about just backporting patches that nobody actually looked
at. Is someone going through and vetting that these should definitely
be added to stable? I would like to have some trusted human (doesn't
even need to be the author or maintainer of the patch) to look at all
the patches before they are applied.

I would say anything more than a trivial patch would require author or
sub maintainer ack. Look at this patch, I don't think it should go to
stable, even though it does fix issues. But the fix is for systems
already having issues, and this keeps printk from making things worse.
The fix has side effects that other commits have addressed, and if this
patch gets backported, those other ones must too.

Maybe I was too strong by saying all patches should be acked, but
anything more than buffer overflows and off by one errors probably
requires a bit more vetting by a human than to just pull in all patches
that a bot flags to be backported.

-- Steve


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:36           ` Steven Rostedt
@ 2018-04-16 16:02             ` Sasha Levin
  2018-04-16 16:10               ` Pavel Machek
  2018-04-16 16:12               ` Steven Rostedt
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 11:36:29AM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 08:18:09 -0700
>Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> > someone before they are pulled in. Otherwise there may be some subtle
>> > issues that can find their way into stable releases.
>>
>> I don't know about anybody else, but I  get so many of the patch-bot
>> patches for stable etc that I will *not* reply to normal cases. Only
>> if there's some issue with a patch will I reply.
>>
>> I probably do get more than most, but still - requiring active
>> participation for the steady flow of normal stable patches is almost
>> pointless.
>>
>> Just look at the subject line of this thread. The numbers are so big
>> that you almost need exponential notation for them.
>>
>
>I'm worried about just backporting patches that nobody actually looked
>at. Is someone going through and vetting that these should definitely
>be added to stable. I would like to have some trusted human (doesn't
>even need to be the author or maintainer of the patch) to look at all
>the patches before they are applied.

I do go through every single commit sent this way and review it.
Sometimes things slip by, but it's not a fully automatic process.

Let's look at this patch as a concrete example: the only reason,
according to the stable rules, that it shouldn't go in -stable is that
it's longer than 100 lines.

Otherwise, it fixes a bug, it doesn't introduce any new features, it's
upstream, and so on. It had some fixes that went upstream as well?
Great, let's grab those as well.

>I would say anything more than a trivial patch would require author or
>sub maintainer ack. Look at this patch, I don't think it should go to
>stable, even though it does fix issues. But the fix is for systems
>already having issues, and this keeps printk from making things worse.
>The fix has side effects that other commits have addressed, and if this
>patch gets backported, those other ones must too.

Sure, let's get those patches in as well.

One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.

If anything, after this discussion I'd recommend that we take this patch
and its follow-up fixes...

>Maybe I was too strong by saying all patches should be acked, but
>anything more than buffer overflows and off by one errors probably
>require a bit more vetting by a human than to just pull in all patches
>that a bot flags to be backported.

If anyone wants to give me a hand with going through these I'd be more
than happy to. I know that Ben Hutchings is looking at the ones that
land in 4.4 carefully. It's always good to have more than 1 set of eyes!


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:02             ` Sasha Levin
@ 2018-04-16 16:10               ` Pavel Machek
  2018-04-16 16:12               ` Steven Rostedt
  1 sibling, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 16:10 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo


On Mon 2018-04-16 16:02:03, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 11:36:29AM -0400, Steven Rostedt wrote:
> >On Mon, 16 Apr 2018 08:18:09 -0700
> >Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> >> On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> >
> >> > I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
> >> > someone before they are pulled in. Otherwise there may be some subtle
> >> > issues that can find their way into stable releases.
> >>
> >> I don't know about anybody else, but I  get so many of the patch-bot
> >> patches for stable etc that I will *not* reply to normal cases. Only
> >> if there's some issue with a patch will I reply.
> >>
> >> I probably do get more than most, but still - requiring active
> >> participation for the steady flow of normal stable patches is almost
> >> pointless.
> >>
> >> Just look at the subject line of this thread. The numbers are so big
> >> that you almost need exponential notation for them.
> >>
> >
> >I'm worried about just backporting patches that nobody actually looked
> >at. Is someone going through and vetting that these should definitely
> >be added to stable. I would like to have some trusted human (doesn't
> >even need to be the author or maintainer of the patch) to look at all
> >the patches before they are applied.
> 
> I do go through every single commit sent this way and review it.
> Sometimes things slip by, but it's not a fully automatic process.
> 
> Let's look at this patch as a concrete example: the only reason,
> according to the stable rules, that it shouldn't go in -stable is that
> it's longer than 100 lines.
> 
> Otherwise, it fixes a bug, it doesn't introduce any new features, it's
> upstream, and so on. It had some fixes that went upstream as well?
> Great, let's grab those as well.
> 
> >I would say anything more than a trivial patch would require author or
> >sub maintainer ack. Look at this patch, I don't think it should go to
> >stable, even though it does fix issues. But the fix is for systems
> >already having issues, and this keeps printk from making things worse.
> >The fix has side effects that other commits have addressed, and if this
> >patch gets backported, those other ones must too.
> 
> Sure, let's get those patches in as well.
> 
> One of the things Greg is pushing strongly for is "bug compatibility":
> we want the kernel to behave the same way between mainline and stable.
> If the code is broken, it should be broken in the same way.

Maybe Greg should be Cced on this conversation?

Anyway, I don't think "bug compatibility" is a good goal.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:02             ` Sasha Levin
  2018-04-16 16:10               ` Pavel Machek
@ 2018-04-16 16:12               ` Steven Rostedt
  2018-04-16 16:19                 ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, 16 Apr 2018 16:02:03 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> One of the things Greg is pushing strongly for is "bug compatibility":
> we want the kernel to behave the same way between mainline and stable.
> If the code is broken, it should be broken in the same way.

Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?

-- Steve


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:12               ` Steven Rostedt
@ 2018-04-16 16:19                 ` Sasha Levin
  2018-04-16 16:30                   ` Steven Rostedt
  2018-04-19 11:41                   ` Thomas Backlund
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:02:03 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> One of the things Greg is pushing strongly for is "bug compatibility":
>> we want the kernel to behave the same way between mainline and stable.
>> If the code is broken, it should be broken in the same way.
>
>Wait! What does that mean? What's the purpose of stable if it is as
>broken as mainline?

This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.


* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:19                 ` Sasha Levin
@ 2018-04-16 16:30                   ` Steven Rostedt
  2018-04-16 16:37                     ` Sasha Levin
  2018-04-19 11:41                   ` Thomas Backlund
  1 sibling, 1 reply; 113+ messages in thread
From: Steven Rostedt @ 2018-04-16 16:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, 16 Apr 2018 16:19:14 +0000
Sasha Levin <Alexander.Levin@microsoft.com> wrote:

> >Wait! What does that mean? What's the purpose of stable if it is as
> >broken as mainline?  
> 
> This just means that if there is a fix that went in mainline, and the
> fix is broken somehow, we'd rather take the broken fix than not.
> 
> In this scenario, *something* will be broken, it's just a matter of
> what. We'd rather have the same thing broken between mainline and
> stable.

Honestly, I think that removes all value of the stable series. I
remember when the stable series was first created. People were saying
that it wouldn't even get to more than 5 versions, because the bar for
backporting was supposed to be very high. Today it's just a fork of the
kernel at a given version. No more features, but we will be OK with
regressions. I'm struggling to see what the benefit of it is supposed
to be?

-- Steve

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:30                   ` Steven Rostedt
@ 2018-04-16 16:37                     ` Sasha Levin
  2018-04-16 17:06                       ` Pavel Machek
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 16:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>On Mon, 16 Apr 2018 16:19:14 +0000
>Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>
>> >Wait! What does that mean? What's the purpose of stable if it is as
>> >broken as mainline?
>>
>> This just means that if there is a fix that went in mainline, and the
>> fix is broken somehow, we'd rather take the broken fix than not.
>>
>> In this scenario, *something* will be broken, it's just a matter of
>> what. We'd rather have the same thing broken between mainline and
>> stable.
>
>Honestly, I think that removes all value of the stable series. I
>remember when the stable series were first created. People were saying
>that it wouldn't even get to more than 5 versions, because the bar for
>backporting was suppose to be very high. Today it's just a fork of the
>kernel at a given version. No more features, but we will be OK with
>regressions. I'm struggling to see what the benefit of it is suppose to
>be?

It's not "OK with regressions".

Let's look at a hypothetical example: You have a 4.15.1 kernel that has
a broken printf() behaviour, so that running:

	pr_err("%d", 5)

Would print:

	"Microsoft Rulez"

Bad, right? So you went ahead and fixed it, and now it prints "5" as you
might expect. But alas, with your patch, running:

	pr_err("%s", "hi!")

Would show a cat picture for 5 seconds.

Should we take your patch in -stable or not? If we don't, we're stuck
with the original issue while the mainline kernel will behave
differently, but if we do - we introduce a new regression.

So it's not the case that a -stable kernel will have *more* regressions,
it'll just have similar ones to mainline.

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:37                     ` Sasha Levin
@ 2018-04-16 17:06                       ` Pavel Machek
  2018-04-16 17:23                         ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-04-16 17:06 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >On Mon, 16 Apr 2018 16:19:14 +0000
> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> >
> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >broken as mainline?
> >>
> >> This just means that if there is a fix that went in mainline, and the
> >> fix is broken somehow, we'd rather take the broken fix than not.
> >>
> >> In this scenario, *something* will be broken, it's just a matter of
> >> what. We'd rather have the same thing broken between mainline and
> >> stable.
> >
> >Honestly, I think that removes all value of the stable series. I
> >remember when the stable series were first created. People were saying
> >that it wouldn't even get to more than 5 versions, because the bar for
> >backporting was suppose to be very high. Today it's just a fork of the
> >kernel at a given version. No more features, but we will be OK with
> >regressions. I'm struggling to see what the benefit of it is suppose to
> >be?
> 
> It's not "OK with regressions".
> 
> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> a broken printf() behaviour so that when you:
> 
> 	pr_err("%d", 5)
> 
> Would print:
> 
> 	"Microsoft Rulez"
> 
> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> might expect. But alas, with your patch, running:
> 
> 	pr_err("%s", "hi!")
> 
> Would show a cat picture for 5 seconds.
> 
> Should we take your patch in -stable or not? If we don't, we're stuck
> with the original issue while the mainline kernel will behave
> differently, but if we do - we introduce a new regression.

Of course not.

- It must be obviously correct and tested.

If it introduces a new bug, it is not correct, and certainly not
obviously correct.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:06                       ` Pavel Machek
@ 2018-04-16 17:23                         ` Sasha Levin
  2018-04-17 11:41                           ` Jan Kara
  2018-05-03  9:32                           ` Pavel Machek
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 17:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>> >On Mon, 16 Apr 2018 16:19:14 +0000
>> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>> >
>> >> >Wait! What does that mean? What's the purpose of stable if it is as
>> >> >broken as mainline?
>> >>
>> >> This just means that if there is a fix that went in mainline, and the
>> >> fix is broken somehow, we'd rather take the broken fix than not.
>> >>
>> >> In this scenario, *something* will be broken, it's just a matter of
>> >> what. We'd rather have the same thing broken between mainline and
>> >> stable.
>> >
>> >Honestly, I think that removes all value of the stable series. I
>> >remember when the stable series were first created. People were saying
>> >that it wouldn't even get to more than 5 versions, because the bar for
>> >backporting was suppose to be very high. Today it's just a fork of the
>> >kernel at a given version. No more features, but we will be OK with
>> >regressions. I'm struggling to see what the benefit of it is suppose to
>> >be?
>>
>> It's not "OK with regressions".
>>
>> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
>> a broken printf() behaviour so that when you:
>>
>> 	pr_err("%d", 5)
>>
>> Would print:
>>
>> 	"Microsoft Rulez"
>>
>> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
>> might expect. But alas, with your patch, running:
>>
>> 	pr_err("%s", "hi!")
>>
>> Would show a cat picture for 5 seconds.
>>
>> Should we take your patch in -stable or not? If we don't, we're stuck
>> with the original issue while the mainline kernel will behave
>> differently, but if we do - we introduce a new regression.
>
>Of course not.
>
>- It must be obviously correct and tested.
>
>If it introduces new bug, it is not correct, and certainly not
>obviously correct.

As you might have noticed, we don't strictly follow the rules.

Take a look at the whole PTI story as an example. It's way more than 100
lines, it's not obviously correct, it fixed more than 1 thing, and so
on, and yet it went in -stable!

Would you argue we shouldn't have backported PTI to -stable?

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:23                         ` Sasha Levin
@ 2018-04-17 11:41                           ` Jan Kara
  2018-04-17 13:31                             ` Sasha Levin
  2018-05-03  9:32                           ` Pavel Machek
  1 sibling, 1 reply; 113+ messages in thread
From: Jan Kara @ 2018-04-17 11:41 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >On Mon, 16 Apr 2018 16:19:14 +0000
> >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> >> >
> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >broken as mainline?
> >> >>
> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >>
> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> stable.
> >> >
> >> >Honestly, I think that removes all value of the stable series. I
> >> >remember when the stable series were first created. People were saying
> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >kernel at a given version. No more features, but we will be OK with
> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >be?
> >>
> >> It's not "OK with regressions".
> >>
> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> a broken printf() behaviour so that when you:
> >>
> >> 	pr_err("%d", 5)
> >>
> >> Would print:
> >>
> >> 	"Microsoft Rulez"
> >>
> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> might expect. But alas, with your patch, running:
> >>
> >> 	pr_err("%s", "hi!")
> >>
> >> Would show a cat picture for 5 seconds.
> >>
> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> with the original issue while the mainline kernel will behave
> >> differently, but if we do - we introduce a new regression.
> >
> >Of course not.
> >
> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.
> 
> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

So I agree with that being backported. But I think this nicely demonstrates
a point some people are trying to make in this thread. We do take fixes
with a high risk of regression if they fix a serious enough issue. Also we
do take fixes to non-serious stuff (such as the addition of a device ID) if
the chances of regression are really low.

So IMHO the metric for including the fix is not solely "how annoying to
the user this can be" but rather something like:

score = (how annoying the bug is) * ((1 / (chance of regression due to
	including this)) - 1)^3

(constants are somewhat arbitrary and subject to tuning ;). Now both the
'annoying' and 'regression chance' parts are subjective and sometimes
difficult to estimate, so don't take the formula too seriously, but it
demonstrates the point. I think we all agree we want to fix annoying stuff
and we don't want regressions. But you need to somehow weigh this over your
expected userbase - and this is where your argument "but someone might be
annoyed by LEDs not working so let's include it" has problems - it should
rather be "is the annoyance of non-working LEDs over the expected user base
high enough to risk a regression due to this patch for someone in the
expected user base"? The answer to this second question is not clear at all
to a casual reviewer and that's why we IMHO have the CC stable tag, as the
maintainer is supposed to have at least a bit better clue.
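
A minimal sketch of the scoring idea above, in Python. The function name
`backport_score` and the sample inputs are hypothetical; only the shape of
the expression comes from the formula, and since the formula is explicitly
not meant to be taken too seriously, the outputs are useful only for
ranking candidates against each other.

```python
def backport_score(annoyance: float, regression_chance: float) -> float:
    """score = annoyance * ((1 / regression_chance) - 1) ** 3

    annoyance:         subjective weight of the bug (any non-negative scale).
    regression_chance: estimated probability, in (0, 1], that the backport
                       itself introduces a regression.
    """
    if annoyance < 0:
        raise ValueError("annoyance must be non-negative")
    if not 0.0 < regression_chance <= 1.0:
        raise ValueError("regression_chance must be in (0, 1]")
    return annoyance * ((1.0 / regression_chance) - 1.0) ** 3


if __name__ == "__main__":
    # Hypothetical inputs: a mildly annoying bug with a very safe fix
    # versus a serious bug with a risky fix.
    print("low annoyance, low risk  :", backport_score(1.0, 0.001))
    print("high annoyance, high risk:", backport_score(100.0, 0.25))
```

Because of the cubic term, the score falls off quickly as the estimated
regression chance rises, which is the point the formula is making: a risky
patch needs a much more serious bug behind it to score as well as a safe one.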

Another point I wanted to make is that if the chance a patch causes a
regression is about 2%, as you said somewhere else in the thread, then by
adding 20 patches that "may fix a bug that is annoying for someone" you've
just increased the chance there's a regression in the release by 34%. And
this is not just a math game, this also roughly matches our real experience
with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such a
regression chance? And I also note that for a regression to get reported so
that it gets included into your 2% estimate of a patch regression rate,
someone must be bothered enough by it to triage it and send an email
somewhere, so that already falls into the category of "serious" stuff to me.
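
The 20-patch figure can be checked with a short calculation, assuming each
patch regresses independently and with the same probability (an assumption
that is not stated in the thread). The helper name is hypothetical. With a
2% per-patch chance the result comes out at roughly 33%, in line with the
figure quoted above.

```python
def chance_of_any_regression(per_patch_chance: float, patches: int) -> float:
    """Probability that at least one of `patches` backports regresses.

    Assumes independent patches with an identical per-patch chance:
        P(any) = 1 - (1 - p) ** n
    """
    if not 0.0 <= per_patch_chance <= 1.0:
        raise ValueError("per_patch_chance must be in [0, 1]")
    if patches < 0:
        raise ValueError("patches must be non-negative")
    return 1.0 - (1.0 - per_patch_chance) ** patches


if __name__ == "__main__":
    p = chance_of_any_regression(0.02, 20)
    print(f"20 patches at 2% each: {p:.1%} chance of at least one regression")
```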

So these are the reasons why I think that merging tons of patches into
stable isn't actually very good. 

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 11:41                           ` Jan Kara
@ 2018-04-17 13:31                             ` Sasha Levin
  2018-04-17 15:55                               ` Jan Kara
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 13:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
>On Mon 16-04-18 17:23:30, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
>> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>> >> >On Mon, 16 Apr 2018 16:19:14 +0000
>> >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>> >> >
>> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
>> >> >> >broken as mainline?
>> >> >>
>> >> >> This just means that if there is a fix that went in mainline, and the
>> >> >> fix is broken somehow, we'd rather take the broken fix than not.
>> >> >>
>> >> >> In this scenario, *something* will be broken, it's just a matter of
>> >> >> what. We'd rather have the same thing broken between mainline and
>> >> >> stable.
>> >> >
>> >> >Honestly, I think that removes all value of the stable series. I
>> >> >remember when the stable series were first created. People were saying
>> >> >that it wouldn't even get to more than 5 versions, because the bar for
>> >> >backporting was suppose to be very high. Today it's just a fork of the
>> >> >kernel at a given version. No more features, but we will be OK with
>> >> >regressions. I'm struggling to see what the benefit of it is suppose to
>> >> >be?
>> >>
>> >> It's not "OK with regressions".
>> >>
>> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
>> >> a broken printf() behaviour so that when you:
>> >>
>> >> 	pr_err("%d", 5)
>> >>
>> >> Would print:
>> >>
>> >> 	"Microsoft Rulez"
>> >>
>> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
>> >> might expect. But alas, with your patch, running:
>> >>
>> >> 	pr_err("%s", "hi!")
>> >>
>> >> Would show a cat picture for 5 seconds.
>> >>
>> >> Should we take your patch in -stable or not? If we don't, we're stuck
>> >> with the original issue while the mainline kernel will behave
>> >> differently, but if we do - we introduce a new regression.
>> >
>> >Of course not.
>> >
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>>
>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously corrent, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>So I agree with that being backported. But I think this nicely demostrates
>a point some people are trying to make in this thread. We do take fixes
>with high risk or regression if they fix serious enough issue. Also we do
>take fixes to non-serious stuff (such as addition of device ID) if the
>chances of regression are really low.
>
>So IMHO the metric for including the fix is not solely "how annoying to
>user this can be" but rather something like:
>
>score = (how annoying the bug is) * ((1 / (chance of regression due to
>	including this)) - 1)^3
>
>(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
>and 'regression chance' parts are subjective and sometimes difficult to
>estimate so don't take the formula too seriously but it demonstrates the
>point. I think we all agree we want to fix annoying stuff and we don't want
>regressions. But you need to somehow weight this over your expected
>userbase - and this is where your argument "but someone might be annoyed by
>LEDs not working so let's include it" has problems - it should rather be
>"is the annoyance of non-working leds over expected user base high enough
>to risk a regression due to this patch for someone in the expected user
>base"? The answer to this second question is not clear at all to a casual
>reviewer and that's why we IMHO have CC stable tag as maintainer is
>supposed to have at least a bit better clue.

We may be able to guesstimate the 'regression chance', but there's no
way we can guess the 'annoyance' one. There are so many different use
cases that we just can't even guess how many people would get "annoyed"
by something.

Even regression chance is tricky; look at the commits I've linked
earlier in the thread. Even the most trivial looking commits that end up
in stable have a chance for regression.

>Another point I wanted to make is that if chance a patch causes a
>regression is about 2% as you said somewhere else in a thread, then by
>adding 20 patches that "may fix a bug that is annoying for someone" you've
>just increased a chance there's a regression in the release by 34%. And

So I've said that the rejection rate is less than 2%. This includes
all commits that I have proposed for -stable, but didn't end up being
included in -stable.

This includes commits that the author/maintainers NACKed, commits that
didn't do anything on older kernels, commits that were buggy but were
caught before the kernel was released, commits that failed to build on
an arch I didn't test it on originally and so on.

After thousands of merged AUTOSEL patches I can count the number of
times a commit has caused a regression and had to be removed on one
hand.

>this is not just a math game, this also roughly matches a real experience
>with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
>regression chance? And I also note that for a regression to get reported so
>that it gets included into your 2% estimate of a patch regression rate,
>someone must be bothered enough by it to triage it and send an email
>somewhere so that already falls into a category of "serious" stuff to me.

It is indeed a numbers game, but the regression rate isn't 2%, it's
closer to 0.05%.

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 13:31                             ` Sasha Levin
@ 2018-04-17 15:55                               ` Jan Kara
  2018-04-17 16:19                                 ` Sasha Levin
  0 siblings, 1 reply; 113+ messages in thread
From: Jan Kara @ 2018-04-17 15:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jan Kara, Pavel Machek, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
> >On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >> >On Mon, 16 Apr 2018 16:19:14 +0000
> >> >> >Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> >> >> >
> >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >> >broken as mainline?
> >> >> >>
> >> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >> >>
> >> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> >> stable.
> >> >> >
> >> >> >Honestly, I think that removes all value of the stable series. I
> >> >> >remember when the stable series were first created. People were saying
> >> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >> >kernel at a given version. No more features, but we will be OK with
> >> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >> >be?
> >> >>
> >> >> It's not "OK with regressions".
> >> >>
> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> >> a broken printf() behaviour so that when you:
> >> >>
> >> >> 	pr_err("%d", 5)
> >> >>
> >> >> Would print:
> >> >>
> >> >> 	"Microsoft Rulez"
> >> >>
> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> >> might expect. But alas, with your patch, running:
> >> >>
> >> >> 	pr_err("%s", "hi!")
> >> >>
> >> >> Would show a cat picture for 5 seconds.
> >> >>
> >> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> >> with the original issue while the mainline kernel will behave
> >> >> differently, but if we do - we introduce a new regression.
> >> >
> >> >Of course not.
> >> >
> >> >- It must be obviously correct and tested.
> >> >
> >> >If it introduces new bug, it is not correct, and certainly not
> >> >obviously correct.
> >>
> >> As you might have noticed, we don't strictly follow the rules.
> >>
> >> Take a look at the whole PTI story as an example. It's way more than 100
> >> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> >> on, and yet it went in -stable!
> >>
> >> Would you argue we shouldn't have backported PTI to -stable?
> >
> >So I agree with that being backported. But I think this nicely demostrates
> >a point some people are trying to make in this thread. We do take fixes
> >with high risk or regression if they fix serious enough issue. Also we do
> >take fixes to non-serious stuff (such as addition of device ID) if the
> >chances of regression are really low.
> >
> >So IMHO the metric for including the fix is not solely "how annoying to
> >user this can be" but rather something like:
> >
> >score = (how annoying the bug is) * ((1 / (chance of regression due to
> >	including this)) - 1)^3
> >
> >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
> >and 'regression chance' parts are subjective and sometimes difficult to
> >estimate so don't take the formula too seriously but it demonstrates the
> >point. I think we all agree we want to fix annoying stuff and we don't want
> >regressions. But you need to somehow weight this over your expected
> >userbase - and this is where your argument "but someone might be annoyed by
> >LEDs not working so let's include it" has problems - it should rather be
> >"is the annoyance of non-working leds over expected user base high enough
> >to risk a regression due to this patch for someone in the expected user
> >base"? The answer to this second question is not clear at all to a casual
> >reviewer and that's why we IMHO have CC stable tag as maintainer is
> >supposed to have at least a bit better clue.
> 
> We may be able to guesstimate the 'regression chance', but there's no
> way we can guess the 'annoyance' once. There are so many different use
> cases that we just can't even guess how many people would get "annoyed"
> by something.

As a maintainer, I hope I have a reasonable idea of what the common use
cases for my subsystem are. Those I cater to when estimating 'annoyance'.
Sure, I don't know all of the use cases, so people doing unusual stuff hit
more bugs and have to report them to get fixes included in -stable. But for
me this is a preferable tradeoff over the risk of regression, so this is the
rule I use when tagging for stable. Now I'm not a -stable maintainer and I
fully agree with the "those who do the work decide" principle, so pick
whatever patches you think are appropriate; I just wanted to explain why I
don't think more patches in stable are necessarily good.

> Even regression chance is tricky, look at the commits I've linked
> earlier in the thread. Even the most trivial looking commits that end up
> in stable have a chance for regression.

Sure, you can never be certain and I think people (including me)
underestimate the chance of regressions for "trivial" patches. But you just
estimate a chance, you may be lucky, you may not...

> >Another point I wanted to make is that if chance a patch causes a
> >regression is about 2% as you said somewhere else in a thread, then by
> >adding 20 patches that "may fix a bug that is annoying for someone" you've
> >just increased a chance there's a regression in the release by 34%. And
> 
> So I've said that the rejection rate is less than 2%. This includes
> all commits that I have proposed for -stable, but didn't end up being
> included in -stable.
> 
> This includes commits that the author/maintainers NACKed, commits that
> didn't do anything on older kernels, commits that were buggy but were
> caught before the kernel was released, commits that failed to build on
> an arch I didn't test it on originally and so on.
> 
> After thousands of merged AUTOSEL patches I can count the number of
> times a commit has caused a regression and had to be removed on one
> hand.
> 
> >this is not just a math game, this also roughly matches a real experience
> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
> >regression chance? And I also note that for a regression to get reported so
> >that it gets included into your 2% estimate of a patch regression rate,
> >someone must be bothered enough by it to triage it and send an email
> >somewhere so that already falls into a category of "serious" stuff to me.
> 
> It is indeed a numbers game, but the regression rate isn't 2%, it's
> closer to 0.05%.

Honestly, I think 0.05% is too optimistic :) Quick grepping of the 4.14
stable tree suggests some 13 commits were reverted from stable due to bugs.
That's some 0.4% and that doesn't count fixes that were applied to
fix other regressions.

But the actual numbers don't really matter that much; in principle the more
patches you add, the higher the chance of regression. You can't change
that, so you'd better have a good reason to include a patch...
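
To illustrate how the chance of a regression grows with the number of
patches, here is a rough sketch, again assuming independent patches with a
uniform per-patch regression rate (a simplification). It estimates how many
patches can be added before the chance of at least one regression passes a
chosen threshold. The function name and the 50% threshold are illustrative
only; the three rates are the ones mentioned in this thread.

```python
import math


def patches_until_threshold(per_patch_chance: float,
                            threshold: float = 0.5) -> int:
    """Smallest n such that 1 - (1 - p) ** n >= threshold.

    Solving for n gives n >= log(1 - threshold) / log(1 - p).
    """
    if not 0.0 < per_patch_chance < 1.0:
        raise ValueError("per_patch_chance must be in (0, 1)")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    return math.ceil(math.log(1.0 - threshold)
                     / math.log(1.0 - per_patch_chance))


if __name__ == "__main__":
    # Per-patch regression rates discussed in the thread.
    for rate in (0.0005, 0.004, 0.02):
        n = patches_until_threshold(rate)
        print(f"{rate:.2%} per patch: even odds of a regression "
              f"after about {n} patches")
```

Whatever the real rate turns out to be, the count is finite, which matches
the argument above: each extra patch moves the release closer to the
threshold.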

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 15:55                               ` Jan Kara
@ 2018-04-17 16:19                                 ` Sasha Levin
  2018-04-17 17:57                                   ` Jan Kara
  2018-05-03  9:36                                   ` Pavel Machek
  0 siblings, 2 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 16:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> We may be able to guesstimate the 'regression chance', but there's no
>> way we can guess the 'annoyance' once. There are so many different use
>> cases that we just can't even guess how many people would get "annoyed"
>> by something.
>
>As a maintainer, I hope I have reasonable idea what are common use cases
>for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>know all of the use cases so people doing unusual stuff hit more bugs and
>have to report them to get fixes included in -stable. But for me this is a
>preferable tradeoff over the risk of regression so this is the rule I use
>when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>with "those who do the work decide" principle so pick whatever patches you
>think are appropriate, I just wanted explain why I don't think more patches
>in stable are necessarily good.

The AUTOSEL story is different for subsystems that don't do -stable, and
subsystems that are actually doing the work (like yourself).

I'm not trying to override active maintainers, I'm trying to help them
make decisions.

The AUTOSEL bot will attempt to apply any patch it deems -stable material
to all -stable branches, find possible dependencies, build them, and
run any tests that you might deem necessary.

You would be able to start your analysis without "wasting" time on doing
a bunch of "manual labor".

There's a big difference between subsystems like yours and most of the
rest of the kernel.

>> Even regression chance is tricky, look at the commits I've linked
>> earlier in the thread. Even the most trivial looking commits that end up
>> in stable have a chance for regression.
>
>Sure, you can never be certain and I think people (including me)
>underestimate the chance of regressions for "trivial" patches. But you just
>estimate a chance, you may be lucky, you may not...
>
>> >Another point I wanted to make is that if chance a patch causes a
>> >regression is about 2% as you said somewhere else in a thread, then by
>> >adding 20 patches that "may fix a bug that is annoying for someone" you've
>> >just increased a chance there's a regression in the release by 34%. And
>>
>> So I've said that the rejection rate is less than 2%. This includes
>> all commits that I have proposed for -stable, but didn't end up being
>> included in -stable.
>>
>> This includes commits that the author/maintainers NACKed, commits that
>> didn't do anything on older kernels, commits that were buggy but were
>> caught before the kernel was released, commits that failed to build on
>> an arch I didn't test it on originally and so on.
>>
>> After thousands of merged AUTOSEL patches I can count the number of
>> times a commit has caused a regression and had to be removed on one
>> hand.
>>
>> >this is not just a math game, this also roughly matches a real experience
>> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such
>> >regression chance? And I also note that for a regression to get reported so
>> >that it gets included into your 2% estimate of a patch regression rate,
>> >someone must be bothered enough by it to triage it and send an email
>> >somewhere so that already falls into a category of "serious" stuff to me.
>>
>> It is indeed a numbers game, but the regression rate isn't 2%, it's
>> closer to 0.05%.
>
>Honestly, I think 0.05% is too optimistic :) Quick grepping of 4.14
>stable tree suggests some 13 commits were reverted from stable due to bugs.
>That's some 0.4% and that doesn't count fixes that were applied to
>fix other regressions.

0.05% is for commits that were merged in stable but later fixed or
reverted because they introduced a regression. By grepping for reverts
you also include things such as:

 - Reverts of commits that were in the corresponding mainline tree
 - Reverts of commits that didn't introduce regressions

>But the actual numbers don't really matter that much, in principle the more
>patches you add the higher is the chance of regression. You can't change
>that so you better have a good reason to include a patch...

You increase the chance of regressions, but you also increase the chance
of fixing bugs that affect users.

My claim is that the chance to fix bugs increases far more significantly
than the chance to introduce regressions.
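
For reference, the arithmetic behind the percentages in this exchange can be sketched as follows. This is an illustrative calculation, not something posted in the thread: it assumes every backported patch regresses independently and with the same probability, and the function name chance_of_regression is made up for the example. The rates are the 2%, 0.4% and 0.05% figures quoted above, applied to a batch of 20 patches.

```python
def chance_of_regression(per_patch_rate: float, patches: int) -> float:
    """Probability that at least one of `patches` patches regresses.

    Assumes (for illustration only) that patches fail independently
    and all share the same per-patch regression rate.
    """
    return 1.0 - (1.0 - per_patch_rate) ** patches


if __name__ == "__main__":
    # Per-patch rates quoted in the thread: 2%, 0.4%, and 0.05%.
    for rate in (0.02, 0.004, 0.0005):
        risk = chance_of_regression(rate, 20)
        print(f"rate {rate:.2%}: 20 patches -> {risk:.1%} chance of a regression")
```

Under these assumptions a 2% per-patch rate gives roughly a 33% chance of at least one regression in 20 patches, close to the 34% quoted above, while 0.4% gives about 7.7% and 0.05% about 1%. The disagreement over the per-patch rate therefore changes the batch-level risk by more than an order of magnitude.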

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 16:19                                 ` Sasha Levin
@ 2018-04-17 17:57                                   ` Jan Kara
  2018-04-17 18:28                                     ` Sasha Levin
  2018-05-03  9:36                                   ` Pavel Machek
  1 sibling, 1 reply; 113+ messages in thread
From: Jan Kara @ 2018-04-17 17:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jan Kara, Pavel Machek, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue 17-04-18 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >> Even regression chance is tricky, look at the commits I've linked
> >> earlier in the thread. Even the most trivial looking commits that end up
> >> in stable have a chance for regression.
> >
> >Sure, you can never be certain and I think people (including me)
> >underestimate the chance of regressions for "trivial" patches. But you just
> >estimate a chance, you may be lucky, you may not...
> >
> >> >Another point I wanted to make is that if chance a patch causes a
> >> >regression is about 2% as you said somewhere else in a thread, then by
> >> >adding 20 patches that "may fix a bug that is annoying for someone" you've
> >> >just increased a chance there's a regression in the release by 34%. And
> >>
> >> So I've said that the rejection rate is less than 2%. This includes
> >> all commits that I have proposed for -stable, but didn't end up being
> >> included in -stable.
> >>
> >> This includes commits that the author/maintainers NACKed, commits that
> >> didn't do anything on older kernels, commits that were buggy but were
> >> caught before the kernel was released, commits that failed to build on
> >> an arch I didn't test it on originally and so on.
> >>
> >> After thousands of merged AUTOSEL patches I can count the number of
> >> times a commit has caused a regression and had to be removed on one
> >> hand.
> >>
> >> >this is not just a math game, this also roughly matches a real experience
> >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such
> >> >regression chance? And I also note that for a regression to get reported so
> >> >that it gets included into your 2% estimate of a patch regression rate,
> >> >someone must be bothered enough by it to triage it and send an email
> >> >somewhere so that already falls into a category of "serious" stuff to me.
> >>
> >> It is indeed a numbers game, but the regression rate isn't 2%, it's
> >> closer to 0.05%.
> >
> >Honestly, I think 0.05% is too optimistic :) Quick grepping of 4.14
> >stable tree suggests some 13 commits were reverted from stable due to bugs.
> >That's some 0.4% and that doesn't count fixes that were applied to
> >fix other regressions.
> 
> 0.05% is for commits that were merged in stable but later fixed or
> reverted because they introduced a regression. By grepping for reverts
> you also include things such as:
> 
>  - Reverts of commits that were in the corresponding mainline tree
>  - Reverts of commits that didn't introduce regressions

Actually I was careful enough to include only commits that got merged as
part of the stable process into 4.14.x but got later reverted in 4.14.y.
That's where the 0.4% number came from. So I believe all of those cases
(13 in absolute numbers) were user visible regressions during the stable
process.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 17:57                                   ` Jan Kara
@ 2018-04-17 18:28                                     ` Sasha Levin
  0 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-17 18:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Pavel Machek, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote:
>Actually I was careful enough to include only commits that got merged as
>part of the stable process into 4.14.x but got later reverted in 4.14.y.
>That's where the 0.4% number came from. So I believe all of those cases
>(13 in absolute numbers) were user visible regressions during the stable
>process.

I looked at them, and there are 2 things in play here:

 - Quite a few of those reverts are because of the PTI work. I'm not
   sure how we treat it, but yes - it skews statistics here.
 - 2 of them were reverts for device tree changes for a device that
   didn't exist in 4.14, and shouldn't have had any user visible
   changes.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-17 16:19                                 ` Sasha Levin
  2018-04-17 17:57                                   ` Jan Kara
@ 2018-05-03  9:36                                   ` Pavel Machek
  2018-05-03 13:28                                     ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-05-03  9:36 UTC (permalink / raw)
  To: Sasha Levin, jacek.anaszewski, Rafael J. Wysocki
  Cc: Jan Kara, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> >> We may be able to guesstimate the 'regression chance', but there's no
> >> way we can guess the 'annoyance' one. There are so many different use
> >> cases that we just can't even guess how many people would get "annoyed"
> >> by something.
> >
> >As a maintainer, I hope I have reasonable idea what are common use cases
> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
> >know all of the use cases so people doing unusual stuff hit more bugs and
> >have to report them to get fixes included in -stable. But for me this is a
> >preferable tradeoff over the risk of regression so this is the rule I use
> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
> >with "those who do the work decide" principle so pick whatever patches you
> >think are appropriate, I just wanted explain why I don't think more patches
> >in stable are necessarily good.
> 
> The AUTOSEL story is different for subsystems that don't do -stable, and
> subsystems that are actually doing the work (like yourself).
> 
> I'm not trying to override active maintainers, I'm trying to help them
> make decisions.

Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
stuff from autosel work?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-05-03  9:36                                   ` Pavel Machek
@ 2018-05-03 13:28                                     ` Sasha Levin
  0 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-05-03 13:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: jacek.anaszewski@gmail.com, Rafael J. Wysocki, Jan Kara,
	Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo

On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> >> We may be able to guesstimate the 'regression chance', but there's no
>> >> way we can guess the 'annoyance' one. There are so many different use
>> >> cases that we just can't even guess how many people would get "annoyed"
>> >> by something.
>> >
>> >As a maintainer, I hope I have reasonable idea what are common use cases
>> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>> >know all of the use cases so people doing unusual stuff hit more bugs and
>> >have to report them to get fixes included in -stable. But for me this is a
>> >preferable tradeoff over the risk of regression so this is the rule I use
>> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>> >with "those who do the work decide" principle so pick whatever patches you
>> >think are appropriate, I just wanted explain why I don't think more patches
>> >in stable are necessarily good.
>>
>> The AUTOSEL story is different for subsystems that don't do -stable, and
>> subsystems that are actually doing the work (like yourself).
>>
>> I'm not trying to override active maintainers, I'm trying to help them
>> make decisions.
>
>Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
>stuff from autosel work?

Curiosity got me, and I had to see what these subsystems do as far as
stable commits:

$ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c arch/x86/kernel/acpi/ | wc -l
7

Which got me a bit surprised: maybe indeed leds is mostly fine, but
hibernation is definitely tricky, I've been stung by it a few times...

So why not pick something an actual user reported, and see how that was
dealt with?

Googling first showed this:

	https://bugzilla.kernel.org/show_bug.cgi?id=97201

Which was fixed by:

	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1

But that's not in any -stable tree. Hmm.. ok..

Next one on google was:

	https://bugzilla.kernel.org/show_bug.cgi?id=117971

Which, in turn, was fixed by:

	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3

Oh look at that, it's not in -stable either...

So seeing how you have concerns with my selection of -stable commits,
maybe you could explain to me why these commits didn't end up in
-stable?

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 17:23                         ` Sasha Levin
  2018-04-17 11:41                           ` Jan Kara
@ 2018-05-03  9:32                           ` Pavel Machek
  2018-05-03 13:30                             ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Pavel Machek @ 2018-05-03  9:32 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]

Hi!

> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.

Yes, I noticed. And what I'm saying is that perhaps you should follow
the rules more strictly.

> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously correct, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

Actually, I was surprised with PTI going to stable. That was clearly
against the rules. Maybe the security bug was ugly enough to warrant
that.

But please don't use it as an argument for applying any random
patches...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-05-03  9:32                           ` Pavel Machek
@ 2018-05-03 13:30                             ` Sasha Levin
  0 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-05-03 13:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo

On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote:
>Hi!
>
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>
>Yes, I noticed. And what I'm saying is that perhaps you should follow
>the rules more strictly.

Again, this was stated many times by Greg and others, the rules are not
there to be strictly followed.

>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously correct, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>Actually, I was surprised with PTI going to stable. That was clearly
>against the rules. Maybe the security bug was ugly enough to warrant
>that.
>
>But please don't use it as an argument for applying any random
>patches...

How about this: if a -stable maintainer has concerns with how I follow
the -stable rules, he's more than welcome to reject my patches. Sounds
like a plan?

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 16:19                 ` Sasha Levin
  2018-04-16 16:30                   ` Steven Rostedt
@ 2018-04-19 11:41                   ` Thomas Backlund
  2018-04-19 13:59                     ` Greg KH
  1 sibling, 1 reply; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 11:41 UTC (permalink / raw)
  To: Sasha Levin, Steven Rostedt
  Cc: Linus Torvalds, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
>> On Mon, 16 Apr 2018 16:02:03 +0000
>> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>>
>>> One of the things Greg is pushing strongly for is "bug compatibility":
>>> we want the kernel to behave the same way between mainline and stable.
>>> If the code is broken, it should be broken in the same way.
>>
>> Wait! What does that mean? What's the purpose of stable if it is as
>> broken as mainline?
> 
> This just means that if there is a fix that went in mainline, and the
> fix is broken somehow, we'd rather take the broken fix than not.
> 
> In this scenario, *something* will be broken, it's just a matter of
> what. We'd rather have the same thing broken between mainline and
> stable.
> 

Yeah, but _intentionally_ breaking existing setups to stay "bug
compatible" _is_ a _regression_ you _really_ _don't_ want in a stable
supported distro. Because end-users don't care about upstream breaking
stuff... it's the distro that takes the heat for that...

Something "already broken" is not a regression...

As a distro maintainer, that means one now has to review _every_ patch
that carries "AUTOSEL", follow all the mail threads that come up about
it, then track if it landed in the -stable queue, and read every response
and possible objection to all patches in the -stable queue a second time
around... then check if it still got included in the final stable point
release and then either revert them in the distro kernel or go track down
all the follow-up fixes needed...

Just to avoid being "bug compatible with master"

--
Thomas

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 11:41                   ` Thomas Backlund
@ 2018-04-19 13:59                     ` Greg KH
  2018-04-19 14:05                       ` Jan Kara
  2018-04-19 15:04                       ` Thomas Backlund
  0 siblings, 2 replies; 113+ messages in thread
From: Greg KH @ 2018-04-19 13:59 UTC (permalink / raw)
  To: Thomas Backlund
  Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > On Mon, 16 Apr 2018 16:02:03 +0000
> > > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > > 
> > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > we want the kernel to behave the same way between mainline and stable.
> > > > If the code is broken, it should be broken in the same way.
> > > 
> > > Wait! What does that mean? What's the purpose of stable if it is as
> > > broken as mainline?
> > 
> > This just means that if there is a fix that went in mainline, and the
> > fix is broken somehow, we'd rather take the broken fix than not.
> > 
> > In this scenario, *something* will be broken, it's just a matter of
> > what. We'd rather have the same thing broken between mainline and
> > stable.
> > 
> 
> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> _is_ a _regression_ you _really_ _don't_ want in a stable
> supported distro. Because end-users don't care about upstream breaking
> stuff... it's the distro that takes the heat for that...
> 
> Something "already broken" is not a regression...
> 
> As a distro maintainer, that means one now has to review _every_ patch that
> carries "AUTOSEL", follow all the mail threads that come up about it, then
> track if it landed in the -stable queue, and read every response and possible
> objection to all patches in the -stable queue a second time around... then
> check if it still got included in the final stable point release and then either
> revert them in distro kernel or go track down all the follow-up fixes
> needed...
> 
> Just to avoid being "bug compatible with master"

I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 13:59                     ` Greg KH
@ 2018-04-19 14:05                       ` Jan Kara
  2018-04-19 14:22                         ` Greg KH
  2018-04-19 15:04                       ` Thomas Backlund
  1 sibling, 1 reply; 113+ messages in thread
From: Jan Kara @ 2018-04-19 14:05 UTC (permalink / raw)
  To: Greg KH
  Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Thu 19-04-18 15:59:43, Greg KH wrote:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > On Mon, 16 Apr 2018 16:02:03 +0000
> > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > > > 
> > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > If the code is broken, it should be broken in the same way.
> > > > 
> > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > broken as mainline?
> > > 
> > > This just means that if there is a fix that went in mainline, and the
> > > fix is broken somehow, we'd rather take the broken fix than not.
> > > 
> > > In this scenario, *something* will be broken, it's just a matter of
> > > what. We'd rather have the same thing broken between mainline and
> > > stable.
> > > 
> > 
> > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > _is_ a _regression_ you _really_ _don't_ want in a stable
> > supported distro. Because end-users don't care about upstream breaking
> > stuff... it's the distro that takes the heat for that...
> > 
> > Something "already broken" is not a regression...
> > 
> > As a distro maintainer, that means one now has to review _every_ patch that
> > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > track if it landed in the -stable queue, and read every response and possible
> > objection to all patches in the -stable queue a second time around... then
> > check if it still got included in the final stable point release and then either
> > revert them in distro kernel or go track down all the follow-up fixes
> > needed...
> > 
> > Just to avoid being "bug compatible with master"
> 
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)
> 
> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.

Sure, that's understood. So this was just a misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 14:05                       ` Jan Kara
@ 2018-04-19 14:22                         ` Greg KH
  2018-04-19 15:16                           ` Thomas Backlund
  2018-04-19 16:41                           ` Greg KH
  0 siblings, 2 replies; 113+ messages in thread
From: Greg KH @ 2018-04-19 14:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> On Thu 19-04-18 15:59:43, Greg KH wrote:
> > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > On Mon, 16 Apr 2018 16:02:03 +0000
> > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > > > > 
> > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > If the code is broken, it should be broken in the same way.
> > > > > 
> > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > broken as mainline?
> > > > 
> > > > This just means that if there is a fix that went in mainline, and the
> > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > 
> > > > In this scenario, *something* will be broken, it's just a matter of
> > > > what. We'd rather have the same thing broken between mainline and
> > > > stable.
> > > > 
> > > 
> > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > supported distro. Because end-users don't care about upstream breaking
> > > stuff... it's the distro that takes the heat for that...
> > > 
> > > Something "already broken" is not a regression...
> > > 
> > > As a distro maintainer, that means one now has to review _every_ patch that
> > > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > > track if it landed in the -stable queue, and read every response and possible
> > > objection to all patches in the -stable queue a second time around... then
> > > check if it still got included in the final stable point release and then either
> > > revert them in distro kernel or go track down all the follow-up fixes
> > > needed...
> > > 
> > > Just to avoid being "bug compatible with master"
> > 
> > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > has in the past, so you had better also be reviewing all of my normal
> > commits as well :)
> > 
> > Anyway, we are trying not to do this, but it does, and will,
> > occasionally happen.
> 
> Sure, that's understood. So this was just a misunderstanding. Sasha's
> original comment really sounded like "bug compatibility" with current
> master is desirable and that made me go "Are you serious?" as well...

As I said before in this thread, yes, sometimes I do this on purpose.

As a specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complained, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.

I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 14:22                         ` Greg KH
@ 2018-04-19 15:16                           ` Thomas Backlund
  2018-04-19 15:57                             ` Greg KH
  2018-04-19 16:41                           ` Greg KH
  1 sibling, 1 reply; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 15:16 UTC (permalink / raw)
  To: Greg KH, Jan Kara
  Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

Den 19.04.2018 kl. 17:22, skrev Greg KH:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
>> On Thu 19-04-18 15:59:43, Greg KH wrote:
>>> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
>>>> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
>>>>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
>>>>>> On Mon, 16 Apr 2018 16:02:03 +0000
>>>>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>>>>>>
>>>>>>> One of the things Greg is pushing strongly for is "bug compatibility":
>>>>>>> we want the kernel to behave the same way between mainline and stable.
>>>>>>> If the code is broken, it should be broken in the same way.
>>>>>>
>>>>>> Wait! What does that mean? What's the purpose of stable if it is as
>>>>>> broken as mainline?
>>>>>
>>>>> This just means that if there is a fix that went in mainline, and the
>>>>> fix is broken somehow, we'd rather take the broken fix than not.
>>>>>
>>>>> In this scenario, *something* will be broken, it's just a matter of
>>>>> what. We'd rather have the same thing broken between mainline and
>>>>> stable.
>>>>>
>>>>
>>>> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
>>>> _is_ a _regression_ you _really_ _dont_ want in a stable
>>>> supported distro. Because end-users dont care about upstream breaking
>>>> stuff... its the distro that takes the heat for that...
>>>>
>>>> Something "already broken" is not a regression...
>>>>
>>>> As distro maintainer that means one now have to review _every_ patch that
>>>> carries "AUTOSEL", follow all the mail threads that comes up about it, then
>>>> track if it landed in -stable queue, and read every response and possible
>>>> objection to all patches in the -stable queue a second time around... then
>>>> check if it still got included in final stable point release and then either
>>>> revert them in distro kernel or go track down all the follow-up fixes
>>>> needed...
>>>>
>>>> Just to avoid being "bug compatible with master"
>>>
>>> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
>>> has in the past, so you had better also be reviewing all of my normal
>>> commits as well :)
>>>
>>> Anyway, we are trying not to do this, but it does, and will,
>>> occasionally happen.
>>
>> Sure, that's understood. So this was just misunderstanding. Sasha's
>> original comment really sounded like "bug compatibility" with current
>> master is desirable and that made me go "Are you serious?" as well...
> 
> As I said before in this thread, yes, sometimes I do this on purpose.
> 

And I guess this is what gives people the feeling that
"stable is not as stable as it used to be" ...

> As a specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complained, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.

I do understand what you are trying to do...

But with my distro hat on I have to treat things differently (and I don't
think I'm alone in doing it this way...)

Known breakages get reverted even before they hit QA, so end users won't
see the issue at all...

So the only ones to see the issue are those building the latest
upstream without their own patches applied...

> 
> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And we do use the latest stable (with some delay, as I don't want to overload
QA & end users with a new kernel every week :))

We just revert known-broken patches (or add known fixes) before releasing
them to our users.

--
Thomas

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 15:16                           ` Thomas Backlund
@ 2018-04-19 15:57                             ` Greg KH
  2018-04-19 16:25                               ` Thomas Backlund
  0 siblings, 1 reply; 113+ messages in thread
From: Greg KH @ 2018-04-19 15:57 UTC (permalink / raw)
  To: Thomas Backlund
  Cc: Jan Kara, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
> Den 19.04.2018 kl. 17:22, skrev Greg KH:
> > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > > On Mon, 16 Apr 2018 16:02:03 +0000
> > > > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > > > > > > 
> > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > > 
> > > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > > broken as mainline?
> > > > > > 
> > > > > > This just means that if there is a fix that went in mainline, and the
> > > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > > 
> > > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > > what. We'd rather have the same thing broken between mainline and
> > > > > > stable.
> > > > > > 
> > > > > 
> > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > > supported distro. Because end-users dont care about upstream breaking
> > > > > stuff... its the distro that takes the heat for that...
> > > > > 
> > > > > Something "already broken" is not a regression...
> > > > > 
> > > > > As distro maintainer that means one now have to review _every_ patch that
> > > > > carries "AUTOSEL", follow all the mail threads that comes up about it, then
> > > > > track if it landed in -stable queue, and read every response and possible
> > > > > objection to all patches in the -stable queue a second time around... then
> > > > > check if it still got included in final stable point release and then either
> > > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > > needed...
> > > > > 
> > > > > Just to avoid being "bug compatible with master"
> > > > 
> > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > > has in the past, so you had better also be reviewing all of my normal
> > > > commits as well :)
> > > > 
> > > > Anyway, we are trying not to do this, but it does, and will,
> > > > occasionally happen.
> > > 
> > > Sure, that's understood. So this was just misunderstanding. Sasha's
> > > original comment really sounded like "bug compatibility" with current
> > > master is desirable and that made me go "Are you serious?" as well...
> > 
> > As I said before in this thread, yes, sometimes I do this on purpose.
> > 
> 
> And I guess this is the one that gets people the feeling that
> "stable is not as stable as it used to be" ...

It's always been this way, it's just that no one noticed :)

> > As a specific example, see a recent bluetooth patch that caused a
> > regression on some chromebook devices.  The chromeos developers
> > rightfully complained, and I left the commit in there to provide the
> > needed "leverage" on the upstream developers to fix this properly.
> > Otherwise if I had reverted the stable patch, when people move to a
> > newer kernel version, things break, and no one remembers why.
> 
> I do understand what you are trying to do...
> 
> But from my distro hat on I have to treat things differently (and I dont
> think I'm alone doing it this way...)
> 
> Known breakages gets reverted even before it hits QA, so endusers wont see
> the issue at all...
> 
> So the only ones to see the issue are those building with latest upstream
> without own patches applied...
> 
> > 
> > I also wrote a long response as to _why_ I do this, and even though it
> > does happen, why it still is worth taking the stable updates.  Please
> > see the archives for the full details.  I don't want to duplicate this
> > again here.
> 
> And we do use latest stable (with some delay as I dont want to overload QA &
> endusers with a new kernel every week :))

You need to automate your QA :)

> We just revert known broken (or add known fixes) before releasing them to
> our users

That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 15:57                             ` Greg KH
@ 2018-04-19 16:25                               ` Thomas Backlund
  0 siblings, 0 replies; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 16:25 UTC (permalink / raw)
  To: Greg KH, Thomas Backlund
  Cc: Jan Kara, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

Den 19.04.2018 kl. 18:57, skrev Greg KH:
> On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
>> Den 19.04.2018 kl. 17:22, skrev Greg KH:
>>> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
>>>> On Thu 19-04-18 15:59:43, Greg KH wrote:
>>>>> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
>>>>>> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
>>>>>>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
>>>>>>>> On Mon, 16 Apr 2018 16:02:03 +0000
>>>>>>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>>>>>>>>
>>>>>>>>> One of the things Greg is pushing strongly for is "bug compatibility":
>>>>>>>>> we want the kernel to behave the same way between mainline and stable.
>>>>>>>>> If the code is broken, it should be broken in the same way.
>>>>>>>>
>>>>>>>> Wait! What does that mean? What's the purpose of stable if it is as
>>>>>>>> broken as mainline?
>>>>>>>
>>>>>>> This just means that if there is a fix that went in mainline, and the
>>>>>>> fix is broken somehow, we'd rather take the broken fix than not.
>>>>>>>
>>>>>>> In this scenario, *something* will be broken, it's just a matter of
>>>>>>> what. We'd rather have the same thing broken between mainline and
>>>>>>> stable.
>>>>>>>
>>>>>>
>>>>>> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
>>>>>> _is_ a _regression_ you _really_ _dont_ want in a stable
>>>>>> supported distro. Because end-users dont care about upstream breaking
>>>>>> stuff... its the distro that takes the heat for that...
>>>>>>
>>>>>> Something "already broken" is not a regression...
>>>>>>
>>>>>> As distro maintainer that means one now have to review _every_ patch that
>>>>>> carries "AUTOSEL", follow all the mail threads that comes up about it, then
>>>>>> track if it landed in -stable queue, and read every response and possible
>>>>>> objection to all patches in the -stable queue a second time around... then
>>>>>> check if it still got included in final stable point release and then either
>>>>>> revert them in distro kernel or go track down all the follow-up fixes
>>>>>> needed...
>>>>>>
>>>>>> Just to avoid being "bug compatible with master"
>>>>>
>>>>> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
>>>>> has in the past, so you had better also be reviewing all of my normal
>>>>> commits as well :)
>>>>>
>>>>> Anyway, we are trying not to do this, but it does, and will,
>>>>> occasionally happen.
>>>>
>>>> Sure, that's understood. So this was just misunderstanding. Sasha's
>>>> original comment really sounded like "bug compatibility" with current
>>>> master is desirable and that made me go "Are you serious?" as well...
>>>
>>> As I said before in this thread, yes, sometimes I do this on purpose.
>>>
>>
>> And I guess this is the one that gets people the feeling that
>> "stable is not as stable as it used to be" ...
> 
> It's always been this way, it's just that no one noticed :)
>

:)


>>> As a specific example, see a recent bluetooth patch that caused a
>>> regression on some chromebook devices.  The chromeos developers
>>> rightfully complained, and I left the commit in there to provide the
>>> needed "leverage" on the upstream developers to fix this properly.
>>> Otherwise if I had reverted the stable patch, when people move to a
>>> newer kernel version, things break, and no one remembers why.
>>
>> I do understand what you are trying to do...
>>
>> But from my distro hat on I have to treat things differently (and I dont
>> think I'm alone doing it this way...)
>>
>> Known breakages gets reverted even before it hits QA, so endusers wont see
>> the issue at all...
>>
>> So the only ones to see the issue are those building with latest upstream
>> without own patches applied...
>>
>>>
>>> I also wrote a long response as to _why_ I do this, and even though it
>>> does happen, why it still is worth taking the stable updates.  Please
>>> see the archives for the full details.  I don't want to duplicate this
>>> again here.
>>
>> And we do use latest stable (with some delay as I dont want to overload QA &
>> endusers with a new kernel every week :))
> 
> You need to automate your QA :)
> 

Yeah, some can be automated... but that means having a lot of different 
hw to test on... emulators/vms can only test so much...

Users who are part of QA test on a variety of hw with various installs/setups,
which exposes fun things with some hw :)


>> We just revert known broken (or add known fixes) before releasing them to
>> our users
> 
> That's great, and is what you should be doing, nothing wrong there.
> 
> thanks,
> 
> greg k-h
> 

--
Thomas

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 14:22                         ` Greg KH
  2018-04-19 15:16                           ` Thomas Backlund
@ 2018-04-19 16:41                           ` Greg KH
  1 sibling, 0 replies; 113+ messages in thread
From: Greg KH @ 2018-04-19 16:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Thomas Backlund, Sasha Levin, Steven Rostedt, Linus Torvalds,
	Petr Mladek, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Mathieu Desnoyers, Tetsuo Handa,
	Byungchul Park, Tejun Heo, Pavel Machek

On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > On Mon, 16 Apr 2018 16:02:03 +0000
> > > > > > Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> > > > > > 
> > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > 
> > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > broken as mainline?
> > > > > 
> > > > > This just means that if there is a fix that went in mainline, and the
> > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > 
> > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > what. We'd rather have the same thing broken between mainline and
> > > > > stable.
> > > > > 
> > > > 
> > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > supported distro. Because end-users dont care about upstream breaking
> > > > stuff... its the distro that takes the heat for that...
> > > > 
> > > > Something "already broken" is not a regression...
> > > > 
> > > > As distro maintainer that means one now have to review _every_ patch that
> > > > carries "AUTOSEL", follow all the mail threads that comes up about it, then
> > > > track if it landed in -stable queue, and read every response and possible
> > > > objection to all patches in the -stable queue a second time around... then
> > > > check if it still got included in final stable point release and then either
> > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > needed...
> > > > 
> > > > Just to avoid being "bug compatible with master"
> > > 
> > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > has in the past, so you had better also be reviewing all of my normal
> > > commits as well :)
> > > 
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.
> > 
> > Sure, that's understood. So this was just misunderstanding. Sasha's
> > original comment really sounded like "bug compatibility" with current
> > master is desirable and that made me go "Are you serious?" as well...
> 
> As I said before in this thread, yes, sometimes I do this on purpose.
> 
> As a specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complained, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.
> 
> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And to be more specific, let's always take this on a case-by-case basis.
The last time this happened was the bluetooth bug and it was a fix for a
reported problem, but then the fix caused a regression so upstream
reverted it and I reverted it in the stable trees.  No matter what I
chose to do, someone would be upset so I followed what upstream did.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 13:59                     ` Greg KH
  2018-04-19 14:05                       ` Jan Kara
@ 2018-04-19 15:04                       ` Thomas Backlund
  2018-04-19 15:09                         ` Sasha Levin
  1 sibling, 1 reply; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 15:04 UTC (permalink / raw)
  To: Greg KH, Thomas Backlund
  Cc: Sasha Levin, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

Den 19.04.2018 kl. 16:59, skrev Greg KH:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
>> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
>>> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
>>>> On Mon, 16 Apr 2018 16:02:03 +0000
>>>> Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>>>>
>>>>> One of the things Greg is pushing strongly for is "bug compatibility":
>>>>> we want the kernel to behave the same way between mainline and stable.
>>>>> If the code is broken, it should be broken in the same way.
>>>>
>>>> Wait! What does that mean? What's the purpose of stable if it is as
>>>> broken as mainline?
>>>
>>> This just means that if there is a fix that went in mainline, and the
>>> fix is broken somehow, we'd rather take the broken fix than not.
>>>
>>> In this scenario, *something* will be broken, it's just a matter of
>>> what. We'd rather have the same thing broken between mainline and
>>> stable.
>>>
>>
>> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
>> _is_ a _regression_ you _really_ _dont_ want in a stable
>> supported distro. Because end-users dont care about upstream breaking
>> stuff... its the distro that takes the heat for that...
>>
>> Something "already broken" is not a regression...
>>
>> As distro maintainer that means one now have to review _every_ patch that
>> carries "AUTOSEL", follow all the mail threads that comes up about it, then
>> track if it landed in -stable queue, and read every response and possible
>> objection to all patches in the -stable queue a second time around... then
>> check if it still got included in final stable point release and then either
>> revert them in distro kernel or go track down all the follow-up fixes
>> needed...
>>
>> Just to avoid being "bug compatible with master"
> 
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)
> 

Yeah, I do... and same goes there ... if there is a known issue, then 
same procedure... Either revert, or try to track down fixes...


> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.  Look, we just did that for one platform for
> 4.9.94!  And the key to all of this is good testing, which we are now
> doing, and hopefully you are also doing as well.

Yeah, but having to test stuff with known breakages is no fun, so we try 
to avoid that

--
Thomas

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 15:04                       ` Thomas Backlund
@ 2018-04-19 15:09                         ` Sasha Levin
  2018-04-19 16:20                           ` Thomas Backlund
  0 siblings, 1 reply; 113+ messages in thread
From: Sasha Levin @ 2018-04-19 15:09 UTC (permalink / raw)
  To: Thomas Backlund
  Cc: Greg KH, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>Den 19.04.2018 kl. 16:59, skrev Greg KH:
>>Anyway, we are trying not to do this, but it does, and will,
>>occasionally happen.  Look, we just did that for one platform for
>>4.9.94!  And the key to all of this is good testing, which we are now
>>doing, and hopefully you are also doing as well.
>
>Yeah, but having to test stuff with known breakages is no fun, so we 
>try to avoid that

Known breakages are easier to deal with than unknown ones :)

I think that "bug compatibility" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.

We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-19 15:09                         ` Sasha Levin
@ 2018-04-19 16:20                           ` Thomas Backlund
  0 siblings, 0 replies; 113+ messages in thread
From: Thomas Backlund @ 2018-04-19 16:20 UTC (permalink / raw)
  To: Sasha Levin, Thomas Backlund
  Cc: Greg KH, Steven Rostedt, Linus Torvalds, Petr Mladek,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, Byungchul Park, Tejun Heo, Pavel Machek

Den 19.04.2018 kl. 18:09, skrev Sasha Levin:
> On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>> Den 19.04.2018 kl. 16:59, skrev Greg KH:
>>> Anyway, we are trying not to do this, but it does, and will,
>>> occasionally happen.  Look, we just did that for one platform for
>>> 4.9.94!  And the key to all of this is good testing, which we are now
>>> doing, and hopefully you are also doing as well.
>>
>> Yeah, but having to test stuff with known breakages is no fun, so we
>> try to avoid that
> 
> Known breakages are easier to deal with than unknown ones :)

Well, if a system worked before the update, but not after...
Guess which one we want...

> 
> I think that "bug compatibility" is basically a policy on *which*
> regressions you'll see vs *if* you'll see a regression.
> 

No. Intentionally breaking known working code in a stable branch is 
never ok.

As I said before... something that never worked is not a regression,
but breaking a previously working setup is...

That goes for security fixes too... there is not much point in a
security fix if it basically turns into a local DoS when the system
stops working...

People will just boot older code and there you have it...

> We'll never pull in a commit that introduces a bug but doesn't fix
> another one, right? So if you have to deal with a regression anyway,
> might as well deal with a regression that is also seen on mainline, so
> that when you upgrade your stable kernel you'll keep the same set of
> regressions to deal with.
> 

Here I actually like the comment Linus posted about API breakage earlier 
in this thread...

<quote>
If you break user workflows, NOTHING ELSE MATTERS.

Even security is secondary to "people don't use the end result,
because it doesn't work for them any more".
</quote>

_This_ same statement should be acknowledged / enforced in stable trees
too IMHO...

Because this is what will happen...

Simple logic... if it does not work, the end user will boot an earlier
kernel... missing "all the good fixes" (including security ones) just
because one fix is bad.

For example, in this AUTOSEL round there are 161 fixes, of which the end user
never gets the 160 "supposedly good ones" when one is "bad"...

How is that a "good thing"?

And trying to tell those that get hit "this will force upstream to fix 
it faster, so you get a working setup in some days/weeks/months..." is
not going to work...

Heh, this even reminds me that this is just as annoying as when MS
started to "bundle monthly security updates" and you get 95% installed
just to realize that the last 5% does not work (or install at all) and
you have to rollback to something working thus missing the needed
security fixes...

Same flawed logic...

Thankfully we as distro maintainers can avoid some of the breakage for
our end users...

--
Thomas

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
  2018-04-16 15:18         ` Linus Torvalds
  2018-04-16 15:30           ` Pavel Machek
  2018-04-16 15:36           ` Steven Rostedt
@ 2018-04-16 15:39           ` Sasha Levin
  2 siblings, 0 replies; 113+ messages in thread
From: Sasha Levin @ 2018-04-16 15:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Petr Mladek, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, Byungchul Park,
	Tejun Heo, Pavel Machek

On Mon, Apr 16, 2018 at 08:18:09AM -0700, Linus Torvalds wrote:
>On Mon, Apr 16, 2018 at 6:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> I wonder if the "AUTOSEL" patches should at least have an "ack-by" from
>> someone before they are pulled in. Otherwise there may be some subtle
>> issues that can find their way into stable releases.
>
>I don't know about anybody else, but I  get so many of the patch-bot
>patches for stable etc that I will *not* reply to normal cases. Only
>if there's some issue with a patch will I reply.
>
>I probably do get more than most, but still - requiring active
>participation for the steady flow of normal stable patches is almost
>pointless.
>
>Just look at the subject line of this thread. The numbers are so big
>that you almost need exponential notation for them.
>
>           Linus

I would be more than happy to make this an opt-in process on my end, but
given the responses I've been seeing from folks so far I doubt it'll
work for many people. Humans don't scale :)

There are a few statistics that suggest that the current workflow is
"good enough":

	1. The rejection rate (commits fixed or reverted) for
	AUTOSEL commits is similar to (actually smaller than) that of
	commits tagged for -stable.

	2. Human response rate on review requests is higher than the
	rate Greg is getting with his review mails. This is somewhat
	expected, but it shows that people do what Linus does and reply
	just when they see something wrong.

I also think that using mailing lists for these is bringing up the
limitations of mailing lists. It's hard to go through the amount of
patches AUTOSEL is generating this way, but right now we don't have a
better alternative.

^ permalink raw reply	[flat|nested] 113+ messages in thread

end of thread, other threads:[~2018-05-03 13:31 UTC | newest]

Thread overview: 113+ messages
     [not found] <20180409001936.162706-1-alexander.levin@microsoft.com>
2018-04-09  0:19 ` [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes Sasha Levin
2018-04-09  8:22   ` Petr Mladek
2018-04-15 14:42     ` Sasha Levin
2018-04-16 13:30       ` Steven Rostedt
2018-04-16 15:18         ` Linus Torvalds
2018-04-16 15:30           ` Pavel Machek
2018-04-16 15:50             ` Sasha Levin
2018-04-16 16:06               ` Pavel Machek
2018-04-16 16:14                 ` Sasha Levin
2018-04-16 16:22                   ` Steven Rostedt
2018-04-16 16:31                     ` Sasha Levin
2018-04-16 16:47                       ` Steven Rostedt
2018-04-16 16:53                         ` Sasha Levin
2018-04-16 17:00                           ` Pavel Machek
2018-04-17 10:46                             ` Greg KH
2018-04-17 12:24                               ` Petr Mladek
2018-04-17 12:49                                 ` Michal Hocko
2018-04-17 13:39                                   ` Sasha Levin
2018-04-17 14:22                                     ` Michal Hocko
2018-04-17 14:36                                       ` Sasha Levin
2018-04-17 18:10                                         ` Michal Hocko
2018-04-17 13:45                                 ` Sasha Levin
2018-04-18  8:33                                   ` Petr Mladek
2018-04-16 16:28                   ` Pavel Machek
2018-04-16 16:39                     ` Sasha Levin
2018-04-16 16:42                       ` Pavel Machek
2018-04-16 16:45                         ` Sasha Levin
2018-04-16 16:54                           ` Pavel Machek
2018-04-17 10:50                             ` Greg KH
2018-04-16 17:05                   ` Pavel Machek
2018-04-16 17:16                     ` Sasha Levin
2018-04-16 17:44                       ` Steven Rostedt
2018-04-16 18:17                         ` Sasha Levin
2018-04-16 18:35                           ` Steven Rostedt
2018-04-16 20:17                       ` Jiri Kosina
2018-04-16 20:36                         ` Sasha Levin
2018-04-16 20:43                           ` Jiri Kosina
2018-04-16 21:18                             ` Sasha Levin
2018-04-16 21:28                               ` Jiri Kosina
2018-04-17 10:39                                 ` Greg KH
2018-04-17 11:07                                   ` Michal Hocko
2018-04-17 14:04                                     ` Sasha Levin
2018-04-17 14:15                                       ` Steven Rostedt
2018-04-17 14:36                                         ` Greg KH
2018-04-17 14:36                                       ` Michal Hocko
2018-04-17 14:55                                         ` Sasha Levin
2018-04-17 15:52                                           ` Jiri Kosina
2018-04-17 16:06                                             ` Sasha Levin
2018-05-03 10:04                                               ` Pavel Machek
2018-05-03 13:02                                                 ` Sasha Levin
2018-04-17 16:25                                             ` Mike Galbraith
2018-04-17 11:21                                   ` Jiri Kosina
2018-05-03  9:47                               ` Pavel Machek
2018-05-03 13:06                                 ` Sasha Levin
2018-04-16 16:20                 ` Steven Rostedt
2018-04-16 16:28                   ` Sasha Levin
2018-04-16 16:39                     ` Pavel Machek
2018-04-16 16:43                       ` Sasha Levin
2018-04-16 16:53                         ` Steven Rostedt
2018-04-16 16:58                           ` Pavel Machek
2018-04-16 17:09                           ` Sasha Levin
2018-04-16 17:33                             ` Steven Rostedt
2018-04-16 17:42                               ` Sasha Levin
2018-04-16 18:26                                 ` Steven Rostedt
2018-04-16 18:30                                   ` Linus Torvalds
2018-04-16 18:41                                     ` Steven Rostedt
2018-04-16 18:52                                       ` Linus Torvalds
2018-04-16 19:00                                         ` Linus Torvalds
2018-04-16 19:30                                           ` Steven Rostedt
2018-04-16 19:19                                         ` Linus Torvalds
2018-04-16 19:24                                         ` Steven Rostedt
2018-04-16 19:28                                           ` Linus Torvalds
2018-04-16 19:31                                             ` Linus Torvalds
2018-04-16 19:58                                               ` Steven Rostedt
2018-04-16 19:38                                             ` Steven Rostedt
2018-04-16 19:55                                               ` Linus Torvalds
2018-04-16 20:02                                                 ` Steven Rostedt
2018-04-16 20:17                                                   ` Linus Torvalds
2018-04-16 20:33                                                     ` Jiri Kosina
2018-04-16 21:27                                                     ` Steven Rostedt
2018-04-16 18:35                                   ` Sasha Levin
2018-04-16 18:57                                     ` Steven Rostedt
2018-04-16 15:36           ` Steven Rostedt
2018-04-16 16:02             ` Sasha Levin
2018-04-16 16:10               ` Pavel Machek
2018-04-16 16:12               ` Steven Rostedt
2018-04-16 16:19                 ` Sasha Levin
2018-04-16 16:30                   ` Steven Rostedt
2018-04-16 16:37                     ` Sasha Levin
2018-04-16 17:06                       ` Pavel Machek
2018-04-16 17:23                         ` Sasha Levin
2018-04-17 11:41                           ` Jan Kara
2018-04-17 13:31                             ` Sasha Levin
2018-04-17 15:55                               ` Jan Kara
2018-04-17 16:19                                 ` Sasha Levin
2018-04-17 17:57                                   ` Jan Kara
2018-04-17 18:28                                     ` Sasha Levin
2018-05-03  9:36                                   ` Pavel Machek
2018-05-03 13:28                                     ` Sasha Levin
2018-05-03  9:32                           ` Pavel Machek
2018-05-03 13:30                             ` Sasha Levin
2018-04-19 11:41                   ` Thomas Backlund
2018-04-19 13:59                     ` Greg KH
2018-04-19 14:05                       ` Jan Kara
2018-04-19 14:22                         ` Greg KH
2018-04-19 15:16                           ` Thomas Backlund
2018-04-19 15:57                             ` Greg KH
2018-04-19 16:25                               ` Thomas Backlund
2018-04-19 16:41                           ` Greg KH
2018-04-19 15:04                       ` Thomas Backlund
2018-04-19 15:09                         ` Sasha Levin
2018-04-19 16:20                           ` Thomas Backlund
2018-04-16 15:39           ` Sasha Levin
