Re: Userspace regression in LTS and stable kernels

public inbox for stable@vger.kernel.org

* Re: Userspace regression in LTS and stable kernels
       [not found]     ` <20190214122027.c0df36282d65dc9979248117@linux-foundation.org>
@ 2019-02-15  7:00       ` Greg Kroah-Hartman
  2019-02-15  7:13         ` Greg Kroah-Hartman
  2019-02-15  9:10         ` Michal Hocko
  0 siblings, 2 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2019-02-15  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel,
	LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko

On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > <richard.weinberger@gmail.com> wrote:
> > >
> > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > Before the said commit the kernel silently truncated the shebang line
> > > (and corrupted it),
> > > now it tells the user that the line is too long.
> > 
> > It doesn't matter if it "corrupted" things by truncating it. All that
> > matters is "it used to work, now it doesn't"
> > 
> > Yes, maybe it never *should* have worked. And yes, it's sad that
> > people apparently had cases that depended on this odd behavior, but
> > there we are.
> > 
> > I see that Kees has a patch to fix it up.
> > 
> 
> Greg, I think we have a problem here.
> 
> 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> string") wasn't marked for backporting.  And, presumably as a
> consequence, Kees's fix "exec: load_script: allow interpreter argument
> truncation" was not marked for backporting.
> 
> 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.

It came in 5.0-rc1, so it fits the "in a Linus released kernel"
requirement.  If we are to wait until it shows up in a -final, that
would be months too late for almost all of these types of patches that
are picked up.

> I don't know if Oleg considered backporting that patch.  I certainly
> did (I always do), and I decided against doing so.  Yet there it is.

This came in through Sasha's tools, which give people a week or so to
say "hey, this isn't a stable patch!" and it seems everyone ignored that
:(

Where is Kees's fix?  I'll be glad to queue it up, or just revert the
above commit, whichever people think is easiest.

thanks,

greg k-h
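
For readers following along, here is a rough Python model of the three
behaviours discussed above: the old silent truncation, the strict check
from 8099b047ecc431518, and the relaxed check that Kees's fix is
described as adding. This is only a sketch, not the kernel code. The
buffer size of 128 (with one byte reserved for a terminator) and all
function names are assumptions made for illustration.

```python
# Simplified model only; names and the exact rules are assumptions.
ASSUMED_BINPRM_BUF_SIZE = 128  # assumed value for kernels of this era


def first_line(script, buf_size=ASSUMED_BINPRM_BUF_SIZE):
    """Return (text after '#!', whether a newline was seen in the buffer)."""
    window = script[:buf_size - 1]  # last byte assumed reserved for NUL
    newline = window.find(b"\n")
    if newline == -1:
        return window[2:], False
    return window[2:newline], True


def old_behavior(script):
    """Silently use whatever fits, even if the line was cut short."""
    line, _complete = first_line(script)
    return line.strip()


def strict_behavior(script):
    """Reject (None) any shebang line that does not fit in the buffer."""
    line, complete = first_line(script)
    return line.strip() if complete else None


def relaxed_behavior(script):
    """Allow truncated arguments, but not a truncated interpreter path."""
    line, complete = first_line(script)
    line = line.lstrip()
    if complete:
        return line.strip()
    # The path is known to be whole only if whitespace follows it
    # somewhere inside the buffer.
    has_argument = len(line.split(None, 1)) == 2
    ends_in_space = bool(line) and line[-1:] in (b" ", b"\t")
    if has_argument or ends_in_space:
        return line.strip()
    return None


if __name__ == "__main__":
    long_args = b"#!/bin/true " + b"a" * 200 + b"\n"
    for model in (old_behavior, strict_behavior, relaxed_behavior):
        print(model.__name__, model(long_args))
```

Under this model a script with a short interpreter path and very long
arguments runs with truncated arguments in the old and relaxed cases
and is refused in the strict case, which matches the "it used to work,
now it doesn't" complaint.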

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Userspace regression in LTS and stable kernels
  2019-02-15  7:00       ` Userspace regression in LTS and stable kernels Greg Kroah-Hartman
@ 2019-02-15  7:13         ` Greg Kroah-Hartman
  2019-02-15  9:10         ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2019-02-15  7:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, Linus Torvalds, Richard Weinberger, Samuel Dionne-Riel,
	LKML, graham, Oleg Nesterov, Kees Cook, Michal Hocko

On Fri, Feb 15, 2019 at 08:00:22AM +0100, Greg Kroah-Hartman wrote:
> On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > <richard.weinberger@gmail.com> wrote:
> > > >
> > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > Before the said commit the kernel silently truncated the shebang line
> > > > (and corrupted it),
> > > > now it tells the user that the line is too long.
> > > 
> > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > matters is "it used to work, now it doesn't"
> > > 
> > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > people apparently had cases that depended on this odd behavior, but
> > > there we are.
> > > 
> > > I see that Kees has a patch to fix it up.
> > > 
> > 
> > Greg, I think we have a problem here.
> > 
> > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > string") wasn't marked for backporting.  And, presumably as a
> > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > truncation" was not marked for backporting.
> > 
> > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> 
> It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> requirement.  If we are to wait until it shows up in a -final, that
> would be months too late for almost all of these types of patches that
> are picked up.
> 
> > I don't know if Oleg considered backporting that patch.  I certainly
> > did (I always do), and I decided against doing so.  Yet there it is.
> 
> This came in through Sasha's tools, which give people a week or so to
> say "hey, this isn't a stable patch!" and it seems everyone ignored that
> :(
> 
> Where is Kees's fix?  I'll be glad to queue it up, or just revert the
> above commit, which ever people think is easiest.

Ah, I see the fix now, _after_ I just pushed out a bunch of stable
releases.  I'll go queue it up and push it out with just that fix in it
now...

thanks,

greg k-h

* Re: Userspace regression in LTS and stable kernels
  2019-02-15  7:00       ` Userspace regression in LTS and stable kernels Greg Kroah-Hartman
  2019-02-15  7:13         ` Greg Kroah-Hartman
@ 2019-02-15  9:10         ` Michal Hocko
  2019-02-15  9:20           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-02-15  9:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger,
	Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook

On Fri 15-02-19 08:00:22, Greg KH wrote:
> On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > <richard.weinberger@gmail.com> wrote:
> > > >
> > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > Before the said commit the kernel silently truncated the shebang line
> > > > (and corrupted it),
> > > > now it tells the user that the line is too long.
> > > 
> > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > matters is "it used to work, now it doesn't"
> > > 
> > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > people apparently had cases that depended on this odd behavior, but
> > > there we are.
> > > 
> > > I see that Kees has a patch to fix it up.
> > > 
> > 
> > Greg, I think we have a problem here.
> > 
> > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > string") wasn't marked for backporting.  And, presumably as a
> > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > truncation" was not marked for backporting.
> > 
> > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> 
> It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> requirement.  If we are to wait until it shows up in a -final, that
> would be months too late for almost all of these types of patches that
> are picked up.

rc1 is just too early. Waiting a few more rcs or even a final release
for something that people do not see as an issue should be just fine.
Consider this particular patch and tell me why it had to be rushed in
the first place. The original code was broken for _years_ but I do not
remember anybody complaining.

> > I don't know if Oleg considered backporting that patch.  I certainly
> > did (I always do), and I decided against doing so.  Yet there it is.
> 
> This came in through Sasha's tools, which give people a week or so to
> say "hey, this isn't a stable patch!" and it seems everyone ignored that
> :(

I thought we were through this already. Automagic autoselection of
patches in the core kernel (or mmotm tree patches in particular) is too
dangerous. We try hard to consider each and every patch for stable. Even
if something slips through then it is much more preferred to ask for a
stable backport in the respective email thread and wait for a conclusion
before adding it.

-- 
Michal Hocko
SUSE Labs

* Re: Userspace regression in LTS and stable kernels
  2019-02-15  9:10         ` Michal Hocko
@ 2019-02-15  9:20           ` Greg Kroah-Hartman
  2019-02-15  9:42             ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2019-02-15  9:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger,
	Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook

On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
> On Fri 15-02-19 08:00:22, Greg KH wrote:
> > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > 
> > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > > <richard.weinberger@gmail.com> wrote:
> > > > >
> > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > > Before the said commit the kernel silently truncated the shebang line
> > > > > (and corrupted it),
> > > > > now it tells the user that the line is too long.
> > > > 
> > > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > > matters is "it used to work, now it doesn't"
> > > > 
> > > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > > people apparently had cases that depended on this odd behavior, but
> > > > there we are.
> > > > 
> > > > I see that Kees has a patch to fix it up.
> > > > 
> > > 
> > > Greg, I think we have a problem here.
> > > 
> > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > > string") wasn't marked for backporting.  And, presumably as a
> > > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > > truncation" was not marked for backporting.
> > > 
> > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> > 
> > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> > requirement.  If we are to wait until it shows up in a -final, that
> > would be months too late for almost all of these types of patches that
> > are picked up.
> 
> rc1 is just a too early. Waiting few more rcs or even a final release
> for something that people do not see as an issue should be just fine.
> Consider this particular patch and tell me why it had to be rushed in
> the first place. The original code was broken for _years_ but I do not
> remember anybody would be complaining.

This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
came out on Jan 6.  Over a month delay.

> > > I don't know if Oleg considered backporting that patch.  I certainly
> > > did (I always do), and I decided against doing so.  Yet there it is.
> > 
> > This came in through Sasha's tools, which give people a week or so to
> > say "hey, this isn't a stable patch!" and it seems everyone ignored that
> > :(
> 
> I thought we were through this already. Automagic autoselection of
> patches in the core kernel (or mmotm tree patches in particular) is too
> dangerous. We try hard to consider each and every patch for stable. Even
> if something slips through then it is much more preferred to ask for a
> stable backport in the respective email thread and wait for a conclusion
> before adding it.

We have a list of blacklisted files/subsystems for people that do not
want this to happen to their area of the kernel.  The patch seemed to
make sense, and it passed all known tests that we currently have.

Sometimes things will slip through like this, it happens.  And really, a
3 day turn-around-time to resolve this is pretty good, don't you think?

It also seems like we need another test to catch this problem from ever
happening again :)

thanks,

greg k-h
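
A minimal sketch of the kind of regression test suggested above,
written in Python. It is not an existing selftest. It assumes a
BINPRM_BUF_SIZE of 128, assumes /bin/true exists and ignores its
arguments, and assumes the temporary directory allows execution; the
helper names are made up for illustration.

```python
import os
import subprocess
import tempfile

# Assumed buffer size; check the headers of the kernel under test.
ASSUMED_BINPRM_BUF_SIZE = 128


def build_script(interpreter, arg_len):
    """Return a script whose shebang is '#!<interpreter> aaa...'."""
    shebang = "#!" + interpreter + " " + "a" * arg_len
    return (shebang + "\n").encode()


def shebang_length(script):
    """Length in bytes of the first line, excluding the newline."""
    return script.index(b"\n")


def run_script(script):
    """Execute the script directly; return 'ok', 'rejected' or 'failed'."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(script)
        os.chmod(path, 0o755)
        try:
            # No shell is involved, so a kernel refusal (for example
            # ENOEXEC) surfaces as OSError instead of a shell fallback.
            result = subprocess.run([path])
        except OSError:
            return "rejected"
        return "ok" if result.returncode == 0 else "failed"
    finally:
        os.unlink(path)


if __name__ == "__main__":
    script = build_script("/bin/true", 200)
    assert shebang_length(script) > ASSUMED_BINPRM_BUF_SIZE
    # Userspace that relied on the old truncation expects "ok" here.
    print(run_script(script))
```

The interpreter path is kept short on purpose, so only the argument can
be cut off; that is the case people were depending on.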

* Re: Userspace regression in LTS and stable kernels
  2019-02-15  9:20           ` Greg Kroah-Hartman
@ 2019-02-15  9:42             ` Michal Hocko
  2019-02-15 15:19               ` Sasha Levin
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-02-15  9:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andrew Morton, stable, Linus Torvalds, Richard Weinberger,
	Samuel Dionne-Riel, LKML, graham, Oleg Nesterov, Kees Cook

On Fri 15-02-19 10:20:13, Greg KH wrote:
> On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
> > On Fri 15-02-19 08:00:22, Greg KH wrote:
> > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > 
> > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > > > <richard.weinberger@gmail.com> wrote:
> > > > > >
> > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > > > Before the said commit the kernel silently truncated the shebang line
> > > > > > (and corrupted it),
> > > > > > now it tells the user that the line is too long.
> > > > > 
> > > > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > > > matters is "it used to work, now it doesn't"
> > > > > 
> > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > > > people apparently had cases that depended on this odd behavior, but
> > > > > there we are.
> > > > > 
> > > > > I see that Kees has a patch to fix it up.
> > > > > 
> > > > 
> > > > Greg, I think we have a problem here.
> > > > 
> > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > > > string") wasn't marked for backporting.  And, presumably as a
> > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > > > truncation" was not marked for backporting.
> > > > 
> > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> > > 
> > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> > > requirement.  If we are to wait until it shows up in a -final, that
> > > would be months too late for almost all of these types of patches that
> > > are picked up.
> > 
> > rc1 is just a too early. Waiting few more rcs or even a final release
> > for something that people do not see as an issue should be just fine.
> > Consider this particular patch and tell me why it had to be rushed in
> > the first place. The original code was broken for _years_ but I do not
> > remember anybody would be complaining.
> 
> This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
> came out on Jan 6.  Over a month delay.

Obviously not long enough.

> > > > I don't know if Oleg considered backporting that patch.  I certainly
> > > > did (I always do), and I decided against doing so.  Yet there it is.
> > > 
> > > This came in through Sasha's tools, which give people a week or so to
> > > say "hey, this isn't a stable patch!" and it seems everyone ignored that
> > > :(
> > 
> > I thought we were through this already. Automagic autoselection of
> > patches in the core kernel (or mmotm tree patches in particular) is too
> > dangerous. We try hard to consider each and every patch for stable. Even
> > if something slips through then it is much more preferred to ask for a
> > stable backport in the respective email thread and wait for a conclusion
> > before adding it.
> 
> We have a list of blacklisted files/subsystems for people that do not
> want this to happen to their area of the kernel.  The patch seemed to
> make sense, and it passed all known tests that we currently have.

Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But
this is one of the areas where things that make sense might still break
because it is hard to know what userspace depends on.

> Sometimes things will slip through like this, it happens.  And really, a
> 3 day turn-around-time to resolve this is pretty good, don't you think?

Yes, but that doesn't change the fact that this was not marked for
stable and I still think this is not stable material - at least not at
this moment.

> It also seems like we need another test to catch this problem from ever
> happening again :)

Agreed on this.
-- 
Michal Hocko
SUSE Labs

* Re: Userspace regression in LTS and stable kernels
  2019-02-15  9:42             ` Michal Hocko
@ 2019-02-15 15:19               ` Sasha Levin
  2019-02-15 15:52                 ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2019-02-15 15:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds,
	Richard Weinberger, Samuel Dionne-Riel, LKML, graham,
	Oleg Nesterov, Kees Cook

On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote:
>On Fri 15-02-19 10:20:13, Greg KH wrote:
>> On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
>> > On Fri 15-02-19 08:00:22, Greg KH wrote:
>> > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
>> > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> > > >
>> > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
>> > > > > <richard.weinberger@gmail.com> wrote:
>> > > > > >
>> > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
>> > > > > > Before the said commit the kernel silently truncated the shebang line
>> > > > > > (and corrupted it),
>> > > > > > now it tells the user that the line is too long.
>> > > > >
>> > > > > It doesn't matter if it "corrupted" things by truncating it. All that
>> > > > > matters is "it used to work, now it doesn't"
>> > > > >
>> > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
>> > > > > people apparently had cases that depended on this odd behavior, but
>> > > > > there we are.
>> > > > >
>> > > > > I see that Kees has a patch to fix it up.
>> > > > >
>> > > >
>> > > > Greg, I think we have a problem here.
>> > > >
>> > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
>> > > > string") wasn't marked for backporting.  And, presumably as a
>> > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
>> > > > truncation" was not marked for backporting.
>> > > >
>> > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
>> > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
>> > >
>> > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
>> > > requirement.  If we are to wait until it shows up in a -final, that
>> > > would be months too late for almost all of these types of patches that
>> > > are picked up.
>> >
>> > rc1 is just a too early. Waiting few more rcs or even a final release
>> > for something that people do not see as an issue should be just fine.
>> > Consider this particular patch and tell me why it had to be rushed in
>> > the first place. The original code was broken for _years_ but I do not
>> > remember anybody would be complaining.
>>
>> This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
>> came out on Jan 6.  Over a month delay.
>
>Obviously not long enough.

You're assuming that if we hadn't taken this patch into stable, someone
else would somehow have noticed this bug and fixed it.

What test do we have that would catch this? Which testsuite tests for
long shebang lines? Where is the test added together with this patch
that covers this and similar cases?

The fact is that many patches are not tested until they get to stable,
whether we add them the same week they went upstream or months later.
This is a great case for this: I doubt anyone but NixOS does this crazy
thing with shebang lines, so who else would discover the bug?

If this is indeed a case of us jumping the gun and shipping stuff too
early before all tests are complete, please point me to the test that we
missed and I'll make sure that for any future kernel release it gets run
before we ship a stable kernel.

>> > > > I don't know if Oleg considered backporting that patch.  I certainly
>> > > > did (I always do), and I decided against doing so.  Yet there it is.
>> > >
>> > > This came in through Sasha's tools, which give people a week or so to
>> > > say "hey, this isn't a stable patch!" and it seems everyone ignored that
>> > > :(
>> >
>> > I thought we were through this already. Automagic autoselection of
>> > patches in the core kernel (or mmotm tree patches in particular) is too
>> > dangerous. We try hard to consider each and every patch for stable. Even
>> > if something slips through then it is much more preferred to ask for a
>> > stable backport in the respective email thread and wait for a conclusion
>> > before adding it.
>>
>> We have a list of blacklisted files/subsystems for people that do not
>> want this to happen to their area of the kernel.  The patch seemed to
>> make sense, and it passed all known tests that we currently have.
>
>Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But
>this is one of the area where things that make sense might still break
>because it is hard to assume what userspace depends on.

Great, so the solution is to just not take these things into stable at
all? The solution should be to add tests to the patches that go in there
to verify their correctness and that they don't regress in the future.

If you're really concerned about subsystems being brittle the solution
is to improve their testing rather than push stuff in and hope nothing
explodes.

On one hand you Ack it saying it looks great to you and should be
merged, but on the other hand you're saying that you don't really trust
the patch?

Really, if I hadn't picked this patch now what do you think would have
happened? It would just pop up in a few months as we roll our stable
kernel forward.

>> Sometimes things will slip through like this, it happens.  And really, a
>> 3 day turn-around-time to resolve this is pretty good, don't you think?
>
>Yes, but that doesn't make any difference on the fact that this was not
>marked for stable and I still think this is not a stable material - at
>least not at this moment.

Hindsight is 20/20 :)

If people were good at understanding the impact and implications their
patch has on the kernel we would never introduce new bugs!

I'll happily list a bunch more patches where folks didn't think they
were stable material, but which turned out to be important fixes.

--
Thanks,
Sasha

* Re: Userspace regression in LTS and stable kernels
  2019-02-15 15:19               ` Sasha Levin
@ 2019-02-15 15:52                 ` Michal Hocko
  2019-02-15 16:18                   ` Samuel Dionne-Riel
  2019-02-15 18:00                   ` Sasha Levin
  0 siblings, 2 replies; 11+ messages in thread
From: Michal Hocko @ 2019-02-15 15:52 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds,
	Richard Weinberger, Samuel Dionne-Riel, LKML, graham,
	Oleg Nesterov, Kees Cook

On Fri 15-02-19 10:19:12, Sasha Levin wrote:
> On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote:
> > On Fri 15-02-19 10:20:13, Greg KH wrote:
> > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
> > > > On Fri 15-02-19 08:00:22, Greg KH wrote:
> > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > > >
> > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > > > > > <richard.weinberger@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > > > > > Before the said commit the kernel silently truncated the shebang line
> > > > > > > > (and corrupted it),
> > > > > > > > now it tells the user that the line is too long.
> > > > > > >
> > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > > > > > matters is "it used to work, now it doesn't"
> > > > > > >
> > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > > > > > people apparently had cases that depended on this odd behavior, but
> > > > > > > there we are.
> > > > > > >
> > > > > > > I see that Kees has a patch to fix it up.
> > > > > > >
> > > > > >
> > > > > > Greg, I think we have a problem here.
> > > > > >
> > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > > > > > string") wasn't marked for backporting.  And, presumably as a
> > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > > > > > truncation" was not marked for backporting.
> > > > > >
> > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> > > > >
> > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> > > > > requirement.  If we are to wait until it shows up in a -final, that
> > > > > would be months too late for almost all of these types of patches that
> > > > > are picked up.
> > > >
> > > > rc1 is just a too early. Waiting few more rcs or even a final release
> > > > for something that people do not see as an issue should be just fine.
> > > > Consider this particular patch and tell me why it had to be rushed in
> > > > the first place. The original code was broken for _years_ but I do not
> > > > remember anybody would be complaining.
> > > 
> > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
> > > came out on Jan 6.  Over a month delay.
> > 
> > Obviously not long enough.
> 
> You're assuming that if we wouldn't have taken this patch to stable
> somehow someone else would notice this bug and fix it.
> 
> What test do we have that would catch this? Which testsuite tests for
> long shebang lines? Where is the test added together with this patch
> that covers this and similar cases?

The test is the "users out there". Right now we do not have any
specialized test case because we haven't even realized it might break
something. The main difference between breaking on the bleeding edge vs.
stable tree is that people running on bleeding edge are more likely to
expect a breakage while stable users would most likely prefer to not be
guinea pigs and have, well, stable trees.
[...]

> > > We have a list of blacklisted files/subsystems for people that do not
> > > want this to happen to their area of the kernel.  The patch seemed to
> > > make sense, and it passed all known tests that we currently have.
> > 
> > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But
> > this is one of the area where things that make sense might still break
> > because it is hard to assume what userspace depends on.
> 
> Great, so the solution is to just not take these things into stable at
> all?

No, but if the patch author and the maintainer have considered the
stable tree and haven't found convincing arguments to mark for stable
then it is likely that the patch doesn't need urgent backporting.

> The solution should be to add tests to the patches that go in there
> to verify their correctness and that they don't regress in the future.
> 
> If you're really concerned about subsystems being brittle the solution
> is to improve their testing rather push stuff in and hope nothing
> explodes.
> 
> On one hand you Ack it saying it looks great to you and should be
> merged, but on the other hand you're saying that you don't really trust
> the patch?

No. But I didn't consider it stable material. You just do not really
need all the patches in stable, right? I have already said that this
code has been there for ages and fixing it is good to have for the
future, but considering that nobody was really complaining, a backport
just adds risk, and as it turned out that risk was really not zero.

> Really, if I wouldn't pick this patch now what do you think would have
> happened? It would just pop up in a few months as we roll our stable
> kernel forward.

and that would be a different kernel version and people kinda expect
bugs with newer versions. This is not the case with the stable update.

But I guess we are just repeating the same discussion over and over. Our
expectations about what the stable kernel should be differ a lot. I
would like to see fewer but only important fixes while you would like to
take as many fixes as possible.
-- 
Michal Hocko
SUSE Labs

* Re: Userspace regression in LTS and stable kernels
  2019-02-15 15:52                 ` Michal Hocko
@ 2019-02-15 16:18                   ` Samuel Dionne-Riel
  2019-02-15 18:02                     ` Sasha Levin
  2019-02-15 18:00                   ` Sasha Levin
  1 sibling, 1 reply; 11+ messages in thread
From: Samuel Dionne-Riel @ 2019-02-15 16:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sasha Levin, Greg Kroah-Hartman, Andrew Morton, stable,
	Linus Torvalds, Richard Weinberger, LKML, graham, Oleg Nesterov,
	Kees Cook

On 15/02/2019, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 15-02-19 10:19:12, Sasha Levin wrote:
>> On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote:
>> > On Fri 15-02-19 10:20:13, Greg KH wrote:
>> > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
>> > > > On Fri 15-02-19 08:00:22, Greg KH wrote:
>> > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
>> > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> > > > > >
>> > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
>> > > > > > > <richard.weinberger@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
>> > > > > > > > Before the said commit the kernel silently truncated the shebang line
>> > > > > > > > (and corrupted it),
>> > > > > > > > now it tells the user that the line is too long.
>> > > > > > >
>> > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that
>> > > > > > > matters is "it used to work, now it doesn't"
>> > > > > > >
>> > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
>> > > > > > > people apparently had cases that depended on this odd behavior, but
>> > > > > > > there we are.
>> > > > > > >
>> > > > > > > I see that Kees has a patch to fix it up.
>> > > > > > >
>> > > > > >
>> > > > > > Greg, I think we have a problem here.
>> > > > > >
>> > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
>> > > > > > string") wasn't marked for backporting.  And, presumably as a
>> > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
>> > > > > > truncation" was not marked for backporting.
>> > > > > >
>> > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
>> > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
>> > > > >
>> > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
>> > > > > requirement.  If we are to wait until it shows up in a -final, that
>> > > > > would be months too late for almost all of these types of patches that
>> > > > > are picked up.
>> > > >
>> > > > rc1 is just a too early. Waiting few more rcs or even a final release
>> > > > for something that people do not see as an issue should be just fine.
>> > > > Consider this particular patch and tell me why it had to be rushed in
>> > > > the first place. The original code was broken for _years_ but I do not
>> > > > remember anybody would be complaining.
>> > >
>> > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
>> > > came out on Jan 6.  Over a month delay.
>> >
>> > Obviously not long enough.
>>
>> You're assuming that if we wouldn't have taken this patch to stable
>> somehow someone else would notice this bug and fix it.
>>
>> What test do we have that would catch this? Which testsuite tests for
>> long shebang lines? Where is the test added together with this patch
>> that covers this and similar cases?
>
> The test is the "users out there". Right now we do not have any
> specialized test case because we haven't even realized it might break
> something. The main difference between breaking on the bleeding edge vs.
> stable tree is that people running on bleeding edge are more likely to
> expect a breakage while stable users would most likely prefer to not be
> guinea pigs and have, well stable trees.
> [...]
>
>> > > We have a list of blacklisted files/subsystems for people that do not
>> > > want this to happen to their area of the kernel.  The patch seemed to
>> > > make sense, and it passed all known tests that we currently have.
>> >
>> > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But
>> > this is one of the area where things that make sense might still break
>> > because it is hard to assume what userspace depends on.
>>
>> Great, so the solution is to just not take these things into stable at
>> all?
>
> No, but if the patch author and the maintainer have considered the
> stable tree and haven't found convincing arguments to mark for stable
> then it is likely that the patch doesn't need an urgent backporting.
>
>> The solution should be to add tests to the patches that go in there
>> to verify their correctness and that they don't regress in the future.
>>
>> If you're really concerned about subsystems being brittle the solution
>> is to improve their testing rather push stuff in and hope nothing
>> explodes.
>>
>> On one hand you Ack it saying it looks great to you and should be
>> merged, but on the other hand you're saying that you don't really trust
>> the patch?
>
> No. But I didn't consider it a stable material. You just do not really
> need all the patches in the stable, right? I have already said that this
> code is there for ages and fixing it is good to have for future but
> considering that nobody was really complaining then a backporting just
> adds a risk and as it turned out that risk was really not zero.
>

I'm sorry to interject here, but the issue was reported on the
Kernel.org Bugzilla on February 2nd

 - https://bugzilla.kernel.org/show_bug.cgi?id=202497

In the interest of better communication, if the need arises again, how
should bugs in the RC kernels be reported so that (1) they are spotted by
the right maintainers and (2) the offending patches are not backported
once they have been reported as causing breaking changes?

>> Really, if I wouldn't pick this patch now what do you think would have
>> happened? It would just pop up in a few months as we roll our stable
>> kernel forward.
>
> and that would be a different kernel version and people kinda expect
> bugs with newer versions. This is not the case with the stable update.
>
> But I guess we are just repeating the same discussion over and over. Our
> expectations about what the stable kernel should be differs a lot. I
> would like to see fewer but only important fixes while you would like to
> take as many fixes as possible.
> --
> Michal Hocko
> SUSE Labs
>

-- 
— Samuel Dionne-Riel


* Re: Userspace regression in LTS and stable kernels
  2019-02-15 15:52                 ` Michal Hocko
  2019-02-15 16:18                   ` Samuel Dionne-Riel
@ 2019-02-15 18:00                   ` Sasha Levin
  2019-02-18 12:56                     ` Michal Hocko
  1 sibling, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2019-02-15 18:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds,
	Richard Weinberger, Samuel Dionne-Riel, LKML, graham,
	Oleg Nesterov, Kees Cook

On Fri, Feb 15, 2019 at 04:52:00PM +0100, Michal Hocko wrote:
>On Fri 15-02-19 10:19:12, Sasha Levin wrote:
>> On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote:
>> > On Fri 15-02-19 10:20:13, Greg KH wrote:
>> > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
>> > > > On Fri 15-02-19 08:00:22, Greg KH wrote:
>> > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
>> > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> > > > > >
>> > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
>> > > > > > > <richard.weinberger@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
>> > > > > > > > Before the said commit the kernel silently truncated the shebang line
>> > > > > > > > (and corrupted it),
>> > > > > > > > now it tells the user that the line is too long.
>> > > > > > >
>> > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that
>> > > > > > > matters is "it used to work, now it doesn't"
>> > > > > > >
>> > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
>> > > > > > > people apparently had cases that depended on this odd behavior, but
>> > > > > > > there we are.
>> > > > > > >
>> > > > > > > I see that Kees has a patch to fix it up.
>> > > > > > >
>> > > > > >
>> > > > > > Greg, I think we have a problem here.
>> > > > > >
>> > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
>> > > > > > string") wasn't marked for backporting.  And, presumably as a
>> > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
>> > > > > > truncation" was not marked for backporting.
>> > > > > >
>> > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
>> > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
>> > > > >
>> > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
>> > > > > requirement.  If we are to wait until it shows up in a -final, that
>> > > > > would be months too late for almost all of these types of patches that
>> > > > > are picked up.
>> > > >
>> > > > rc1 is just a too early. Waiting few more rcs or even a final release
>> > > > for something that people do not see as an issue should be just fine.
>> > > > Consider this particular patch and tell me why it had to be rushed in
>> > > > the first place. The original code was broken for _years_ but I do not
>> > > > remember anybody would be complaining.
>> > >
>> > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
>> > > came out on Jan 6.  Over a month delay.
>> >
>> > Obviously not long enough.
>>
>> You're assuming that if we wouldn't have taken this patch to stable
>> somehow someone else would notice this bug and fix it.
>>
>> What test do we have that would catch this? Which testsuite tests for
>> long shebang lines? Where is the test added together with this patch
>> that covers this and similar cases?
>
>The test is the "users out there". Right now we do not have any
>specialized test case because we haven't even realized it might break
>something. The main difference between breaking on the bleeding edge vs.
>stable tree is that people running on bleeding edge are more likely to
>expect a breakage while stable users would most likely prefer to not be
>guinea pigs and have, well stable trees.
>[...]

Exactly, and my argument here is that no one really tests Linus's tree.
Sure, folks run -rc kernels and report bugs, but no one actually runs
these kernels at larger scales.

Most "users out there" wouldn't see this patch until it ends up in a
stable kernel.

>> > > We have a list of blacklisted files/subsystems for people that do not
>> > > want this to happen to their area of the kernel.  The patch seemed to
>> > > make sense, and it passed all known tests that we currently have.
>> >
>> > Yes, the patch makes sense (I wouldn't give my acked-by otherwise). But
>> > this is one of the area where things that make sense might still break
>> > because it is hard to assume what userspace depends on.
>>
>> Great, so the solution is to just not take these things into stable at
>> all?
>
>No, but if the patch author and the maintainer have considered the
>stable tree and haven't found convincing arguments to mark for stable
>then it is likely that the patch doesn't need an urgent backporting.

Are you suggesting that waiting longer would somehow make this "safer"?
This goes back to my argument above.

>> The solution should be to add tests to the patches that go in there
>> to verify their correctness and that they don't regress in the future.
>>
>> If you're really concerned about subsystems being brittle the solution
>> is to improve their testing rather push stuff in and hope nothing
>> explodes.
>>
>> On one hand you Ack it saying it looks great to you and should be
>> merged, but on the other hand you're saying that you don't really trust
>> the patch?
>
>No. But I didn't consider it a stable material. You just do not really
>need all the patches in the stable, right? I have already said that this
>code is there for ages and fixing it is good to have for future but
>considering that nobody was really complaining then a backporting just
>adds a risk and as it turned out that risk was really not zero.
>
>> Really, if I wouldn't pick this patch now what do you think would have
>> happened? It would just pop up in a few months as we roll our stable
>> kernel forward.
>
>and that would be a different kernel version and people kinda expect
>bugs with newer versions. This is not the case with the stable update.
>
>But I guess we are just repeating the same discussion over and over. Our
>expectations about what the stable kernel should be differs a lot. I
>would like to see fewer but only important fixes while you would like to
>take as many fixes as possible.

Maybe to clarify here: I don't want to blindly take as many patches as I
can. I want to take patches based on testing results: if something looks
like a fix and it passes all our tests, there shouldn't be a reason not
to take it.

My view is that humans are terrible at writing and understanding code:
if folks fully understood the impact of their patches we would never
have bugs, right? Assuming we both agree here that we make mistakes and
introduce bugs, why do you think that these very same people fully
understand whether a patch should go in stable or not?

The approach of manually deciding if a patch needs to go in stable is
wrong and it doesn't scale. We need to beef up our testing story and
make these decisions based off of that, and not our error-prone brains
that introduced these bugs to begin with.

Look at the outcome of this very issue: people sprang into action and
fixed this bug quickly, but how many tests were added as a result of
this? How do we know it's not going to regress again?
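
As a rough illustration of what such a test could look like (a sketch
only, not something taken from the patch or from an existing testsuite),
the Python snippet below builds a script whose shebang line is
deliberately longer than the kernel's buffer and reports whether exec
still succeeds. The value 128 for BINPRM_BUF_SIZE, the use of /bin/sh,
and the helper names build_script and try_exec are all assumptions made
for the example.

```python
import errno
import os
import subprocess
import tempfile

# Assumption: 128 is the BINPRM_BUF_SIZE of the kernels discussed here;
# check include/uapi/linux/binfmts.h for the kernel under test.
ASSUMED_BINPRM_BUF_SIZE = 128


def build_script(interp=b"/bin/sh", line_len=2 * ASSUMED_BINPRM_BUF_SIZE):
    """Return script bytes whose shebang line is line_len bytes long.

    The interpreter path stays short; only its argument is long, which
    mirrors the case where truncation used to go unnoticed.
    """
    prefix = b"#!" + interp + b" -"
    # Pad with repeated 'e' flags so the argument remains a valid sh
    # option string wherever a kernel might cut it off.
    padding = b"e" * (line_len - len(prefix))
    return prefix + padding + b"\nexit 0\n"


def try_exec(script):
    """Write script to a temp file, exec it, and describe the outcome."""
    fd, path = tempfile.mkstemp(suffix=".sh")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(script)
        os.chmod(path, 0o700)
        try:
            proc = subprocess.run([path], timeout=10)
        except OSError as e:
            # execve() errors (e.g. ENOEXEC) surface here as OSError.
            return "exec failed: " + errno.errorcode.get(e.errno, str(e.errno))
        return "exec ok, exit status %d" % proc.returncode
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(try_exec(build_script()))
```

On a kernel that truncates the interpreter argument the script would be
expected to run; on one that rejects over-long shebang lines the exec
would be expected to fail with ENOEXEC.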

--
Thanks,
Sasha


* Re: Userspace regression in LTS and stable kernels
  2019-02-15 16:18                   ` Samuel Dionne-Riel
@ 2019-02-15 18:02                     ` Sasha Levin
  0 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2019-02-15 18:02 UTC (permalink / raw)
  To: Samuel Dionne-Riel
  Cc: Michal Hocko, Greg Kroah-Hartman, Andrew Morton, stable,
	Linus Torvalds, Richard Weinberger, LKML, graham, Oleg Nesterov,
	Kees Cook

On Fri, Feb 15, 2019 at 11:18:30AM -0500, Samuel Dionne-Riel wrote:
>I'm sorry to interject here, but the issue was reported on the
>Kernel.org Bugzilla on February 2nd
>
> - https://bugzilla.kernel.org/show_bug.cgi?id=202497
>
>In the interest of better communication, if the need arises again, how
>should bugs in the RC kernels be reported so they (1) are spotted by
>the right maintainers and (2) not backported even though they were
>reported as causing breaking changes?

Sadly our bugzilla is rarely used, even though the information in that
particular bug report is perfect.

Maybe pinging LKML and stable@vger.kernel.org would be enough, especially
if you know that it's a stable commit that caused the regression.

--
Thanks,
Sasha


* Re: Userspace regression in LTS and stable kernels
  2019-02-15 18:00                   ` Sasha Levin
@ 2019-02-18 12:56                     ` Michal Hocko
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2019-02-18 12:56 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg Kroah-Hartman, Andrew Morton, stable, Linus Torvalds,
	Richard Weinberger, Samuel Dionne-Riel, LKML, graham,
	Oleg Nesterov, Kees Cook

On Fri 15-02-19 13:00:26, Sasha Levin wrote:
> On Fri, Feb 15, 2019 at 04:52:00PM +0100, Michal Hocko wrote:
> > On Fri 15-02-19 10:19:12, Sasha Levin wrote:
> > > On Fri, Feb 15, 2019 at 10:42:05AM +0100, Michal Hocko wrote:
> > > > On Fri 15-02-19 10:20:13, Greg KH wrote:
> > > > > On Fri, Feb 15, 2019 at 10:10:00AM +0100, Michal Hocko wrote:
> > > > > > On Fri 15-02-19 08:00:22, Greg KH wrote:
> > > > > > > On Thu, Feb 14, 2019 at 12:20:27PM -0800, Andrew Morton wrote:
> > > > > > > > On Thu, 14 Feb 2019 09:56:46 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Feb 13, 2019 at 3:37 PM Richard Weinberger
> > > > > > > > > <richard.weinberger@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Your shebang line exceeds BINPRM_BUF_SIZE.
> > > > > > > > > > Before the said commit the kernel silently truncated the shebang line
> > > > > > > > > > (and corrupted it),
> > > > > > > > > > now it tells the user that the line is too long.
> > > > > > > > >
> > > > > > > > > It doesn't matter if it "corrupted" things by truncating it. All that
> > > > > > > > > matters is "it used to work, now it doesn't"
> > > > > > > > >
> > > > > > > > > Yes, maybe it never *should* have worked. And yes, it's sad that
> > > > > > > > > people apparently had cases that depended on this odd behavior, but
> > > > > > > > > there we are.
> > > > > > > > >
> > > > > > > > > I see that Kees has a patch to fix it up.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Greg, I think we have a problem here.
> > > > > > > >
> > > > > > > > 8099b047ecc431518 ("exec: load_script: don't blindly truncate shebang
> > > > > > > > string") wasn't marked for backporting.  And, presumably as a
> > > > > > > > consequence, Kees's fix "exec: load_script: allow interpreter argument
> > > > > > > > truncation" was not marked for backporting.
> > > > > > > >
> > > > > > > > 8099b047ecc431518 hasn't even appeared in a Linus released kernel, yet
> > > > > > > > it is now present in 4.9.x, 4.14.x, 4.19.x and 4.20.x.
> > > > > > >
> > > > > > > It came in 5.0-rc1, so it fits the "in a Linus released kernel"
> > > > > > > requirement.  If we are to wait until it shows up in a -final, that
> > > > > > > would be months too late for almost all of these types of patches that
> > > > > > > are picked up.
> > > > > >
> > > > > > rc1 is just a too early. Waiting few more rcs or even a final release
> > > > > > for something that people do not see as an issue should be just fine.
> > > > > > Consider this particular patch and tell me why it had to be rushed in
> > > > > > the first place. The original code was broken for _years_ but I do not
> > > > > > remember anybody would be complaining.
> > > > >
> > > > > This patch was in 4.20.10, which was released on Feb 12 while 5.0-rc1
> > > > > came out on Jan 6.  Over a month delay.
> > > >
> > > > Obviously not long enough.
> > > 
> > > You're assuming that if we wouldn't have taken this patch to stable
> > > somehow someone else would notice this bug and fix it.
> > > 
> > > What test do we have that would catch this? Which testsuite tests for
> > > long shebang lines? Where is the test added together with this patch
> > > that covers this and similar cases?
> > 
> > The test is the "users out there". Right now we do not have any
> > specialized test case because we haven't even realized it might break
> > something. The main difference between breaking on the bleeding edge vs.
> > stable tree is that people running on bleeding edge are more likely to
> > expect a breakage while stable users would most likely prefer to not be
> > guinea pigs and have, well stable trees.
> > [...]
> 
> Exactly, and my argument here is that no one really tests Linus's tree.

I would beg to disagree. The testing coverage is smaller, of course,
because most people are running distribution/stable kernels.

> Sure, folks run -rc kernels and report bugs, but no one actually runs
> these kernels at larger scales.

And this just screams that (much) more time has to pass before fixes
which are merely nice-to-have are passed to the stable tree, unless of
course they fix an issue that users of the said stable tree are actually
seeing.

> Most "users out there" wouldn't see this patch until it ends up in a
> stable kernel.

...and this would be on a kernel version upgrade, when some breakage is
expected and tolerated more than on a minor-version stable update.

[...]
> > But I guess we are just repeating the same discussion over and over. Our
> > expectations about what the stable kernel should be differs a lot. I
> > would like to see fewer but only important fixes while you would like to
> > take as many fixes as possible.
> 
> Maybe to clarify here: I don't want to blindly take as much patches as I
> can. I want to take patches based on testing results: if something looks
> like a fix and it passes all our tests, there shouldn't be a reason not
> to take it.

There are many things we do not have any tests for. E.g. I wasn't even
aware that Perl (and others) deal with an excessive shebang input
by re-reading the input. There are always going to be corner cases like
that. The underlying thing is that nobody seems to have complained about
the original issue addressed by Oleg. So why the heck should we push it
to the stable tree and _risk_ a regression?
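
To make that corner case concrete, here is a minimal sketch of what
"re-reading the input" could look like from the interpreter's side. It
is not Perl's actual implementation; the helper names shebang_args and
reread_shebang, the -I paths, and the 127-byte cut-off are assumptions
made purely for illustration.

```python
def shebang_args(first_line):
    """Split a shebang line into (interpreter, list of switches).

    Returns None if the line is not a usable shebang line.
    """
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    return parts[0], parts[1:]


def reread_shebang(script_path):
    """Re-read the first line of the script itself, ignoring whatever
    (possibly truncated) argument the kernel handed over."""
    with open(script_path, "rb") as f:
        first = f.readline()
    return shebang_args(first.decode("utf-8", "replace").rstrip("\n"))


if __name__ == "__main__":
    # Hypothetical long shebang line with 40 include switches.
    line = "#!/usr/bin/perl " + " ".join("-I/lib/%03d" % i for i in range(40))
    _, full = shebang_args(line)
    # Assumed view of a kernel that keeps only the first 127 bytes.
    _, cut = shebang_args(line[:127])
    print(len(full), "switches in the file,", len(cut), "after truncation")
```

An interpreter doing this never notices the truncation, which is why a
kernel that starts rejecting long lines outright breaks scripts that
previously worked.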

> My view is that humans are terrible at writing and understanding code:
> if folks fully understood the impact of their patches we would never
> have bugs, right? Assuming we both agree here that we make mistakes and
> introduce bugs, why do you think that these very same people fully
> understand whether a patch should go in stable or not?

I haven't really seen a script that would be more efficient at this
evaluation. Given the lack of full test coverage, I do not see this
changing anytime soon.

> The approach of manually deciding if a patch needs to go in stable is
> wrong and it doesn't scale. We need to beef up our testing story and
> make these decisions based off of that, and not our error-prone brains
> that introduced these bugs to begin with.
> 
> Look at the outcome of this very issue: people sprung into action and
> fixed this bug quickly, but how many tests were added as a result of
> this? How do we know it's not going to regress again?

Yes, the issue got identified and analyzed quickly. Nobody is
questioning that part. It is the regression in stable that bothers me.
You have exposed users of a tree which is supposed to be stable to a
regression that was totally unnecessary, because nobody had cared about
the parsing behavior for years.
-- 
Michal Hocko
SUSE Labs


Thread overview: 11+ messages
     [not found] <CAN1fySViLgrLAHHfTHMef-Bkh73kUHKP-ava6TbgALeSE4LfFw@mail.gmail.com>
     [not found] ` <CAFLxGvwpuxKF4UWdsULwQWo7kgVLMNHC=77dN+w-A0a6avrAXA@mail.gmail.com>
     [not found]   ` <CAHk-=wjtH5ombebghG-SDNGppJF4a5bZAUqndtZ3tauWjgw5_w@mail.gmail.com>
     [not found]     ` <20190214122027.c0df36282d65dc9979248117@linux-foundation.org>
2019-02-15  7:00       ` Userspace regression in LTS and stable kernels Greg Kroah-Hartman
2019-02-15  7:13         ` Greg Kroah-Hartman
2019-02-15  9:10         ` Michal Hocko
2019-02-15  9:20           ` Greg Kroah-Hartman
2019-02-15  9:42             ` Michal Hocko
2019-02-15 15:19               ` Sasha Levin
2019-02-15 15:52                 ` Michal Hocko
2019-02-15 16:18                   ` Samuel Dionne-Riel
2019-02-15 18:02                     ` Sasha Levin
2019-02-15 18:00                   ` Sasha Levin
2019-02-18 12:56                     ` Michal Hocko
