linux-kernel.vger.kernel.org archive mirror
* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
       [not found] <871td6xcxc.fsf@yhuang-dev.intel.com>
@ 2015-10-08 11:44 ` Hanjun Guo
  2015-10-08 16:36   ` Al Stone
  0 siblings, 1 reply; 10+ messages in thread
From: Hanjun Guo @ 2015-10-08 11:44 UTC (permalink / raw)
  To: kernel test robot, Al Stone; +Cc: lkp, LKML, Rafael J. Wysocki

On 10/08/2015 11:21 AM, kernel test robot wrote:
> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a bad_madt_entry() function to eventually replace the macro")
>
> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)

It seems that the MADT table contains a reserved subtable type (0x7F),
so it is treated as an invalid type in our patch.

> [    0.000000] ACPI: Error parsing LAPIC address override entry

This was called by early_acpi_parse_madt_lapic_addr_ovr() in
arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
time when booting, so it will fail the boot process when finding
the reserved MADT subtable type.

> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI

As the spec said in Table 5-46 (ACPI 6.0):

0x10-0x7F Reserved. OSPM skips structures of the reserved type.

Should we just ignore those reserved types when scanning the MADT
table? In the patch "ACPI: add in a bad_madt_entry() function to
eventually replace the macro", we just treat it as wrong; that's
why the system failed to boot.

Thanks
Hanjun
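
Hanjun's question amounts to walking the MADT entry by entry and skipping reserved types instead of rejecting the whole table. The sketch below models that idea in user space; it is not the kernel's parser. It relies only on the per-entry layout every MADT structure shares (a 1-byte type, then a 1-byte length covering the whole entry). The names `madt_walk()`, `struct madt_walk_stats`, `MADT_ENTRY_HEADER_LEN` and `MADT_FIRST_RESERVED_TYPE` are hypothetical, and also skipping types 0x80 and above is an assumption of the sketch, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MADT_ENTRY_HEADER_LEN    2	/* 1-byte type + 1-byte length */
#define MADT_FIRST_RESERVED_TYPE 0x10	/* ACPI 6.0: 0x10-0x7F are reserved */

struct madt_walk_stats {
	unsigned int known;	/* types 0x00-0x0F, handed to a parser */
	unsigned int skipped;	/* 0x10 and above, skipped with a warning */
};

/*
 * Walk the interrupt controller structures that follow the fixed MADT
 * header. Returns 0 on success or -1 if an entry's length is unusable.
 * An unrecognised type only skips that one entry; the walk goes on.
 */
static int madt_walk(const uint8_t *buf, size_t len,
		     struct madt_walk_stats *st)
{
	size_t off = 0;

	assert((buf != NULL || len == 0) && st != NULL);
	st->known = 0;
	st->skipped = 0;

	while (off < len) {
		if (len - off < MADT_ENTRY_HEADER_LEN)
			return -1;	/* not even room for a header */

		uint8_t type = buf[off];
		uint8_t length = buf[off + 1];

		/* A bad length leaves no safe way to find the next entry. */
		if (length < MADT_ENTRY_HEADER_LEN || length > len - off)
			return -1;

		if (type < MADT_FIRST_RESERVED_TYPE) {
			st->known++;	/* a real parser would dispatch on type here */
		} else {
			/* Reserved (0x10-0x7F): the spec says OSPM skips these.
			 * 0x80 and above are skipped too (sketch assumption). */
			fprintf(stderr, "warning: skipping MADT subtable type 0x%02x (length %u)\n",
				(unsigned int)type, (unsigned int)length);
			st->skipped++;
		}
		off += length;
	}
	return 0;
}
```

The only case still treated as fatal is a zero or oversized length: an unknown type can be stepped over, but a bad length gives no reliable way to locate the next entry, and a zero length would loop forever.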


* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 11:44 ` [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 Hanjun Guo
@ 2015-10-08 16:36   ` Al Stone
  2015-10-08 20:37     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Al Stone @ 2015-10-08 16:36 UTC (permalink / raw)
  To: Hanjun Guo, kernel test robot; +Cc: lkp, LKML, Rafael J. Wysocki

On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> On 10/08/2015 11:21 AM, kernel test robot wrote:
>> FYI, we noticed the below changes on
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
>> bad_madt_entry() function to eventually replace the macro")
>>
>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> 
> It seems that the MADT table contains a reserved subtable type (0x7F),
> so it is treated as an invalid type in our patch.
> 
>> [    0.000000] ACPI: Error parsing LAPIC address override entry
> 
> This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> time when booting, so it will fail the boot process when finding
> the reserved MADT subtable type.
> 
>> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> 
> As the spec said in Table 5-46 (ACPI 6.0):
> 
> 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> 
> Should we just ignore those reserved types when scanning the MADT
> table? In the patch "ACPI: add in a bad_madt_entry() function to
> eventually replace the macro", we just treat it as wrong; that's
> why the system failed to boot.
> 
> Thanks
> Hanjun

Arrgh.  This is why people get frustrated with ACPI.  The spec is
saying that those sub-table types are reserved -- implying they can
and probably will be used for something else in the future -- but
then vendors are shipping firmware that uses those reserved values,
and an OS *expects* them to be used, and there is *no* documentation
of it other than a kernel workaround.

So yet again, technically this MADT subtable *is* wrong, and someone
should slap the vendor for doing this.  But, the practical side of
this is that we now have to work around what is now a known violation
of the spec.

The more ACPI allows this kind of nonsense, the less usable it will
become.

At a minimum, whoever is responsible for this firmware needs to make
sure the spec reflects what they are doing.  In the meantime, the
only option is what Hanjun suggests -- make this a warning and not a
failure.  I'll prepare a patch and attach it to a reply here in a few
minutes...


-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------


* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 20:41       ` Rafael J. Wysocki
@ 2015-10-08 20:32         ` Al Stone
  2015-10-08 22:50           ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Al Stone @ 2015-10-08 20:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
>>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
>>>>> FYI, we noticed the below changes on
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
>>>>> bad_madt_entry() function to eventually replace the macro")
>>>>>
>>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
>>>>
>>>> It seems that the MADT table contains a reserved subtable type (0x7F),
>>>> so it is treated as an invalid type in our patch.
>>>>
>>>>> [    0.000000] ACPI: Error parsing LAPIC address override entry
>>>>
>>>> This was called by early_acpi_parse_madt_lapic_addr_ovr() in
>>>> arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
>>>> time when booting, so it will fail the boot process when finding
>>>> the reserved MADT subtable type.
>>>>
>>>>> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
>>>>
>>>> As the spec said in Table 5-46 (ACPI 6.0):
>>>>
>>>> 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
>>>>
>>>> Should we just ignore those reserved types when scanning the MADT
>>>> table? In the patch "ACPI: add in a bad_madt_entry() function to
>>>> eventually replace the macro", we just treat it as wrong; that's
>>>> why the system failed to boot.
>>>>
>>>> Thanks
>>>> Hanjun
>>>
>>> Arrgh.  This is why people get frustrated with ACPI.  The spec is
>>> saying that those sub-table types are reserved -- implying they can
>>> and probably will be used for something else in the future -- but
>>> then vendors are shipping firmware that uses those reserved values,
>>> and an OS *expects* them to be used, and there is *no* documentation
>>> of it other than a kernel workaround.
>>>
>>> So yet again, technically this MADT subtable *is* wrong, and someone
>>> should slap the vendor for doing this.  But, the practical side of
>>> this is that we now have to work around what is now a known violation
>>> of the spec.
>>>
>>> The more ACPI allows this kind of nonsense, the less usable it will
>>> become.
>>
>> Linux Kernel Developer's First Rule: You shall not break setups that
>> worked previously, even if they worked by accident.
>>
>> IOW, if something booted and your commit made it not boot any more, it counts
>> as a regression and needs to be modified or reverted.
> 
> Moreover, if the firmware in question shipped in a product, we have no choice
> but to work around bugs in it.  Doing otherwise would be refusing to support
> our users and not the vendor of the systems they were unfortunate enough to
> acquire.
> 
> Thanks,
> Rafael
> 

Yup, understood and agreed.  I have no issue at all with the First Rule.

What I have an issue with is all the exceptions to the standards -- and
primarily the unknown ones -- that exist with ACPI (or any other standard,
mind you), independent of any OS.

It's just like driving a car.  I will (and do) grumble at people when they
break the rules.  On the other hand, I'm not going to crash into them even
if they are at fault.  When the ACPI spec gets twisted around, I'm going
to say something about it; just the same, I am not going to break their
system if it already works.

/me goes back to testing his patch for this breakage....

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------


* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 16:36   ` Al Stone
@ 2015-10-08 20:37     ` Rafael J. Wysocki
  2015-10-08 20:41       ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-10-08 20:37 UTC (permalink / raw)
  To: Al Stone; +Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> > On 10/08/2015 11:21 AM, kernel test robot wrote:
> >> FYI, we noticed the below changes on
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> >> bad_madt_entry() function to eventually replace the macro")
> >>
> >> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> > 
> > It seems that the MADT table contains a reserved subtable type (0x7F),
> > so it is treated as an invalid type in our patch.
> > 
> >> [    0.000000] ACPI: Error parsing LAPIC address override entry
> > 
> > This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> > arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> > time when booting, so it will fail the boot process when finding
> > the reserved MADT subtable type.
> > 
> >> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> > 
> > As the spec said in Table 5-46 (ACPI 6.0):
> > 
> > 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> > 
> > Should we just ignore those reserved types when scanning the MADT
> > table? In the patch "ACPI: add in a bad_madt_entry() function to
> > eventually replace the macro", we just treat it as wrong; that's
> > why the system failed to boot.
> > 
> > Thanks
> > Hanjun
> 
> Arrgh.  This is why people get frustrated with ACPI.  The spec is
> saying that those sub-table types are reserved -- implying they can
> and probably will be used for something else in the future -- but
> then vendors are shipping firmware that uses those reserved values,
> and an OS *expects* them to be used, and there is *no* documentation
> of it other than a kernel workaround.
> 
> So yet again, technically this MADT subtable *is* wrong, and someone
> should slap the vendor for doing this.  But, the practical side of
> this is that we now have to work around what is now a known violation
> of the spec.
> 
> The more ACPI allows this kind of nonsense, the less usable it will
> become.

Linux Kernel Developer's First Rule: You shall not break setups that
worked previously, even if they worked by accident.

IOW, if something booted and your commit made it not boot any more, it counts
as a regression and needs to be modified or reverted.

Thanks,
Rafael



* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 20:37     ` Rafael J. Wysocki
@ 2015-10-08 20:41       ` Rafael J. Wysocki
  2015-10-08 20:32         ` Al Stone
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-10-08 20:41 UTC (permalink / raw)
  To: Al Stone; +Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
> > On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> > > On 10/08/2015 11:21 AM, kernel test robot wrote:
> > >> FYI, we noticed the below changes on
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > >> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> > >> bad_madt_entry() function to eventually replace the macro")
> > >>
> > >> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> > > 
> > > It seems that the MADT table contains a reserved subtable type (0x7F),
> > > so it is treated as an invalid type in our patch.
> > > 
> > >> [    0.000000] ACPI: Error parsing LAPIC address override entry
> > > 
> > > This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> > > arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> > > time when booting, so it will fail the boot process when finding
> > > the reserved MADT subtable type.
> > > 
> > >> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> > > 
> > > As the spec said in Table 5-46 (ACPI 6.0):
> > > 
> > > 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> > > 
> > > Should we just ignore those reserved types when scanning the MADT
> > > table? In the patch "ACPI: add in a bad_madt_entry() function to
> > > eventually replace the macro", we just treat it as wrong; that's
> > > why the system failed to boot.
> > > 
> > > Thanks
> > > Hanjun
> > 
> > Arrgh.  This is why people get frustrated with ACPI.  The spec is
> > saying that those sub-table types are reserved -- implying they can
> > and probably will be used for something else in the future -- but
> > then vendors are shipping firmware that uses those reserved values,
> > and an OS *expects* them to be used, and there is *no* documentation
> > of it other than a kernel workaround.
> > 
> > So yet again, technically this MADT subtable *is* wrong, and someone
> > should slap the vendor for doing this.  But, the practical side of
> > this is that we now have to work around what is now a known violation
> > of the spec.
> > 
> > The more ACPI allows this kind of nonsense, the less usable it will
> > become.
> 
> Linux Kernel Developer's First Rule: You shall not break setups that
> worked previously, even if they worked by accident.
> 
> IOW, if something booted and your commit made it not boot any more, it counts
> as a regression and needs to be modified or reverted.

Moreover, if the firmware in question shipped in a product, we have no choice
but to work around bugs in it.  Doing otherwise would be refusing to support
our users and not the vendor of the systems they were unfortunate enough to
acquire.

Thanks,
Rafael



* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 20:32         ` Al Stone
@ 2015-10-08 22:50           ` Rafael J. Wysocki
  2015-10-08 23:05             ` Al Stone
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-10-08 22:50 UTC (permalink / raw)
  To: Al Stone; +Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
> On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
> > On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
> >> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
> >>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> >>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
> >>>>> FYI, we noticed the below changes on
> >>>>>
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> >>>>> bad_madt_entry() function to eventually replace the macro")
> >>>>>
> >>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> >>>>
> >>>> It seems that the MADT table contains a reserved subtable type (0x7F),
> >>>> so it is treated as an invalid type in our patch.
> >>>>
> >>>>> [    0.000000] ACPI: Error parsing LAPIC address override entry
> >>>>
> >>>> This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> >>>> arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> >>>> time when booting, so it will fail the boot process when finding
> >>>> the reserved MADT subtable type.
> >>>>
> >>>>> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> >>>>
> >>>> As the spec said in Table 5-46 (ACPI 6.0):
> >>>>
> >>>> 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> >>>>
> >>>> Should we just ignore those reserved types when scanning the MADT
> >>>> table? In the patch "ACPI: add in a bad_madt_entry() function to
> >>>> eventually replace the macro", we just treat it as wrong; that's
> >>>> why the system failed to boot.
> >>>>
> >>>> Thanks
> >>>> Hanjun
> >>>
> >>> Arrgh.  This is why people get frustrated with ACPI.  The spec is
> >>> saying that those sub-table types are reserved -- implying they can
> >>> and probably will be used for something else in the future -- but
> >>> then vendors are shipping firmware that uses those reserved values,
> >>> and an OS *expects* them to be used, and there is *no* documentation
> >>> of it other than a kernel workaround.
> >>>
> >>> So yet again, technically this MADT subtable *is* wrong, and someone
> >>> should slap the vendor for doing this.  But, the practical side of
> >>> this is that we now have to work around what is now a known violation
> >>> of the spec.
> >>>
> >>> The more ACPI allows this kind of nonsense, the less usable it will
> >>> become.
> >>
> >> Linux Kernel Developer's First Rule: You shall not break setups that
> >> worked previously, even if they worked by accident.
> >>
> >> IOW, if something booted and your commit made it not boot any more, it counts
> >> as a regression and needs to be modified or reverted.
> > 
> > Moreover, if the firmware in question shipped in a product, we have no choice
> > but to work around bugs in it.  Doing otherwise would be refusing to support
> > our users and not the vendor of the systems they were unfortunate enough to
> > acquire.
> > 
> > Thanks,
> > Rafael
> > 
> 
> Yup, understood and agreed.  I have no issue at all with the First Rule.
> 
> What I have an issue with is all the exceptions to the standards -- and
> primarily the unknown ones -- that exist with ACPI (or any other standard,
> mind you), independent of any OS.

Well, one can argue that standards are not what is written in specifications,
but what is done in practice by everybody.  If a specification does not agree
with the common practice, there is a problem with it, not with the practice.

> It's just like driving a car.  I will (and do) grumble at people when they
> break the rules.  On the other hand, I'm not going to crash into them even
> if they are at fault.  When the ACPI spec gets twisted around, I'm going
> to say something about it; just the same, I am not going to break their
> system if it already works.

OK

So IMO there are two things we can do.  First, try to update the spec to
reflect the reality where needed.  Second, having done that, add appropriate
checks to a firmware test suite and make it scream bloody murder every time
they trigger.  It also may be a good idea to print warnings into the kernel
buffer for them.

Thanks,
Rafael
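
Rafael's second suggestion, a firmware test suite check that complains loudly, could start from a per-type classification like the one sketched below. This is an illustration only, not FWTS code or any existing test; `classify_madt_type()` and `report_madt_types()` are hypothetical names. The thread only quotes the 0x10-0x7F reserved range; treating 0x80-0xFF as a separate OEM range is an assumption here and should be checked against the ACPI 6.0 table of MADT structure types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Verdicts for a hypothetical firmware-test-suite check on MADT types. */
enum madt_type_class {
	MADT_TYPE_DEFINED,	/* 0x00-0x0F: structure types defined in ACPI 6.0 */
	MADT_TYPE_RESERVED,	/* 0x10-0x7F: reserved, firmware should not use */
	MADT_TYPE_OEM,		/* 0x80-0xFF: assumed OEM range (see lead-in) */
};

static enum madt_type_class classify_madt_type(unsigned int type)
{
	assert(type <= 0xFF);	/* the type field is a single byte */
	if (type <= 0x0F)
		return MADT_TYPE_DEFINED;
	if (type <= 0x7F)
		return MADT_TYPE_RESERVED;
	return MADT_TYPE_OEM;
}

/* Print a FAIL line for every reserved type and return how many were found. */
static int report_madt_types(const unsigned char *types, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		if (classify_madt_type(types[i]) == MADT_TYPE_RESERVED) {
			printf("FAIL: MADT entry %zu uses reserved subtable type 0x%02x\n",
			       i, (unsigned int)types[i]);
			failures++;
		}
	}
	return failures;
}
```

Unlike the kernel, which has to keep booting, a test suite can afford to report every reserved type as a hard failure, which is the split between the two places Rafael describes.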



* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 22:50           ` Rafael J. Wysocki
@ 2015-10-08 23:05             ` Al Stone
  2015-10-09 21:02               ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Al Stone @ 2015-10-08 23:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
>> On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
>>> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
>>>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
>>>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
>>>>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
>>>>>>> FYI, we noticed the below changes on
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>>>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
>>>>>>> bad_madt_entry() function to eventually replace the macro")
>>>>>>>
>>>>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
>>>>>>
>>>>>> It seems that the MADT table contains a reserved subtable type (0x7F),
>>>>>> so it is treated as an invalid type in our patch.
>>>>>>
>>>>>>> [    0.000000] ACPI: Error parsing LAPIC address override entry
>>>>>>
>>>>>> This was called by early_acpi_parse_madt_lapic_addr_ovr() in
>>>>>> arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
>>>>>> time when booting, so it will fail the boot process when finding
>>>>>> the reserved MADT subtable type.
>>>>>>
>>>>>>> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
>>>>>>
>>>>>> As the spec said in Table 5-46 (ACPI 6.0):
>>>>>>
>>>>>> 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
>>>>>>
>>>>>> Should we just ignore those reserved types when scanning the MADT
>>>>>> table? In the patch "ACPI: add in a bad_madt_entry() function to
>>>>>> eventually replace the macro", we just treat it as wrong; that's
>>>>>> why the system failed to boot.
>>>>>>
>>>>>> Thanks
>>>>>> Hanjun
>>>>>
>>>>> Arrgh.  This is why people get frustrated with ACPI.  The spec is
>>>>> saying that those sub-table types are reserved -- implying they can
>>>>> and probably will be used for something else in the future -- but
>>>>> then vendors are shipping firmware that uses those reserved values,
>>>>> and an OS *expects* them to be used, and there is *no* documentation
>>>>> of it other than a kernel workaround.
>>>>>
>>>>> So yet again, technically this MADT subtable *is* wrong, and someone
>>>>> should slap the vendor for doing this.  But, the practical side of
>>>>> this is that we now have to work around what is now a known violation
>>>>> of the spec.
>>>>>
>>>>> The more ACPI allows this kind of nonsense, the less usable it will
>>>>> become.
>>>>
>>>> Linux Kernel Developer's First Rule: You shall not break setups that
>>>> worked previously, even if they worked by accident.
>>>>
>>>> IOW, if something booted and your commit made it not boot any more, it counts
>>>> as a regression and needs to be modified or reverted.
>>>
>>> Moreover, if the firmware in question shipped in a product, we have no choice
>>> but to work around bugs in it.  Doing otherwise would be refusing to support
>>> our users and not the vendor of the systems they were unfortunate enough to
>>> acquire.
>>>
>>> Thanks,
>>> Rafael
>>>
>>
>> Yup, understood and agreed.  I have no issue at all with the First Rule.
>>
>> What I have an issue with is all the exceptions to the standards -- and
>> primarily the unknown ones -- that exist with ACPI (or any other standard,
>> mind you), independent of any OS.
> 
> Well, one can argue that standards are not what is written in specifications,
> but what is done in practice by everybody.  If a specification does not agree
> with the common practice, there is a problem with it, not with the practice.

True.  That is the other side of it.  How rampant is this particular problem,
though?  Does everyone and their uncle use some reserved MADT subtable ID value
in their firmware?  This is the first time I've personally seen this, but then
I haven't been looking for it until now.

>> It's just like driving a car.  I will (and do) grumble at people when they
>> break the rules.  On the other hand, I'm not going to crash into them even
>> if they are at fault.  When the ACPI spec gets twisted around, I'm going
>> to say something about it; just the same, I am not going to break their
>> system if it already works.
> 
> OK
> 
> So IMO there are two things we can do.  First, try to update the spec to
> reflect the reality where needed.  Second, having done that, add appropriate
> checks to a firmware test suite and make it scream bloody murder every time
> they trigger.  It also may be a good idea to print warnings into the kernel
> buffer for them.
> 
> Thanks,
> Rafael
> 

Agreed.  The patch below uses pr_err() for arm64, as the maintainers wish, and
flags with pr_warn() any such uses on other architectures; this should fix the
regression.

In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
in the MADT and see what the consensus is there.  It may just be a matter of
clarifying the language in the spec.

It's also on my plate to really dig into an ACPI test suite and see about
building something really robust for that -- this can be added as an example.
I'll see if I have time to send in a patch for FWTS, too, which is pretty
good about capturing such things.

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------

From 9ae1261f0c2f2ee87d45521579e3ce572692d2aa Mon Sep 17 00:00:00 2001
From: Al Stone <ahs3@redhat.com>
Date: Thu, 8 Oct 2015 16:20:53 -0600
Subject: [PATCH] ACPI: fix regression where x86 firmware contains MADT
 reserved subtable IDs

Kernel testing uncovered a case where x86 firmware contained an MADT
subtable with a type of 0x7F.  This is a reserved value, and arguably
should not really be used; however, rejecting it causes a regression:
the affected system no longer boots.

This patch makes bad_madt_entry() warn about such entries, rather than
reject them, unless we are on an arm64 platform, where we want stricter
conformance to the ACPI spec.

Signed-off-by: Al Stone <al.stone@linaro.org>
---
 drivers/acpi/tables.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index a2ed38a..a74b6fa 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -413,9 +413,15 @@ static int __init bad_madt_entry(struct acpi_table_header *table,
 	}

 	if (entry->type >= ms->num_types) {
-		pr_err("undefined MADT subtable type for FADT %d.%d: %d (length %d)\n",
-		       major, minor, entry->type, entry->length);
-		return 1;
+		if (IS_ENABLED(CONFIG_ARM64)) {
+			pr_err("undefined MADT subtable type for FADT %d.%d: %d (length %d)\n",
+			       major, minor, entry->type, entry->length);
+			return 1;
+		} else {
+			pr_warn("firmware should not be using reserved MADT subtable type for FADT %d.%d: %d (length %d)\n",
+			        major, minor, entry->type, entry->length);
+			return 0;
+		}
 	}

 	/* verify that the table is allowed for this version of the spec */
-- 
2.4.3
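
To make the behavioural change in the patch easy to see, here is a user-space model of the new branch in bad_madt_entry(). It is a sketch, not the patch itself: `check_madt_type()` is a hypothetical name, `strict` stands in for IS_ENABLED(CONFIG_ARM64), `num_types` stands in for ms->num_types, and the later per-version checks in the real function are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the branch the patch adds to bad_madt_entry() for subtable
 * types beyond those known for the table's spec version.
 *   strict    - stands in for IS_ENABLED(CONFIG_ARM64)
 *   num_types - stands in for ms->num_types
 * Returns 1 if the entry is rejected, 0 if parsing may continue.
 */
static int check_madt_type(bool strict, int major, int minor,
			   unsigned int type, unsigned int length,
			   unsigned int num_types)
{
	assert(type <= 0xFF && length <= 0xFF);	/* both are 1-byte fields */

	if (type < num_types)
		return 0;	/* known type; the later checks are not modelled */

	if (strict) {
		fprintf(stderr, "undefined MADT subtable type for FADT %d.%d: %u (length %u)\n",
			major, minor, type, length);
		return 1;	/* the caller gets an error for this MADT */
	}

	fprintf(stderr, "firmware should not be using reserved MADT subtable type for FADT %d.%d: %u (length %u)\n",
		major, minor, type, length);
	return 0;		/* warn, but keep booting */
}
```

For the entry in the robot's log (type 127, length 12), a strict caller gets 1 and an error message, while a non-strict caller gets 0 and only the warning, which is what lets the x86 machine boot again.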


* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-08 23:05             ` Al Stone
@ 2015-10-09 21:02               ` Rafael J. Wysocki
  2015-10-09 22:52                 ` Al Stone
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-10-09 21:02 UTC (permalink / raw)
  To: Al Stone; +Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On Thursday, October 08, 2015 05:05:00 PM Al Stone wrote:
> On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:
> > On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
> >> On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
> >>> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
> >>>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
> >>>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> >>>>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
> >>>>>>> FYI, we noticed the below changes on
> >>>>>>>
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >>>>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> >>>>>>> bad_madt_entry() function to eventually replace the macro")
> >>>>>>>
> >>>>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> >>>>>>
> >>>>>> It seems that the MADT table contains a reserved subtable type (0x7F),
> >>>>>> so it is treated as an invalid type in our patch.
> >>>>>>
> >>>>>>> [    0.000000] ACPI: Error parsing LAPIC address override entry
> >>>>>>
> >>>>>> This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> >>>>>> arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> >>>>>> time when booting, so it will fail the boot process when finding
> >>>>>> the reserved MADT subtable type.
> >>>>>>
> >>>>>>> [    0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> >>>>>>
> >>>>>> As the spec said in Table 5-46 (ACPI 6.0):
> >>>>>>
> >>>>>> 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> >>>>>>
> >>>>>> Should we just ignore those reserved types when scanning the MADT
> >>>>>> table? In the patch "ACPI: add in a bad_madt_entry() function to
> >>>>>> eventually replace the macro", we just treat it as wrong; that's
> >>>>>> why the system failed to boot.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Hanjun
> >>>>>
> >>>>> Arrgh.  This is why people get frustrated with ACPI.  The spec is
> >>>>> saying that those sub-table types are reserved -- implying they can
> >>>>> and probably will be used for something else in the future -- but
> >>>>> then vendors are shipping firmware that uses those reserved values,
> >>>>> and an OS *expects* them to be used, and there is *no* documentation
> >>>>> of it other than a kernel workaround.
> >>>>>
> >>>>> So yet again, technically this MADT subtable *is* wrong, and someone
> >>>>> should slap the vendor for doing this.  But, the practical side of
> >>>>> this is that we now have to work around what is now a known violation
> >>>>> of the spec.
> >>>>>
> >>>>> The more ACPI allows this kind of nonsense, the less usable it will
> >>>>> become.
> >>>>
> >>>> Linux Kernel Developer's First Rule: You shall not break setups that
> >>>> worked previously, even if they worked by accident.
> >>>>
> >>>> IOW, if something booted and your commit made it not boot any more, it counts
> >>>> as a regression and needs to be modified or reverted.
> >>>
> >>> Moreover, if the firmware in question shipped in a product, we have no choice
> >>> but to work around bugs in it.  Doing otherwise would be refusing to support
> >>> our users and not the vendor of the systems they were unfortunate enough to
> >>> acquire.
> >>>
> >>> Thanks,
> >>> Rafael
> >>>
> >>
> >> Yup, understood and agreed.  I have no issue at all with the First Rule.
> >>
> >> What I have an issue with is all the exceptions to the standards -- and
> >> primarily the unknown ones -- that exist with ACPI (or any other standard,
> >> mind you), independent of any OS.
> > 
> > Well, one can argue that standards are not what is written in specifications,
> > but what is done in practice by everybody.  If a specification does not agree
> > with the common practice, there is a problem with it, not with the practice.
> 
> True.  That is the other side of it.  How rampant is this particular problem,
> though?  Does everyone and their uncle use some reserved MADT subtable ID value
> in their firmware?  This is the first time I've personally seen this, but then
> I haven't been looking for it until now.

Well, that depends on whether or not the OS the firmware was tested against
contained checks that would catch the problem and make it complain.  We don't
have them without your patch, but do other OSes have them?  If they don't
either, quite a lot of stuff like that may be expected to have gone to users.

It only takes one reference firmware containing bugs like those to make them
quite widespread.

> >> It's just like driving a car.  I will (and do) grumble at people when they
> >> break the rules.  On the other hand, I'm not going to crash into them even
> >> if they are at fault.  When the ACPI spec gets twisted around, I'm going
> >> to say something about it; just the same, I am not going to break their
> >> system if it already works.
> > 
> > OK
> > 
> > So IMO there are two things we can do.  First, try to update the spec to
> > reflect the reality where needed.  Second, having done that, add appropriate
> > checks to a firmware test suite and make it scream bloody murder every time
> > they trigger.  It also may be a good idea to print warnings into the kernel
> > buffer for them.
> > 
> 
> Agreed.  The patch below uses pr_err() for arm64, as the maintainers wish, and
> flags any such uses on other architectures with pr_warn(); this should fix the
> regression.
> 
> In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
> in the MADT and see what the consensus is there.  It may just be a matter of
> clarifying the language in the spec.
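
A minimal userspace sketch of the behavior described for the regression fix: a
reserved MADT subtable type (0x10-0x7F, per the ACPI 6.0 Table 5-46 text quoted
earlier in the thread) is skipped rather than treated as fatal, with a harsher
message on arm64.  This is not the actual patch.  The function names are
hypothetical, fprintf() stands in for pr_err()/pr_warn(), and the sketch assumes
both architectures skip the entry and differ only in log severity, which the
message above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ACPI 6.0, Table 5-46: types 0x10-0x7F are reserved; OSPM skips them. */
static bool madt_type_is_reserved(uint8_t type)
{
	return type >= 0x10 && type <= 0x7F;
}

/* Hypothetical stand-in for the kernel's pr_err()/pr_warn() split. */
static void log_reserved_entry(uint8_t type, uint8_t length, bool is_arm64)
{
	assert(madt_type_is_reserved(type)); /* only called for reserved types */
	fprintf(stderr,
		"ACPI: %s: reserved MADT subtable type %u (length %u), skipping\n",
		is_arm64 ? "error" : "warning",
		(unsigned)type, (unsigned)length);
}

/*
 * Returns true if the caller should hand the entry to its normal
 * subtable parser, false if the entry is skipped.  Assumption: a
 * reserved type is skipped on every architecture; only the log
 * severity differs.  Types outside 0x10-0x7F are passed through.
 */
static bool madt_entry_should_parse(uint8_t type, uint8_t length, bool is_arm64)
{
	if (!madt_type_is_reserved(type))
		return true;
	log_reserved_entry(type, length, is_arm64);
	return false;
}
```

With this logic, the entry from the robot's log (type 127, length 12) would
produce a warning on x86 and the scan would continue, instead of ending in
"Invalid BIOS MADT, disabling ACPI".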

One additional question to ask is what checks have been present in the OSes
and what they do if they see a reserved MADT subtable ID.  If they haven't been
doing anything so far, I'm afraid this particular train may be gone already.

> It's also on my plate to really dig into an ACPI test suite and see about
> building something really robust for that -- this can be added as an example.
> I'll see if I have time to send in a patch for FWTS, too, which is pretty
> good about capturing such things.

Sounds good!

Thanks,
Rafael
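
A firmware-test-suite check along the lines discussed here could look roughly
like the following sketch.  It walks the subtable area of a MADT (the bytes after
the 44-byte fixed part: the 36-byte common ACPI table header plus the Local
Interrupt Controller Address and Flags fields), relying on each subtable starting
with a 1-byte type and a 1-byte length that covers the whole subtable.  It reports
reserved types as warnings and stops only when a length field makes further
walking unsafe.  The struct and function names are hypothetical and do not
reflect FWTS's actual code or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Result counters for a hypothetical MADT subtable check. */
struct madt_check_result {
	unsigned int entries;  /* subtables walked, reserved ones included */
	unsigned int reserved; /* subtables with a reserved type (0x10-0x7F) */
	int malformed;         /* 1 if a length field was < 2 or overran */
};

/* buf/len cover only the subtable area, not the fixed MADT header. */
static struct madt_check_result check_madt_subtables(const uint8_t *buf,
						      size_t len)
{
	struct madt_check_result r = { 0, 0, 0 };
	size_t off = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		if (len - off < 2) {           /* truncated subtable header */
			r.malformed = 1;
			break;
		}
		uint8_t type = buf[off];
		uint8_t slen = buf[off + 1];   /* length includes the header */
		if (slen < 2 || slen > len - off) {
			r.malformed = 1;       /* cannot find the next entry */
			break;
		}
		if (type >= 0x10 && type <= 0x7F) {
			r.reserved++;
			printf("WARNING: reserved MADT subtable type 0x%02x at offset %zu\n",
			       (unsigned)type, off);
		}
		r.entries++;
		off += slen;
	}
	return r;
}
```

Treating reserved types as warnings but bad lengths as hard errors reflects why
skipping is safe at all: an unknown type still carries a length that says where
the next entry starts, while a bad length leaves no safe way to continue.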



* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-09 21:02               ` Rafael J. Wysocki
@ 2015-10-09 22:52                 ` Al Stone
  2015-10-09 23:44                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Al Stone @ 2015-10-09 22:52 UTC
  To: Rafael J. Wysocki
  Cc: Hanjun Guo, kernel test robot, lkp, LKML, Rafael J. Wysocki

On 10/09/2015 03:02 PM, Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 05:05:00 PM Al Stone wrote:
>> On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:
>>> On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
>>>> On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
>>>>> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
>>>>>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
>>>>>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
>>>>>>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
>>>>>>>>> FYI, we noticed the below changes on
>>>>>>>>>
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>>>>>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
>>>>>>>>> bad_madt_entry() function to eventually replace the macro")
>>>>>>>>>
>>>>>>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
[snip....]

>> In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
>> in the MADT and see what the consensus is there.  It may just be a matter of
>> clarifying the language in the spec.
> 
> One additional question to ask is what checks have been present in the OSes
> and what they do if they see a reserved MADT subtable ID.  If they haven't been
> doing anything so far, I'm afraid this particular train may be gone already.

It may be gone.  The silence so far is deafening :).

>> It's also on my plate to really dig into an ACPI test suite and see about
>> building something really robust for that -- this can be added as an example.
>> I'll see if I have time to send in a patch for FWTS, too, which is pretty
>> good about capturing such things.
> 
> Sounds good!
> 
> Thanks,
> Rafael
> 

Let me know if I need to send the patch to fix the regression elsewhere; it
dawned on me long after I sent it that this may not be the right place for it
to go...

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------


* Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
  2015-10-09 22:52                 ` Al Stone
@ 2015-10-09 23:44                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-10-09 23:44 UTC
  To: Al Stone, Rafael J. Wysocki; +Cc: Hanjun Guo, kernel test robot, lkp, LKML

On 10/10/2015 12:52 AM, Al Stone wrote:
> On 10/09/2015 03:02 PM, Rafael J. Wysocki wrote:
>> On Thursday, October 08, 2015 05:05:00 PM Al Stone wrote:
>>> On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:
>>>> On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
>>>>> On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
>>>>>> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
>>>>>>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
>>>>>>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
>>>>>>>>> On 10/08/2015 11:21 AM, kernel test robot wrote:
>>>>>>>>>> FYI, we noticed the below changes on
>>>>>>>>>>
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>>>>>>>>> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
>>>>>>>>>> bad_madt_entry() function to eventually replace the macro")
>>>>>>>>>>
>>>>>>>>>> [    0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> [snip....]
>
>>> In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
>>> in the MADT and see what the consensus is there.  It may just be a matter of
>>> clarifying the language in the spec.
>> One additional question to ask is what checks have been present in the OSes
>> and what they do if they see a reserved MADT subtable ID.  If they haven't been
>> doing anything so far, I'm afraid this particular train may be gone already.
> It may be gone.  The silence so far is deafening :).
>
>>> It's also on my plate to really dig into an ACPI test suite and see about
>>> building something really robust for that -- this can be added as an example.
>>> I'll see if I have time to send in a patch for FWTS, too, which is pretty
>>> good about capturing such things.
>> Sounds good!
>>
>> Thanks,
>> Rafael
>>
> Let me know if I need to send the patch to fix the regression elsewhere; it
> dawned on me long after I sent it that this may not be the right place for it
> to go...
>

Please send it to linux-acpi@vger.kernel.org.

Thanks,
Rafael



end of thread, other threads:[~2015-10-09 23:44 UTC | newest]

Thread overview: 10+ messages
     [not found] <871td6xcxc.fsf@yhuang-dev.intel.com>
2015-10-08 11:44 ` [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 Hanjun Guo
2015-10-08 16:36   ` Al Stone
2015-10-08 20:37     ` Rafael J. Wysocki
2015-10-08 20:41       ` Rafael J. Wysocki
2015-10-08 20:32         ` Al Stone
2015-10-08 22:50           ` Rafael J. Wysocki
2015-10-08 23:05             ` Al Stone
2015-10-09 21:02               ` Rafael J. Wysocki
2015-10-09 22:52                 ` Al Stone
2015-10-09 23:44                   ` Rafael J. Wysocki
