* HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
@ 2011-12-08 15:36 Iain Barker
0 siblings, 0 replies; 5+ messages in thread
From: Iain Barker @ 2011-12-08 15:36 UTC
To: bug-grub@gnu.org; +Cc: grub-devel@gnu.org
I am posting the following information with permission from HP support, in the hope that it may be useful for future GRUB developer reference.
Please note that I do not subscribe to the GRUB mailing list, so cc: me directly if any reply is required.
Summary:
When using GRUB to chain-load from one device to another device, the HP BIOS used in current DL120/DL360 (G7) servers reports "Illegal Opcode" and shows a red crashdump screen. This failure did not occur on previous G6 generation servers of the same models, which used AMI/Phoenix BIOS.
References:
HP support case 4635415916, opened for additional clarification in reference to HP customer advisory number c02695572
http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02695572&lang=en&cc=us&taskId=101&prodSeriesId=4091408&prodTypeId=15351
Root cause analysis:
HP level3 engineering identified the root cause as follows:
_start_quoted_text_
HP Level-3 engineering have found that the HP BIOS on the DL120 G7 is not causing the red screen. GRUB loads its own INT13 handler in the interrupt vector table, so it will now intercept all int13 calls. Some time after it does that, GRUB does some type of memory copy operation which overwrites the data at the address where Grub stores the INT13 handler code. As a result, on the next Int13 call in grub, the interrupt handler is no longer there so the processor just starts to execute whatever data overwrote where the int13 handler code was.
Here is how the red screen happens: When the processor executes an illegal instruction (like when it tries to execute whatever is in the overwritten int13 handler), the processor causes an interrupt which the BIOS then handles by printing the red screen with the register dump and the message. So our BIOS just prints out the red screen, but the cause of the red screen is Grub.
The specific scenario which leads to this is identified as follows:
1) Grub installs its own INT13 handler
2) Near the end of the chain loading process, Grub loads an image of the Linux kernel into memory which wipes out their Int13 handler.
3) Right before grub transfers control to the kernel to boot, grub makes a call to a function to turn off the floppy drive.
4) The call to the floppy code then makes an Int13 call to the handler which has been overwritten by the kernel and thereby results in the red screen.
The problem seems to be that Grub made assumptions about the memory layout in our system which are not accurate. HP systems that use HP-developed BIOSes instead of outsourced (AMI) BIOSes use more of a memory area called EBDA than a typical system does. As a result, Grub assumes there's memory that it could safely use instead of properly calculating an area of safe memory to use. That's probably why Grub worked on the other systems and fails on G7.
_end quoted text_
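The sequence HP describes can be modelled in a few lines of C. This is only a sketch under stated assumptions, not GRUB code: `low_mem` is a simulated 1 MiB real-mode address space, and `install_handler`, `load_image` and `handler_intact` are hypothetical helpers invented for illustration. The one fixed fact it relies on is that the real-mode vector for INT 13h lives at linear address 0x13 * 4 as an offset word followed by a segment word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOW_MEM_SIZE 0x100000u    /* 1 MiB of simulated real-mode memory */
#define INT13_VECTOR (0x13u * 4u) /* each IVT entry: offset word, then segment word */

static uint8_t low_mem[LOW_MEM_SIZE];

/* Linear address the simulated INT 13h vector currently points at. */
static uint32_t int13_target(void)
{
    uint16_t off = (uint16_t)(low_mem[INT13_VECTOR] | (low_mem[INT13_VECTOR + 1] << 8));
    uint16_t seg = (uint16_t)(low_mem[INT13_VECTOR + 2] | (low_mem[INT13_VECTOR + 3] << 8));
    return ((uint32_t)seg << 4) + off;
}

/* Step 1: copy handler code to seg:0000 and point the vector at it. */
static void install_handler(uint16_t seg, const uint8_t *code, size_t len)
{
    uint32_t addr = (uint32_t)seg << 4;
    assert(addr + len <= LOW_MEM_SIZE);
    memcpy(&low_mem[addr], code, len);
    low_mem[INT13_VECTOR + 0] = 0;                    /* offset low  */
    low_mem[INT13_VECTOR + 1] = 0;                    /* offset high */
    low_mem[INT13_VECTOR + 2] = (uint8_t)(seg & 0xff);
    low_mem[INT13_VECTOR + 3] = (uint8_t)(seg >> 8);
}

/* Step 2: an image load that never checks what it is overwriting. */
static void load_image(uint32_t dst, const uint8_t *image, size_t len)
{
    assert(dst + len <= LOW_MEM_SIZE);
    memcpy(&low_mem[dst], image, len);
}

/* Steps 3-4: would the next INT 13h still reach the original handler code? */
static int handler_intact(const uint8_t *code, size_t len)
{
    uint32_t target = int13_target();
    if (target + len > LOW_MEM_SIZE)
        return 0;
    return memcmp(&low_mem[target], code, len) == 0;
}
```

If a handler is installed at a made-up segment such as 0x9000 and an image is then loaded across linear address 0x90000, `handler_intact` returns 0 while the vector still points at the same place, which is the condition under which the next INT 13h executes arbitrary bytes.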
Regards,
Iain Barker - Platform Engineering, Acme Packet.
yoshac@member.fsf.org
* HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
@ 2011-12-08 19:39 Iain Barker
2011-12-08 20:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
0 siblings, 1 reply; 5+ messages in thread
From: Iain Barker @ 2011-12-08 19:39 UTC
To: bug-grub@gnu.org; +Cc: grub-devel@gnu.org
I am posting the following information with permission from HP support, in the hope that it may be useful for future GRUB developer reference.
Summary:
When using GRUB to chain-load from one device to another device (e.g. USB to HDD), the HP BIOS used in DL120/DL360 and other G7 servers reports "Illegal Opcode" and shows a red crashdump screen. This failure did not occur on previous generation (G6) servers of the same models, which used AMI/Phoenix BIOS.
References:
Acme Packet opened HP support case 4635415916 for additional clarification in reference to the public HP customer advisory number c02695572
http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02695572&lang=en&cc=us&taskId=101&prodSeriesId=4091408&prodTypeId=15351
Root cause analysis:
HP level3 engineering identified the root cause as follows:
_start_quoted_text_
HP Level-3 engineering have found that the HP BIOS on the DL120 G7 is not causing the red screen. GRUB loads its own INT13 handler in the interrupt vector table, so it will now intercept all int13 calls. Some time after it does that, GRUB does some type of memory copy operation which overwrites the data at the address where Grub stores the INT13 handler code. As a result, on the next Int13 call in grub, the interrupt handler is no longer there so the processor just starts to execute whatever data overwrote where the int13 handler code was.
Here is how the red screen happens: When the processor executes an illegal instruction (like when it tries to execute whatever is in the overwritten int13 handler), the processor causes an interrupt which the BIOS then handles by printing the red screen with the register dump and the message. So our BIOS just prints out the red screen, but the cause of the red screen is Grub.
The specific scenario which leads to this is identified as follows:
1) Grub installs its own INT13 handler
2) Near the end of the chain loading process, Grub loads an image of the Linux kernel into memory which wipes out their Int13 handler.
3) Right before grub transfers control to the kernel to boot, grub makes a call to a function to turn off the floppy drive.
4) The call to the floppy code then makes an Int13 call to the handler which has been overwritten by the kernel and thereby results in the red screen.
The problem seems to be that Grub made assumptions about the memory layout in our system which are not accurate. HP systems that use HP-developed BIOSes instead of outsourced (AMI) BIOSes use more of a memory area called EBDA than a typical system does. As a result, Grub assumes there's memory that it could safely use instead of properly calculating an area of safe memory to use. That's probably why Grub worked on the other systems and fails on G7.
_end quoted text_
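For reference, "properly calculating an area of safe memory" on a PC BIOS usually starts from two words in the BIOS Data Area: the EBDA segment at 0040:000E and the base memory size in KiB at 0040:0013. The following is a minimal sketch of that calculation, assuming `mem` is a snapshot of the first kilobytes of real-mode memory. `low_memory_limit` and `fits_below_limit` are hypothetical names, not GRUB functions, and the sketch makes no claim about how GRUB itself derives its limit.

```c
#include <assert.h>
#include <stdint.h>

#define BDA_EBDA_SEGMENT 0x40Eu /* word: segment of the EBDA, 0 if none */
#define BDA_BASE_MEM_KIB 0x413u /* word: conventional memory size in KiB */

/* Read a little-endian 16-bit word from the memory snapshot. */
static uint16_t read_word(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* First linear address a loader should treat as reserved. */
static uint32_t low_memory_limit(const uint8_t *mem)
{
    uint32_t limit = (uint32_t)read_word(mem, BDA_BASE_MEM_KIB) * 1024u;
    uint32_t ebda = (uint32_t)read_word(mem, BDA_EBDA_SEGMENT) << 4;

    /* Take the lower of the two bounds when an EBDA is reported. */
    if (ebda != 0 && ebda < limit)
        limit = ebda;
    return limit;
}

/* Returns 1 if [start, start + len) lies entirely below the limit. */
static int fits_below_limit(uint32_t start, uint32_t len, uint32_t limit)
{
    assert(len > 0);
    return start < limit && len <= limit - start;
}
```

With illustrative values only: a buffer at 0x9A000 fits on a machine whose EBDA starts at 0x9FC00, but not on one whose EBDA starts at 0x98000. A fixed address that happens to work on a small-EBDA BIOS can therefore fail on one that reserves more.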
Regards,
Iain Barker - Platform Engineering, Acme Packet.
[yoshac@member.fsf.org]
* Re: HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
2011-12-08 19:39 HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers Iain Barker
@ 2011-12-08 20:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
2011-12-08 20:23 ` Seth Goldberg
0 siblings, 1 reply; 5+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-12-08 20:16 UTC
To: The development of GNU GRUB; +Cc: Iain Barker
> 1) Grub installs its own INT13 handler
> 2) Near the end of the chain loading process, Grub loads an image of the Linux kernel into memory which wipes out their Int13 handler.
> 3) Right before grub transfers control to the kernel to boot, grub makes a call to a function to turn off the floppy drive.
> 4) The call to the floppy code then makes an Int13 call to the handler which has been overwritten by the kernel and thereby results in the red screen.
>
This text seems to be contradictory. The INT13 handler is installed only
if drivemap is used, and drivemap is useful only with chainloading. But
then it mentions Linux loading. Also, the call to stop the floppy doesn't
call the BIOS:
static inline void
grub_stop_floppy (void)
{
  grub_outb (0, GRUB_FLOPPY_REG_DIGITAL_OUTPUT);
}
May I see the configfile in question?
Also, GRUB does calculate the safe place based on the memory layout.
Looking at the code, I see the problem: it's calculated before the
drivemap hook is installed. While this should be fixed, I see no reason
to use drivemap with Linux.
> _end quoted text_
>
> Regards,
> Iain Barker - Platform Engineering, Acme Packet.
> [yoshac@member.fsf.org]
>
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel
>
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
* Re: HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
2011-12-08 20:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2011-12-08 20:23 ` Seth Goldberg
2011-12-08 20:42 ` Vladimir 'φ-coder/phcoder' Serbinenko
0 siblings, 1 reply; 5+ messages in thread
From: Seth Goldberg @ 2011-12-08 20:23 UTC
To: The development of GNU GRUB; +Cc: Iain Barker
Is this Legacy GRUB they're talking about maybe?
--S
* Re: HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
2011-12-08 20:23 ` Seth Goldberg
@ 2011-12-08 20:42 ` Vladimir 'φ-coder/phcoder' Serbinenko
0 siblings, 0 replies; 5+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-12-08 20:42 UTC
To: The development of GNU GRUB; +Cc: Iain Barker, Seth Goldberg
On 08.12.2011 21:23, Seth Goldberg wrote:
>
> Is this Legacy GRUB they're talking about maybe?
Could be. In that case they are using unsupported software to begin with.
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko