* compatability with older versions of UEFI
@ 2015-06-23 17:05 Luck, Tony
[not found] ` <20150623170534.GA21341-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2015-06-23 17:05 UTC (permalink / raw)
To: Matt Fleming; +Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA
When we print UEFI appendix N error logs provided by plaform
firmware we get to this function:
static void cper_estatus_print_section(
const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
{
Which includes this code (similar for the other section types):
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
printk("%s""section_type: memory error\n", newpfx);
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
I'm having a problem with that size check. The kernel defines
"struct cper_sec_mem_err" according to the 2.3.1 and later versions
of the spec where the last element is the module handle at offset 78
so the "sizeof(*mem_err)" is 80. But my BIOS defaults to providing
version 2.2 format information, where the last field is the memory
error type, and the struture size is only 73 bytes. So this code
takes the goto err_section_too_small, prints an error message, and
stops decoding.
This is a pity because it looks like the UEFI folk took great care
to make these structures back/forward compatible. Each has a bitfield
at the start saying which elements are valid ... so my 2.2 using BIOS
will never set the valid bits for the fields it doesn't know about.
Perhaps we can have some defines for the size of these structures
in the oldest version of the spec where they exist (2.1 AFAICT):
#define UEFI_MEMSECTION_MIN_SIZE 73
etc. Then use these as the sanity check?
Or we can be totally distrustful of the BIOS and pass the
"gdata->error_data_length" to the print function and have
it check that size against the offset of any fields that
the validation_bits say we should print. This feels like
overkill.
-Tony
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: compatability with older versions of UEFI
[not found] ` <20150623170534.GA21341-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
@ 2015-06-24 18:04 ` Zhang, Jonathan Zhixiong
[not found] ` <558AF115.8020909-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Zhang, Jonathan Zhixiong @ 2015-06-24 18:04 UTC (permalink / raw)
To: Luck, Tony, Matt Fleming
Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA, harba-sgV2jX0FEOL9JmXXK+q4OQ,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
Another option would be to have 2 structs, the first one
"struct cper_sec_mem_err" holds the structure as defined by UEFI
2.1, the 2nd one "struct cper_sec_mem_err_24_ext" holds the 4
elements added in UEFI 2.3.1.
In the past how is such situation (newer version of spec adds elements
to existing table) handled? To me, the "extension structure" option
proposed above makes sense because it allows rigid check of data
passed from firmware to OS.
Jonathan
On 6/23/2015 10:05 AM, Luck, Tony wrote:
> When we print UEFI appendix N error logs provided by plaform
> firmware we get to this function:
>
> static void cper_estatus_print_section(
> const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
> {
>
> Which includes this code (similar for the other section types):
>
>
> } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
> struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
> printk("%s""section_type: memory error\n", newpfx);
> if (gdata->error_data_length >= sizeof(*mem_err))
> cper_print_mem(newpfx, mem_err);
> else
> goto err_section_too_small;
>
> I'm having a problem with that size check. The kernel defines
> "struct cper_sec_mem_err" according to the 2.3.1 and later versions
> of the spec where the last element is the module handle at offset 78
> so the "sizeof(*mem_err)" is 80. But my BIOS defaults to providing
> version 2.2 format information, where the last field is the memory
> error type, and the struture size is only 73 bytes. So this code
> takes the goto err_section_too_small, prints an error message, and
> stops decoding.
>
> This is a pity because it looks like the UEFI folk took great care
> to make these structures back/forward compatible. Each has a bitfield
> at the start saying which elements are valid ... so my 2.2 using BIOS
> will never set the valid bits for the fields it doesn't know about.
>
> Perhaps we can have some defines for the size of these structures
> in the oldest version of the spec where they exist (2.1 AFAICT):
>
> #define UEFI_MEMSECTION_MIN_SIZE 73
>
> etc. Then use these as the sanity check?
>
> Or we can be totally distrustful of the BIOS and pass the
> "gdata->error_data_length" to the print function and have
> it check that size against the offset of any fields that
> the validation_bits say we should print. This feels like
> overkill.
>
> -Tony
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Jonathan (Zhixiong) Zhang
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: compatability with older versions of UEFI
[not found] ` <558AF115.8020909-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2015-06-24 19:14 ` Luck, Tony
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FB7C-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-28 14:29 ` Matt Fleming
1 sibling, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2015-06-24 19:14 UTC (permalink / raw)
To: Zhang, Jonathan Zhixiong, Fleming, Matt
Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> Another option would be to have 2 structs, the first one
> "struct cper_sec_mem_err" holds the structure as defined by UEFI
> 2.1, the 2nd one "struct cper_sec_mem_err_24_ext" holds the 4
> elements added in UEFI 2.3.1.
Reading some more of the UEFI 2.5 spec ... I see we are in for a world of pain
here.
2.5 adds some small tweaks to the memory structure (adding a couple of extra
bits to the "row" entry that can be grabbed from the formerly reserved byte
at offset 73). But then there is a whole new GUID for a "Memory Error Section 2"
which has doubled the width of the device, row, column, rank, and bit_pos fields
together with adding two new fields for chip_id and status. This will be painful
because we hardwired the old sizes into extlog_mem_event in <ras/ras_event.h>
-Tony
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: compatability with older versions of UEFI
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FB7C-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-24 19:50 ` Zhang, Jonathan Zhixiong
[not found] ` <558B09F8.4060706-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2015-06-28 14:34 ` Matt Fleming
1 sibling, 1 reply; 13+ messages in thread
From: Zhang, Jonathan Zhixiong @ 2015-06-24 19:50 UTC (permalink / raw)
To: Luck, Tony, Fleming, Matt
Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linaro-acpi-cunTk1MwBs8s++Sfvej+rw,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
On 6/24/2015 12:14 PM, Luck, Tony wrote:
>> Another option would be to have 2 structs, the first one
>> "struct cper_sec_mem_err" holds the structure as defined by UEFI
>> 2.1, the 2nd one "struct cper_sec_mem_err_24_ext" holds the 4
>> elements added in UEFI 2.3.1.
>
> Reading some more of the UEFI 2.5 spec ... I see we are in for a world of pain
> here.
>
> 2.5 adds some small tweaks to the memory structure (adding a couple of extra
> bits to the "row" entry that can be grabbed from the formerly reserved byte
> at offset 73).
I think this can be dealt with easily as long as all platforms observe
the rule that if a bit is reserved, the bit should be set a '0' instead
of being set randomly. Is this a fair assumption on all platforms?
> But then there is a whole new GUID for a "Memory Error Section 2"
> which has doubled the width of the device, row, column, rank, and bit_pos fields
> together with adding two new fields for chip_id and status. This will be painful
> because we hardwired the old sizes into extlog_mem_event in <ras/ras_event.h>
The old size is encoded in "stuct cper_mem_err_compact", other members
of the trace data are the same between "Memory Error Section" and
"Memory Error Section 2". One option we have without having to disturb
user space handler of memory error trace data would be to change
"struct cper_mem_err_compact" so the affected elements are of
__u32 instead of __u16. Drawback of this option is that the trace
buffer will be unnecessarily bigger if a platform generates
"Memory Error Section" data instead of "Memory Error Section 2" data.
Such drawback is not a big issue given that uncorrected memory error
happens infrequently and corrected memory error should be grouped
by platform.
--
Jonathan (Zhixiong) Zhang
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: compatability with older versions of UEFI
[not found] ` <558B09F8.4060706-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2015-06-24 20:12 ` Luck, Tony
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FBFF-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2015-06-24 20:12 UTC (permalink / raw)
To: Zhang, Jonathan Zhixiong, Fleming, Matt
Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linaro-acpi-cunTk1MwBs8s++Sfvej+rw@public.gmane.org,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> "Memory Error Section 2". One option we have without having to disturb
> user space handler of memory error trace data would be to change
> "struct cper_mem_err_compact" so the affected elements are of
> __u32 instead of __u16. Drawback of this option is that the trace
> buffer will be unnecessarily bigger if a platform generates
> "Memory Error Section" data instead of "Memory Error Section 2" data.
That structure is visible to user level consumers of the event (perhaps just
one of those right now?):
git://git.fedorahosted.org/rasdaemon.git
We were not as smart as UEFI and didn't include a version number, or other
plan, that would allow us to transition to a new format cleanly.
Perhaps we could re-purpose some of the high order "validation_bits"
as a version number? It's a u64 and UEFI 2.5 is only up to bit21 so far,
so perhaps it will be a long, long time before they get that many fields
in some future standard.
-Tony
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: compatability with older versions of UEFI
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FBFF-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-24 23:50 ` Zhang, Jonathan Zhixiong
0 siblings, 0 replies; 13+ messages in thread
From: Zhang, Jonathan Zhixiong @ 2015-06-24 23:50 UTC (permalink / raw)
To: Luck, Tony, mchehab-H+wXaHxf7aLQT0dZR+AlfA, Fleming, Matt
Cc: linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linaro-acpi-cunTk1MwBs8s++Sfvej+rw@public.gmane.org,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
+Mauro
On 6/24/2015 1:12 PM, Luck, Tony wrote:
>> "Memory Error Section 2". One option we have without having to disturb
>> user space handler of memory error trace data would be to change
>> "struct cper_mem_err_compact" so the affected elements are of
>> __u32 instead of __u16. Drawback of this option is that the trace
>> buffer will be unnecessarily bigger if a platform generates
>> "Memory Error Section" data instead of "Memory Error Section 2" data.
>
> That structure is visible to user level consumers of the event (perhaps just
> one of those right now?):
>
> git://git.fedorahosted.org/rasdaemon.git
>
> We were not as smart as UEFI and didn't include a version number, or other
> plan, that would allow us to transition to a new format cleanly.
>
> Perhaps we could re-purpose some of the high order "validation_bits"
> as a version number? It's a u64 and UEFI 2.5 is only up to bit21 so far,
> so perhaps it will be a long, long time before they get that many fields
> in some future standard.
I agree that we could re-purpose some of the high order "validation_bits"
as a version number. That being said, I am not sure whether that
approach is preferred by user space tool authors. My feeling is
such approach would make user space tool more complicated (eg.
has to understand versions). Mauro, pls. correct me is I am wrong;
pls. refer to previous email in this thread for context related to
challenge brought forth by UEFI 2.5.
perf interface has debugfs interface, so if user space tool does
following, compatibility with different kernel version and different
spec version will be maintained:
* Use debugfs interface to discover format of trace data.
* use largest size known for user space structure; check size of member
in user space structure and size of corresponding field in trace data,
adjust data as necessary.
In the mean time, I think when kernel defines TRACE_EVENT, it should
try not having to use __field_struct, that would make format inside that
field opaque to user space tool. For instance:
__field_struct(struct cper_mem_err_compact, data)
Because of above, ras-extlog-handler.c in rasdaemon has to hardcode
this structure, making it harder to maintain forward/backward
compatibility.
--
Jonathan (Zhixiong) Zhang
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: compatability with older versions of UEFI
[not found] ` <558AF115.8020909-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2015-06-24 19:14 ` Luck, Tony
@ 2015-06-28 14:29 ` Matt Fleming
[not found] ` <20150628142909.GA28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
1 sibling, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-06-28 14:29 UTC (permalink / raw)
To: Zhang, Jonathan Zhixiong
Cc: Luck, Tony, Matt Fleming, linux-efi-u79uwXL29TY76Z2rM5mHXA,
harba-sgV2jX0FEOL9JmXXK+q4OQ, linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
On Wed, 24 Jun, at 11:04:05AM, Zhang, Jonathan Zhixiong wrote:
> Another option would be to have 2 structs, the first one
> "struct cper_sec_mem_err" holds the structure as defined by UEFI
> 2.1, the 2nd one "struct cper_sec_mem_err_24_ext" holds the 4
> elements added in UEFI 2.3.1.
Yes, this is traditionally how we've handled this issue in other parts
of the kernel that deal with UEFI/ACPI.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: compatability with older versions of UEFI
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FB7C-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-24 19:50 ` Zhang, Jonathan Zhixiong
@ 2015-06-28 14:34 ` Matt Fleming
1 sibling, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2015-06-28 14:34 UTC (permalink / raw)
To: Luck, Tony
Cc: Zhang, Jonathan Zhixiong, Fleming, Matt,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
On Wed, 24 Jun, at 07:14:43PM, Luck, Tony wrote:
> > Another option would be to have 2 structs, the first one
> > "struct cper_sec_mem_err" holds the structure as defined by UEFI
> > 2.1, the 2nd one "struct cper_sec_mem_err_24_ext" holds the 4
> > elements added in UEFI 2.3.1.
>
> Reading some more of the UEFI 2.5 spec ... I see we are in for a world of pain
> here.
>
> 2.5 adds some small tweaks to the memory structure (adding a couple of extra
> bits to the "row" entry that can be grabbed from the formerly reserved byte
> at offset 73). But then there is a whole new GUID for a "Memory Error Section 2"
> which has doubled the width of the device, row, column, rank, and bit_pos fields
> together with adding two new fields for chip_id and status. This will be painful
> because we hardwired the old sizes into extlog_mem_event in <ras/ras_event.h>
Can't you just create a new memory event type? Granted, you'll need
changes to the ras daemon, but since there's a new GUID involved I think
that's forgivable.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] efi: Handle memory error structures produced based on old versions of standard
[not found] ` <20150628142909.GA28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-06-29 18:21 ` Luck, Tony
[not found] ` <20150629182106.GA25924-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2015-06-29 18:21 UTC (permalink / raw)
To: Matt Fleming
Cc: Zhang, Jonathan Zhixiong, Matt Fleming,
linux-efi-u79uwXL29TY76Z2rM5mHXA, harba-sgV2jX0FEOL9JmXXK+q4OQ,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/firmware/efi/cper.c | 13 ++++++++++---
include/linux/cper.h | 22 +++++++++++++++++++++-
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 4fd9961d552e..b2012004a8ab 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -305,10 +305,15 @@ const char *cper_mem_err_unpack(struct trace_seq *p,
return ret;
}
-static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
+ int len)
{
struct cper_mem_err_compact cmem;
+ /* Don't trust UEFI 2.1/2.2 structure with bad validation bits */
+ if (len == sizeof(struct cper_sec_mem_err_old) &&
+ (mem->validation_bits & ~(CPER_MEM_VALID_RANK_NUMBER - 1)))
+ return;
if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
if (mem->validation_bits & CPER_MEM_VALID_PA)
@@ -405,8 +410,10 @@ static void cper_estatus_print_section(
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
printk("%s""section_type: memory error\n", newpfx);
- if (gdata->error_data_length >= sizeof(*mem_err))
- cper_print_mem(newpfx, mem_err);
+ if (gdata->error_data_length >=
+ sizeof(struct cper_sec_mem_err_old))
+ cper_print_mem(newpfx, mem_err,
+ gdata->error_data_length);
else
goto err_section_too_small;
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 76abba4b238e..dcacb1a72e26 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -340,7 +340,27 @@ struct cper_ia_proc_ctx {
__u64 mm_reg_addr;
};
-/* Memory Error Section */
+/* Old Memory Error Section UEFI 2.1, 2.2 */
+struct cper_sec_mem_err_old {
+ __u64 validation_bits;
+ __u64 error_status;
+ __u64 physical_addr;
+ __u64 physical_addr_mask;
+ __u16 node;
+ __u16 card;
+ __u16 module;
+ __u16 bank;
+ __u16 device;
+ __u16 row;
+ __u16 column;
+ __u16 bit_pos;
+ __u64 requestor_id;
+ __u64 responder_id;
+ __u64 target_id;
+ __u8 error_type;
+};
+
+/* Memory Error Section UEFI >= 2.3 */
struct cper_sec_mem_err {
__u64 validation_bits;
__u64 error_status;
--
2.1.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] efi: Handle memory error structures produced based on old versions of standard
[not found] ` <20150629182106.GA25924-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
@ 2015-06-30 12:22 ` Matt Fleming
[not found] ` <20150630122244.GJ28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-06-30 12:22 UTC (permalink / raw)
To: Luck, Tony
Cc: Zhang, Jonathan Zhixiong, Matt Fleming,
linux-efi-u79uwXL29TY76Z2rM5mHXA, harba-sgV2jX0FEOL9JmXXK+q4OQ,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
On Mon, 29 Jun, at 11:21:07AM, Luck, Tony wrote:
> The memory error record structure includes as its first field a
> bitmask of which subsequent fields are valid. The allows new fields
> to be added to the structure while keeping compatibility with older
> software that parses these records. This mechanism was used between
> versions 2.2 and 2.3 to add four new fields, growing the size of the
> structure from 73 bytes to 80. But Linux just added all the new
> fields so this test:
> if (gdata->error_data_length >= sizeof(*mem_err))
> cper_print_mem(newpfx, mem_err);
> else
> goto err_section_too_small;
> now make Linux complain about old format records being too short.
>
> Add a definition for the old format of the structure and use that
> for the minimum size check. Pass the actual size to cper_print_mem()
> so it can sanity check the validation_bits field to ensure that if
> a BIOS using the old format sets bits as if it were new, we won't
> access fields beyond the end of the structure.
>
> Signed-off-by: Tony Luck <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
> drivers/firmware/efi/cper.c | 13 ++++++++++---
> include/linux/cper.h | 22 +++++++++++++++++++++-
> 2 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 4fd9961d552e..b2012004a8ab 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -305,10 +305,15 @@ const char *cper_mem_err_unpack(struct trace_seq *p,
> return ret;
> }
>
> -static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
> +static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
> + int len)
> {
> struct cper_mem_err_compact cmem;
>
> + /* Don't trust UEFI 2.1/2.2 structure with bad validation bits */
> + if (len == sizeof(struct cper_sec_mem_err_old) &&
> + (mem->validation_bits & ~(CPER_MEM_VALID_RANK_NUMBER - 1)))
> + return;
I would have thought that we'd want to print an error message here
instead of silently returning?
Otherwise looks good.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv2] efi: Handle memory error structures produced based on old versions of standard
[not found] ` <20150630122244.GJ28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-06-30 22:57 ` Luck, Tony
[not found] ` <20150630225751.GA18060-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2015-06-30 22:57 UTC (permalink / raw)
To: Matt Fleming
Cc: Zhang, Jonathan Zhixiong, Matt Fleming,
linux-efi-u79uwXL29TY76Z2rM5mHXA, harba-sgV2jX0FEOL9JmXXK+q4OQ,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
v1-v2: print FW_WARN if we see bogus validation bits from an old sized structure
drivers/firmware/efi/cper.c | 15 ++++++++++++---
include/linux/cper.h | 22 +++++++++++++++++++++-
2 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 4fd9961d552e..d42537425438 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -305,10 +305,17 @@ const char *cper_mem_err_unpack(struct trace_seq *p,
return ret;
}
-static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
+ int len)
{
struct cper_mem_err_compact cmem;
+ /* Don't trust UEFI 2.1/2.2 structure with bad validation bits */
+ if (len == sizeof(struct cper_sec_mem_err_old) &&
+ (mem->validation_bits & ~(CPER_MEM_VALID_RANK_NUMBER - 1))) {
+ pr_err(FW_WARN "valid bits set for fields beyond structure\n");
+ return;
+ }
if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
if (mem->validation_bits & CPER_MEM_VALID_PA)
@@ -405,8 +412,10 @@ static void cper_estatus_print_section(
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
printk("%s""section_type: memory error\n", newpfx);
- if (gdata->error_data_length >= sizeof(*mem_err))
- cper_print_mem(newpfx, mem_err);
+ if (gdata->error_data_length >=
+ sizeof(struct cper_sec_mem_err_old))
+ cper_print_mem(newpfx, mem_err,
+ gdata->error_data_length);
else
goto err_section_too_small;
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 76abba4b238e..dcacb1a72e26 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -340,7 +340,27 @@ struct cper_ia_proc_ctx {
__u64 mm_reg_addr;
};
-/* Memory Error Section */
+/* Old Memory Error Section UEFI 2.1, 2.2 */
+struct cper_sec_mem_err_old {
+ __u64 validation_bits;
+ __u64 error_status;
+ __u64 physical_addr;
+ __u64 physical_addr_mask;
+ __u16 node;
+ __u16 card;
+ __u16 module;
+ __u16 bank;
+ __u16 device;
+ __u16 row;
+ __u16 column;
+ __u16 bit_pos;
+ __u64 requestor_id;
+ __u64 responder_id;
+ __u64 target_id;
+ __u8 error_type;
+};
+
+/* Memory Error Section UEFI >= 2.3 */
struct cper_sec_mem_err {
__u64 validation_bits;
__u64 error_status;
--
2.1.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCHv2] efi: Handle memory error structures produced based on old versions of standard
[not found] ` <20150630225751.GA18060-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
@ 2015-07-08 15:54 ` Matt Fleming
[not found] ` <20150708155440.GA5598-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-07-08 15:54 UTC (permalink / raw)
To: Luck, Tony
Cc: Zhang, Jonathan Zhixiong, Matt Fleming,
linux-efi-u79uwXL29TY76Z2rM5mHXA, harba-sgV2jX0FEOL9JmXXK+q4OQ,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A
On Tue, 30 Jun, at 03:57:51PM, Luck, Tony wrote:
> The memory error record structure includes as its first field a
> bitmask of which subsequent fields are valid. The allows new fields
> to be added to the structure while keeping compatibility with older
> software that parses these records. This mechanism was used between
> versions 2.2 and 2.3 to add four new fields, growing the size of the
> structure from 73 bytes to 80. But Linux just added all the new
> fields so this test:
> if (gdata->error_data_length >= sizeof(*mem_err))
> cper_print_mem(newpfx, mem_err);
> else
> goto err_section_too_small;
> now make Linux complain about old format records being too short.
>
> Add a definition for the old format of the structure and use that
> for the minimum size check. Pass the actual size to cper_print_mem()
> so it can sanity check the validation_bits field to ensure that if
> a BIOS using the old format sets bits as if it were new, we won't
> access fields beyond the end of the structure.
>
> Signed-off-by: Tony Luck <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
> v1-v2: print FW_WARN if we see bogus validation bits from an old sized structure
>
> drivers/firmware/efi/cper.c | 15 ++++++++++++---
> include/linux/cper.h | 22 +++++++++++++++++++++-
> 2 files changed, 33 insertions(+), 4 deletions(-)
Looks good to me Tony. Since this is essentially a bug fix, does it need
applying to stable?
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCHv2] efi: Handle memory error structures produced based on old versions of standard
[not found] ` <20150708155440.GA5598-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-07-08 17:35 ` Luck, Tony
0 siblings, 0 replies; 13+ messages in thread
From: Luck, Tony @ 2015-07-08 17:35 UTC (permalink / raw)
To: Matt Fleming
Cc: Zhang, Jonathan Zhixiong, Fleming, Matt,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
harba-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
linaro-acpi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org
> Looks good to me Tony. Since this is essentially a bug fix, does it need
> applying to stable?
Yes. Please add a Cc: stable tag when applying
Thanks for catching that
-Tony
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-07-08 17:35 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-23 17:05 compatability with older versions of UEFI Luck, Tony
[not found] ` <20150623170534.GA21341-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
2015-06-24 18:04 ` Zhang, Jonathan Zhixiong
[not found] ` <558AF115.8020909-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2015-06-24 19:14 ` Luck, Tony
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FB7C-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-24 19:50 ` Zhang, Jonathan Zhixiong
[not found] ` <558B09F8.4060706-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2015-06-24 20:12 ` Luck, Tony
[not found] ` <3908561D78D1C84285E8C5FCA982C28F32A9FBFF-8oqHQFITsIE64kNsxIetb7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-24 23:50 ` Zhang, Jonathan Zhixiong
2015-06-28 14:34 ` Matt Fleming
2015-06-28 14:29 ` Matt Fleming
[not found] ` <20150628142909.GA28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-06-29 18:21 ` [PATCH] efi: Handle memory error structures produced based on old versions of standard Luck, Tony
[not found] ` <20150629182106.GA25924-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
2015-06-30 12:22 ` Matt Fleming
[not found] ` <20150630122244.GJ28334-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-06-30 22:57 ` [PATCHv2] " Luck, Tony
[not found] ` <20150630225751.GA18060-E6Nu+q68HHTI/KE9syI0vLvm/XP+8Wra@public.gmane.org>
2015-07-08 15:54 ` Matt Fleming
[not found] ` <20150708155440.GA5598-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-07-08 17:35 ` Luck, Tony
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).