* Re: [PATCH] x86: memtest: fix compile warning
2009-06-11 14:21 ` Thomas Gleixner
@ 2009-06-11 14:30 ` H. Peter Anvin
2009-06-11 15:26 ` Andreas Herrmann
2009-06-11 17:19 ` Yinghai Lu
2 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2009-06-11 14:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andreas Herrmann, Stephen Rothwell, Ingo Molnar, Peter Zijlstra,
linux-next, linux-kernel
Thomas Gleixner wrote:
>
> But aside of that this code is confusing.
>
> start_phys_aligned = ALIGN(start_phys, incr);
>
> Why do we have to fiddle with the alignment? Are you really seeing e820
> entries which are not 8 byte aligned?
>
I have personally seen those on real systems.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
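
For illustration, here is a minimal user-space sketch of what the alignment step does when an e820 entry does not start on an 8-byte boundary. align_up() is meant to mirror the kernel's ALIGN() macro for power-of-two alignments; usable_words() is a hypothetical helper, not a kernel function, and its guard against a region shorter than the skipped bytes is an addition that the quoted code does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Intended to mirror the kernel's ALIGN() macro. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: number of whole 64-bit words that can be tested
 * in [start_phys, start_phys + size) once the start has been aligned. */
static uint64_t usable_words(uint64_t start_phys, uint64_t size)
{
	const uint64_t incr = sizeof(uint64_t);
	uint64_t aligned = align_up(start_phys, incr);
	uint64_t skipped = aligned - start_phys;

	/* Region too small to hold even the skipped bytes (added guard). */
	if (size < skipped)
		return 0;
	return (size - skipped) / incr;
}
```

With a start of 0x1003 and a size of 0x20, five bytes are skipped and three full words remain testable.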
* Re: [PATCH] x86: memtest: fix compile warning
2009-06-11 14:21 ` Thomas Gleixner
2009-06-11 14:30 ` H. Peter Anvin
@ 2009-06-11 15:26 ` Andreas Herrmann
2009-06-12 13:11 ` Andreas Herrmann
2009-06-11 17:19 ` Yinghai Lu
2 siblings, 1 reply; 8+ messages in thread
From: Andreas Herrmann @ 2009-06-11 15:26 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Stephen Rothwell, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
linux-next, linux-kernel, Yinghai Lu
On Thu, Jun 11, 2009 at 04:21:41PM +0200, Thomas Gleixner wrote:
> On Thu, 11 Jun 2009, Andreas Herrmann wrote:
>
> > Commit c9690998ef48ffefeccb91c70a7739eebdea57f9
> > (x86: memtest: remove 64-bit division) introduced following compile warning:
> >
> > arch/x86/mm/memtest.c: In function 'memtest':
> > arch/x86/mm/memtest.c:56: warning: comparison of distinct pointer types lacks a cast
> > arch/x86/mm/memtest.c:58: warning: comparison of distinct pointer types lacks a cast
> >
> > Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
> > ---
> > arch/x86/mm/memtest.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > Sorry.
> > Please apply.
>
> I applied it already, but zapped it right away, as it is bad style to
> do the type casting in the loops. The proper fix is below.
Doesn't your fix re-introduce the 64-bit division problem with old
gcc? I removed that division with the mentioned commit; I just forgot to
type-cast the pointer.
> But aside of that this code is confusing.
>
> start_phys_aligned = ALIGN(start_phys, incr);
>
>
> Why do we have to fiddle with the alignment? Are you really seeing e820
> entries which are not 8 byte aligned?
CC-ing Yinghai who might know more about this.
See also http://marc.info/?l=linux-kernel&m=123490434528131
> for (p = start; p < end; p++, start_phys_aligned += incr) {
> if (*p == pattern)
> continue;
> if (start_phys_aligned == last_bad + incr) {
> last_bad += incr;
> continue;
> }
> if (start_bad)
> reserve_bad_mem(pattern, start_bad, last_bad + incr);
> start_bad = last_bad = start_phys_aligned;
> }
> if (start_bad)
> reserve_bad_mem(pattern, start_bad, last_bad + incr);
>
> I really had to look more than once to understand what the heck
> start_phys_aligned and last_bad + incr are doing. Really non
> intuitive.
>
> But the reserve_bad_mem() semantics are even more scary:
>
> - if you hit flaky memory, which gives you bad and good results here
> and there, you call reserve_bad_mem() totally unbound which is
> likely to overflow the early reservation space and panics the
> machine. You need to keep track of those events somehow (e.g. in a
> bitmap) so you can detect such problems and mark the whole affected
> region bad in one go.
Agreed, needs to be fixed.
> - you call reserve_early() which calls __reserve_early(....,
> overrun_ok = 0) so if you do the default multi pattern scan and each
> run sees the same region of broken memory you will trigger the
> "Overlapping early reservations" panic in __reserve_early() when you
> reserve that region the second time. Why do you run the test twice
> when the first one failed already ? Also there is no need to do the
> wipeout run in that case, which will trigger it as well!
Sure, needs to be fixed as well.
(Note: I think both problems have existed in the memtest code right from the beginning.)
> So in both cases you panic the machine w/o need.
>
> Please fix ASAP.
> Thanks,
>
> tglx
> ---
> diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
> index d1c5cef..18d244f 100644
> --- a/arch/x86/mm/memtest.c
> +++ b/arch/x86/mm/memtest.c
> @@ -40,16 +40,14 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
>
> static void __init memtest(u64 pattern, u64 start_phys, u64 size)
> {
> - u64 *p, *end;
> - void *start;
> + u64 *p, *start, *end;
> u64 start_bad, last_bad;
> u64 start_phys_aligned;
> - size_t incr;
> + const size_t incr = sizeof(pattern);
>
> - incr = sizeof(pattern);
> start_phys_aligned = ALIGN(start_phys, incr);
> start = __va(start_phys_aligned);
> - end = (u64 *) (start + size - (start_phys_aligned - start_phys));
> + end = start + (size - (start_phys_aligned - start_phys)) / incr;
> start_bad = 0;
> last_bad = 0;
>
Regards,
Andreas
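
The scan loop quoted above can be hard to follow, so here is a user-space sketch of the same coalescing idea. It is not the kernel code: scan_words(), struct bad_range and the output array are hypothetical stand-ins (the kernel calls reserve_bad_mem() instead of filling an array), and an explicit in_run flag replaces the start_bad == 0 sentinel used in memtest.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bad_range {
	uint64_t start;	/* first bad physical address */
	uint64_t end;	/* one past the last bad byte */
};

/* Walk nwords 64-bit words and merge adjacent mismatches into ranges.
 * Returns the number of ranges written to out[]. */
static size_t scan_words(const uint64_t *mem, size_t nwords,
			 uint64_t base_phys, uint64_t pattern,
			 struct bad_range *out, size_t max_out)
{
	const uint64_t incr = sizeof(uint64_t);
	uint64_t start_bad = 0, last_bad = 0;
	int in_run = 0;
	size_t n = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t phys = base_phys + i * incr;

		if (mem[i] == pattern)
			continue;
		/* Directly follows the previous bad word: extend the run. */
		if (in_run && phys == last_bad + incr) {
			last_bad = phys;
			continue;
		}
		/* A gap of good words ended the previous run: emit it. */
		if (in_run) {
			assert(n < max_out);
			out[n++] = (struct bad_range){ start_bad, last_bad + incr };
		}
		start_bad = last_bad = phys;
		in_run = 1;
	}
	/* Emit the run that was still open at the end of the region. */
	if (in_run) {
		assert(n < max_out);
		out[n++] = (struct bad_range){ start_bad, last_bad + incr };
	}
	return n;
}
```

Every run of bad words separated by at least one good word produces its own range, which is why memory that alternates between good and bad results generates so many reservations.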
* Re: [PATCH] x86: memtest: fix compile warning
2009-06-11 15:26 ` Andreas Herrmann
@ 2009-06-12 13:11 ` Andreas Herrmann
0 siblings, 0 replies; 8+ messages in thread
From: Andreas Herrmann @ 2009-06-12 13:11 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Stephen Rothwell, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
linux-next, linux-kernel, Yinghai Lu
On Thu, Jun 11, 2009 at 05:26:58PM +0200, Andreas Herrmann wrote:
> On Thu, Jun 11, 2009 at 04:21:41PM +0200, Thomas Gleixner wrote:
> > On Thu, 11 Jun 2009, Andreas Herrmann wrote:
> >
> > > Commit c9690998ef48ffefeccb91c70a7739eebdea57f9
> > > (x86: memtest: remove 64-bit division) introduced following compile warning:
> > >
> > > arch/x86/mm/memtest.c: In function 'memtest':
> > > arch/x86/mm/memtest.c:56: warning: comparison of distinct pointer types lacks a cast
> > > arch/x86/mm/memtest.c:58: warning: comparison of distinct pointer types lacks a cast
> > >
> > > Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
> > > ---
> > > arch/x86/mm/memtest.c | 4 ++--
> > > 1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > Sorry.
> > > Please apply.
> >
> > I applied it already, but zapped it right away, as it is bad style to
> > do the type casting in the loops. The proper fix is below.
>
> Doesn't your fix re-introduce the 64-bit division problem with old
> gcc? I removed that division with the mentioned commit; I just forgot to
> type-cast the pointer.
It doesn't.
> > diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
> > index d1c5cef..18d244f 100644
> > --- a/arch/x86/mm/memtest.c
> > +++ b/arch/x86/mm/memtest.c
> > @@ -40,16 +40,14 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
> >
> > static void __init memtest(u64 pattern, u64 start_phys, u64 size)
> > {
> > - u64 *p, *end;
> > - void *start;
> > + u64 *p, *start, *end;
> > u64 start_bad, last_bad;
> > u64 start_phys_aligned;
> > - size_t incr;
> > + const size_t incr = sizeof(pattern);
The const qualifier made the difference.
Thanks,
Andreas
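
The thread does not spell out why the const qualifier helps. One plausible reading, offered here as an assumption: when the divisor is known at compile time to be 8, the compiler can turn the 64-bit division into a shift instead of calling a library helper such as __udivdi3, which a 32-bit kernel build does not provide. The sketch below only shows that the two forms compute the same value; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Division by a compile-time constant power of two. */
static uint64_t words_by_division(uint64_t size, uint64_t skipped)
{
	const size_t incr = sizeof(uint64_t);	/* known to be 8 */

	assert(size >= skipped);
	return (size - skipped) / incr;
}

/* The shift a compiler may emit for the division above (log2(8) == 3). */
static uint64_t words_by_shift(uint64_t size, uint64_t skipped)
{
	assert(size >= skipped);
	return (size - skipped) >> 3;
}
```

Whether a given compiler version actually performs this reduction depends on the compiler and the optimization level, so the generated code is worth checking on the affected toolchain.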
* Re: [PATCH] x86: memtest: fix compile warning
2009-06-11 14:21 ` Thomas Gleixner
2009-06-11 14:30 ` H. Peter Anvin
2009-06-11 15:26 ` Andreas Herrmann
@ 2009-06-11 17:19 ` Yinghai Lu
2009-06-11 21:05 ` Thomas Gleixner
2 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2009-06-11 17:19 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andreas Herrmann, Stephen Rothwell, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, linux-next, linux-kernel
On Thu, Jun 11, 2009 at 7:21 AM, Thomas Gleixner<tglx@linutronix.de> wrote:
> On Thu, 11 Jun 2009, Andreas Herrmann wrote:
>
>> Commit c9690998ef48ffefeccb91c70a7739eebdea57f9
>> (x86: memtest: remove 64-bit division) introduced following compile warning:
>>
>> arch/x86/mm/memtest.c: In function 'memtest':
>> arch/x86/mm/memtest.c:56: warning: comparison of distinct pointer types lacks a cast
>> arch/x86/mm/memtest.c:58: warning: comparison of distinct pointer types lacks a cast
>>
>> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
>> ---
>> arch/x86/mm/memtest.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> Sorry.
>> Please apply.
>
> I applied it already, but zapped it right away, as it is bad style to
> do the type casting in the loops. The proper fix is below.
>
> But aside of that this code is confusing.
>
> start_phys_aligned = ALIGN(start_phys, incr);
>
> Why do we have to fiddle with the alignment? Are you really seeing e820
> entries which are not 8 byte aligned?
>
> for (p = start; p < end; p++, start_phys_aligned += incr) {
> if (*p == pattern)
> continue;
> if (start_phys_aligned == last_bad + incr) {
> last_bad += incr;
> continue;
> }
> if (start_bad)
> reserve_bad_mem(pattern, start_bad, last_bad + incr);
> start_bad = last_bad = start_phys_aligned;
> }
> if (start_bad)
> reserve_bad_mem(pattern, start_bad, last_bad + incr);
>
> I really had to look more than once to understand what the heck
> start_phys_aligned and last_bad + incr are doing. Really non
> intuitive.
>
> But the reserve_bad_mem() semantics are even more scary:
>
> - if you hit flaky memory, which gives you bad and good results here
> and there, you call reserve_bad_mem() totally unbound which is
> likely to overflow the early reservation space and panics the
> machine. You need to keep track of those events somehow (e.g. in a
> bitmap) so you can detect such problems and mark the whole affected
> region bad in one go.
If one pass finds a bad range, it is reserved.
The second pass will use find_e820_area_size() to get a new range, so the
bad one will not be used again.
>
> - you call reserve_early() which calls __reserve_early(....,
> overrun_ok = 0) so if you do the default multi pattern scan and each
> run sees the same region of broken memory you will trigger the
> "Overlapping early reservations" panic in __reserve_early() when you
> reserve that region the second time. Why do you run the test twice
> when the first one failed already ? Also there is no need to do the
> wipeout run in that case, which will trigger it as well!
The current problem is that we could run out of slots in the res_reserve array.
The solution will be to make the res_reserve array dynamic:
when no free slot can be found, use find_e820_area to get an array of double
the size, copy the old one to the new one,
and then free the old one.
YH
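
The grow-by-doubling scheme described above can be sketched in user space as follows. This is only an illustration under stated assumptions: malloc() and free() stand in for find_e820_area() and for releasing the old early reservation, and struct early_res, struct res_table and res_table_add() are hypothetical names rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct early_res {
	uint64_t start, end;	/* reserved range [start, end) */
};

struct res_table {
	struct early_res *slots;
	size_t used, cap;
};

/* Append a reservation; double the table when no free slot is left.
 * Returns 0 on success, -1 if the larger table cannot be allocated. */
static int res_table_add(struct res_table *t, uint64_t start, uint64_t end)
{
	assert(start < end);

	if (t->used == t->cap) {
		size_t new_cap = t->cap ? t->cap * 2 : 4;
		struct early_res *bigger = malloc(new_cap * sizeof(*bigger));

		if (!bigger)
			return -1;
		/* Copy the old entries, then release the old table. */
		if (t->used)
			memcpy(bigger, t->slots, t->used * sizeof(*bigger));
		free(t->slots);
		t->slots = bigger;
		t->cap = new_cap;
	}
	t->slots[t->used].start = start;
	t->slots[t->used].end = end;
	t->used++;
	return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the number of reservations, but it does not bound how many reservations flaky memory can generate.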
* Re: [PATCH] x86: memtest: fix compile warning
2009-06-11 17:19 ` Yinghai Lu
@ 2009-06-11 21:05 ` Thomas Gleixner
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2009-06-11 21:05 UTC (permalink / raw)
To: Yinghai Lu
Cc: Andreas Herrmann, Stephen Rothwell, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, linux-next, linux-kernel
On Thu, 11 Jun 2009, Yinghai Lu wrote:
> On Thu, Jun 11, 2009 at 7:21 AM, Thomas Gleixner<tglx@linutronix.de> wrote:
> > On Thu, 11 Jun 2009, Andreas Herrmann wrote:
> > But the reserve_bad_mem() semantics are even more scary:
> >
> > - if you hit flaky memory, which gives you bad and good results here
> > and there, you call reserve_bad_mem() totally unbound which is
> > likely to overflow the early reservation space and panics the
> > machine. You need to keep track of those events somehow (e.g. in a
> > bitmap) so you can detect such problems and mark the whole affected
> > region bad in one go.
>
> If one pass finds a bad range, it is reserved.
> The second pass will use find_e820_area_size() to get a new range, so the
> bad one will not be used again.
No, that's not about passes. Assume that you have flaky memory which
works halfway. So that code runs through a full memory region from 0
to 0x1000000.
0-FF OK
100-1ff BAD
200-21f OK
220-23f BAD
....
So there is no find_e820_area_size() between those OK/BAD steps, but
every new BAD hit calls reserve_early() and you run out of space in
the reserve array.
> > - you call reserve_early() which calls __reserve_early(....,
> > overrun_ok = 0) so if you do the default multi pattern scan and each
> > run sees the same region of broken memory you will trigger the
> > "Overlapping early reservations" panic in __reserve_early() when you
> > reserve that region the second time. Why do you run the test twice
> > when the first one failed already ? Also there is no need to do the
> > wipeout run in that case, which will trigger it as well!
Ok, here applies the find_e820_area_size() thing. I missed that
because the code is so well documented and obvious.
> The current problem is that we could run out of slots in the res_reserve array.
> The solution will be to make the res_reserve array dynamic:
> when no free slot can be found, use find_e820_area to get an array of double
> the size, copy the old one to the new one,
> and then free the old one.
This applies to the first problem, which can be avoided by clever
coding.
Thanks,
tglx
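
The thread does not say what the "clever coding" would look like. One possible approach, shown purely as an assumption, is to cap the number of recorded bad ranges and, once the cap is reached, stretch the last range over any later bad hit, giving up the good memory in between. MAX_BAD_RANGES, struct bad_table and bad_table_add() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAD_RANGES 4	/* arbitrary cap chosen for this sketch */

struct bad_range {
	uint64_t start, end;	/* bad range [start, end) */
};

struct bad_table {
	struct bad_range r[MAX_BAD_RANGES];
	size_t used;
};

/* Record a bad range; ranges must arrive in ascending address order.
 * When the table is full, the last entry is extended to cover the new
 * range, so the number of reservations never exceeds MAX_BAD_RANGES. */
static void bad_table_add(struct bad_table *t, uint64_t start, uint64_t end)
{
	assert(start < end);
	assert(t->used == 0 || t->r[t->used - 1].end <= start);

	if (t->used < MAX_BAD_RANGES) {
		t->r[t->used].start = start;
		t->r[t->used].end = end;
		t->used++;
		return;
	}
	/* Full: swallow the gap of good memory before this range. */
	t->r[MAX_BAD_RANGES - 1].end = end;
}
```

The trade-off is that some working memory gets marked bad, in exchange for a fixed upper bound on the number of reserve_early() calls per scanned region.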