* [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug
@ 2014-09-26 11:49 anton.ivanov
2014-09-26 11:56 ` Richard Weinberger
0 siblings, 1 reply; 4+ messages in thread
From: anton.ivanov @ 2014-09-26 11:49 UTC (permalink / raw)
To: user-mode-linux-devel; +Cc: richard
From: Anton Ivanov <antivano@cisco.com>
This is a fix for a very old UML bug which can be triggered with stock
UML. It takes a lot of effort to trigger it there because the
lseek()/read() | write() mechanics of the UBD driver implicitly sync the
memory all the time by hitting the appropriate barrier implementation in
the host kernel.
By improving the disk susbsystem we make this bug raise its ugly head
with a vengeance - you can get a process in D (with an occasional child
in Z state) simply by running an apt-get on 30-40 large packages.
Is this correct place to have the sync - no idea. It may need to move
to somewhere inside tlb.c. With the fence in exec.c it works (TM).
If I understand this correctly, this also needs to be an instruction
appropriate for the underlying host so just a barrier() will not cut
it. You have to fence. En-guarde... Touche... :)
Signed-off-by: Anton Ivanov <antivano@cisco.com>
---
arch/um/kernel/exec.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index 0d7103c..7cb6805 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -27,6 +27,11 @@ void flush_thread(void)
ret = unmap(¤t->mm->context.id, 0, STUB_START, 0, &data);
ret = ret || unmap(¤t->mm->context.id, STUB_END,
host_task_size - STUB_END, 1, &data);
+#ifdef CONFIG_X86_32
+ alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2);
+#else
+ asm volatile("mfence":::"memory");
+#endif
if (ret) {
printk(KERN_ERR "flush_thread - clearing address space failed, "
"err = %d\n", ret);
--
1.7.10.4
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug
2014-09-26 11:49 [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug anton.ivanov
@ 2014-09-26 11:56 ` Richard Weinberger
2014-09-26 12:35 ` Anton Ivanov (antivano)
2014-09-28 8:15 ` Anton Ivanov
0 siblings, 2 replies; 4+ messages in thread
From: Richard Weinberger @ 2014-09-26 11:56 UTC (permalink / raw)
To: anton.ivanov, user-mode-linux-devel; +Cc: Daniel Walter
Am 26.09.2014 13:49, schrieb anton.ivanov@kot-begemot.co.uk:
> From: Anton Ivanov <antivano@cisco.com>
>
> This is a fix for a very old UML bug which can be triggered with stock
> UML. It takes a lot of effort to trigger it there because the
> lseek()/read() | write() mechanics of the UBD driver implicitly sync the
> memory all the time by hitting the appropriate barrier implementation in
> the host kernel.
>
> By improving the disk susbsystem we make this bug raise its ugly head
> with a vengeance - you can get a process in D (with an occasional child
> in Z state) simply by running an apt-get on 30-40 large packages.
>
> Is this correct place to have the sync - no idea. It may need to move
> to somewhere inside tlb.c. With the fence in exec.c it works (TM).
>
> If I understand this correctly, this also needs to be an instruction
> appropriate for the underlying host so just a barrier() will not cut
> it. You have to fence. En-guarde... Touche... :)
>
> Signed-off-by: Anton Ivanov <antivano@cisco.com>
> ---
> arch/um/kernel/exec.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
> index 0d7103c..7cb6805 100644
> --- a/arch/um/kernel/exec.c
> +++ b/arch/um/kernel/exec.c
> @@ -27,6 +27,11 @@ void flush_thread(void)
> ret = unmap(¤t->mm->context.id, 0, STUB_START, 0, &data);
> ret = ret || unmap(¤t->mm->context.id, STUB_END,
> host_task_size - STUB_END, 1, &data);
> +#ifdef CONFIG_X86_32
> + alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2);
> +#else
> + asm volatile("mfence":::"memory");
> +#endif
Why not mb()?
I'm not sure whether this fix is correct.
Thanks,
//richard
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug
2014-09-26 11:56 ` Richard Weinberger
@ 2014-09-26 12:35 ` Anton Ivanov (antivano)
2014-09-28 8:15 ` Anton Ivanov
1 sibling, 0 replies; 4+ messages in thread
From: Anton Ivanov (antivano) @ 2014-09-26 12:35 UTC (permalink / raw)
To: Richard Weinberger, anton.ivanov@kot-begemot.co.uk,
user-mode-linux-devel@lists.sourceforge.net
Cc: Daniel Walter
On 26/09/14 12:56, Richard Weinberger wrote:
[snip]
> +#ifdef CONFIG_X86_32
> + alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2);
> +#else
> + asm volatile("mfence":::"memory");
> +#endif
> Why not mb()?
> I'm not sure whether this fix is correct.
Looking at the actual defines that would work. I was originally looking
at asm-generic instead of x86/um/asm/barrier.h and the asm-generic
barrier() would have been insufficent.
x86/um/asm/barrier.h has literally the same define, so as you noted -
using mb() will have the same effect.
As far as is this fix correct or not - I do not know myself. I said it
when submitting it :)
It definitely fixes the problem - both the original (and extremely
difficult to reproduce) D|Z in the stock UML and the similar (and much
easier to reproduce) D|Z which you get with all the disk subsystem
improvement patches.
I will reissue it with mb() shortly.
A.
>
> Thanks,
> //richard
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug
2014-09-26 11:56 ` Richard Weinberger
2014-09-26 12:35 ` Anton Ivanov (antivano)
@ 2014-09-28 8:15 ` Anton Ivanov
1 sibling, 0 replies; 4+ messages in thread
From: Anton Ivanov @ 2014-09-28 8:15 UTC (permalink / raw)
To: Richard Weinberger, user-mode-linux-devel; +Cc: Daniel Walter
On 26/09/14 12:56, Richard Weinberger wrote:
> Am 26.09.2014 13:49, schrieb anton.ivanov@kot-begemot.co.uk:
>> From: Anton Ivanov <antivano@cisco.com>
>>
>> This is a fix for a very old UML bug which can be triggered with stock
>> UML. It takes a lot of effort to trigger it there because the
>> lseek()/read() | write() mechanics of the UBD driver implicitly sync the
>> memory all the time by hitting the appropriate barrier implementation in
>> the host kernel.
>>
>> By improving the disk susbsystem we make this bug raise its ugly head
>> with a vengeance - you can get a process in D (with an occasional child
>> in Z state) simply by running an apt-get on 30-40 large packages.
>>
>> Is this correct place to have the sync - no idea. It may need to move
>> to somewhere inside tlb.c. With the fence in exec.c it works (TM).
>>
>> If I understand this correctly, this also needs to be an instruction
>> appropriate for the underlying host so just a barrier() will not cut
>> it. You have to fence. En-guarde... Touche... :)
>>
>> Signed-off-by: Anton Ivanov <antivano@cisco.com>
>> ---
>> arch/um/kernel/exec.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
>> index 0d7103c..7cb6805 100644
>> --- a/arch/um/kernel/exec.c
>> +++ b/arch/um/kernel/exec.c
>> @@ -27,6 +27,11 @@ void flush_thread(void)
>> ret = unmap(¤t->mm->context.id, 0, STUB_START, 0, &data);
>> ret = ret || unmap(¤t->mm->context.id, STUB_END,
>> host_task_size - STUB_END, 1, &data);
>> +#ifdef CONFIG_X86_32
>> + alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2);
>> +#else
>> + asm volatile("mfence":::"memory");
>> +#endif
> Why not mb()?
> I'm not sure whether this fix is correct.
As I said before - neither am I.
I have tried to narrow it down and trace it - the right place is
probably somewhere further downstream in tlb.c (or even further
downstream from that).
While this place (exec.c) is probably not the best place, it improves
things quite a bit (I am not 100% sure it covers all failure cases).
There is a need to find it and fix it though. You start hitting the bug
quite a lot, the moment you start changing older calls which implicitly
synchronize memory by hitting a mb() in the host (lseek, fsync, etc) for
calls that do not do that.
I did not see that during my initial testing, because I always pin UML
to a single core for performance reasons.
IMHO long term performance improvement of UML depends on finding and
fixing the root cause for this. I will try to look at it during whatever
spare time I have in the next few months. Without a viable fix, you
cannot do something as trivial as sorting out the disk subsystem (the
pwrite/pread/async-fsync fixes are from the realm of bleeding obvious,
they should not create any races by themselves).
A
>
> Thanks,
> //richard
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-09-28 8:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-26 11:49 [uml-devel] [PATCH] Fix for "occasional userspace process in D/Z state" bug anton.ivanov
2014-09-26 11:56 ` Richard Weinberger
2014-09-26 12:35 ` Anton Ivanov (antivano)
2014-09-28 8:15 ` Anton Ivanov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.