* Latency problem
@ 2011-10-27 16:03 Raphaël Beamonte
[not found] ` <1319741477.6197.26.camel@marge.simson.net>
2011-10-27 21:15 ` Carsten Emde
0 siblings, 2 replies; 7+ messages in thread
From: Raphaël Beamonte @ 2011-10-27 16:03 UTC (permalink / raw)
To: linux-rt-users
Hello,
For some weeks now, I have been facing a problem with the computer I'm
using. Its cyclictest results are very bad: the max latency is
about 38ms. Yes, "ms".
I didn't yet know whether it was only a software problem or whether it
was linked to the hardware, so I installed a Debian distribution with an
RT kernel (3.0.0-1-rt-amd64) on an external SSD and tried it on more
than one computer. The results on the other computers were very good
(about 40us maximum), but on this computer, and on other computers
with the same hardware, the problem was there. The first maximum is
at about 19ms, and a second one at about 38ms.
> # /dev/cpu_dma_latency set to 0us
> policy: other/other: loadavg: 0.51 0.18 0.09 1/499 25409
>
> T: 0 (22080) P: 0 I:1000 C: 34849 Min: 3 Act: 3 Avg: 69 Max: 37864
Yesterday, I posted a patch that allowed me to test the hardware
latency of my computer, and here are the results (still very bad...):
> root@USBnux:/home/xaf/RT/rt-tests# ./hwlatdetect --threshold=1 --report=/home/xaf/hwlat_report
>
> hwlatdetect: test duration 120 seconds
> parameters:
> Latency threshold: 1us
>
> Sample window: 1000000us
> Sample width: 500000us
> Non-sampling period: 500000us
> Output File: /home/xaf/hwlat_report
> Starting test
> test finished
> Max Latency: 38136us
> Samples recorded: 45
> Samples exceeding threshold: 45
> sample data written to /home/xaf/hwlat_report
And here is the content of the hwlat_report file:
> 1319575592.0802426354 19061
> 1319575593.0806426299 2
> 1319575608.0898425572 19065
> 1319575610.0906425467 19063
> 1319575616.0930425175 2
> 1319575620.0950424982 19048
> 1319575622.0958424885 2
> 1319575627.0010424686 2
> 1319575629.0026424590 19063
> 1319575630.0030424546 19048
> 1319575633.0046424390 19063
> 1319575634.0050424344 19037
> 1319575637.0062424197 38125
> 1319575641.0078423997 19082
> 1319575642.0082423947 19062
> 1319575649.0110423608 19066
> 1319575650.0114423561 19050
> 1319575651.0118423511 19048
> 1319575655.0138423311 2
> 1319575659.0154423121 19069
> 1319575661.0162423015 19072
> 1319575662.0166422975 19049
> 1319575664.0174422870 19068
> 1319575665.0178422825 19063
> 1319575666.0182422774 19059
> 1319575668.0190422679 19055
> 1319575670.0198422577 19051
> 1319575671.0202422531 19060
> 1319575674.0222422384 2
> 1319575676.0230422288 38106
> 1319575680.0246422092 38136
> 1319575682.0254421988 19045
> 1319575683.0258421943 19045
> 1319575685.0266421842 19058
> 1319575689.0282421645 2
> 1319575695.0318421354 2
> 1319575696.0322421303 19050
> 1319575697.0326421258 19085
> 1319575699.0342421159 19049
> 1319575701.0366421057 19039
> 1319575702.0386420994 19049
> 1319575704.0390420911 19065
> 1319575706.0398420818 19053
> 1319575707.0402420761 19036
> 1319575710.0446420605 2
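The report is easier to read once tabulated: apart from the 2us
entries, every sample sits close to 19ms or close to twice that. The
Python sketch below (Python only because hwlatdetect itself is a Python
script) parses lines in the "timestamp latency_us" format shown above
and groups them by the nearest multiple of an assumed 19000us quantum.
Both that quantum and the 100us cutoff for "small" samples are guesses
read off this data, not hwlatdetect parameters.

```python
from collections import Counter


def parse_hwlat_report(text):
    """Parse 'timestamp latency_us' lines into (timestamp, latency_us) pairs.

    A leading '>' (as in a quoted mail) is tolerated. The timestamp is kept
    as a string so its format is not reinterpreted.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip().lstrip(">").strip()
        if not line:
            continue
        stamp, latency = line.split()
        samples.append((stamp, int(latency)))
    return samples


def bucket_by_quantum(latencies_us, quantum_us=19000, small_us=100):
    """Count samples below small_us as 'small', the rest by the nearest
    multiple of quantum_us (an assumed value, see above)."""
    counts = Counter()
    for lat in latencies_us:
        if lat < small_us:
            counts["small"] += 1
        else:
            counts[round(lat / quantum_us)] += 1
    return counts
```

Fed the 45 lines above, this yields 9 small samples, 33 near 19ms and 3
near 38ms. Spikes that land on whole multiples of one duration are
consistent with a single recurring source rather than random load.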
I really have no idea where the problem could come from, and I need
your insight to help me.
Here is the lspci output of the affected computer:
> 00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 31)
> 00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev 31)
> 00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev 31)
> 00:04.0 PCI bridge: Intel Corporation 5000X Chipset PCI Express x16 Port 4-7 (rev 31)
> 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev 31)
> 00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
> 00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
> 00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 31)
> 00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
> 00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 31)
> 00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
> 00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 31)
> 00:1b.0 Audio device: Intel Corporation 631xESB/632xESB High Definition Audio Controller (rev 09)
> 00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
> 00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
> 00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
> 00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
> 00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
> 00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
> 00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
> 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
> 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
> 01:00.0 VGA compatible controller: nVidia Corporation GT215 [GeForce GT 240] (rev a2)
> 01:00.1 Audio device: nVidia Corporation High Definition Audio Controller (rev a1)
> 02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
> 02:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
> 03:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
> 03:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
> 05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
> 05:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
Thank you for your help.
Raphaël
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Latency problem
[not found] ` <1319741477.6197.26.camel@marge.simson.net>
@ 2011-10-27 19:03 ` Raphaël Beamonte
2011-10-28 4:54 ` Mike Galbraith
0 siblings, 1 reply; 7+ messages in thread
From: Raphaël Beamonte @ 2011-10-27 19:03 UTC (permalink / raw)
To: Mike Galbraith; +Cc: linux-rt-users
> You don't give any info wrt what kind of box this is, how you ran
> cyclictest, what load it was competing against, nada. With the info
> provided, nobody can say much.
>
> It _looks_ like you must have just called up cyclictest using all
> defaults, which doesn't even start cyclictest as a realtime class task,
> meaning you'll see worst case latencies for SCHED_OTHER tasks, which has
> nothing to do with realtime class scheduling latencies.
>
> Latencies in the 40us range are pretty normal for my el-cheapo Q6600
> desktop grade box trying to save power and sucking rocks at it, but ms
> latencies I never see with remotely sane loads.
>
> -Mike
>
Hello Mike,
You're right, but the problem is that I tried running cyclictest with
many different parameters to see whether anything changed (-S, -p95,
etc.). I also tried disabling some CPU cores to see whether the problem
was still there with just one active core. But as I said, the test
returned these bad results ONLY on this hardware! I tried with the
same distribution, in the same conditions, and with the same tests on
other computers with different hardware (the software was identical, as
I booted Debian from an external SSD), and it worked fine. I
agree... ms latencies are very weird...
The load of the system was different each time, but the results were
the same: I tried in single-user mode, with the X server loaded, from a
terminal under X, etc.
Moreover, hwlatdetect seems to confirm that the problem comes from the
hardware.
Another test we tried was to boot with idle=poll (added to the kernel
command line in GRUB) and to trace cyclictest. In the traces, we can
see a gap of roughly 35ms.
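idle=poll keeps every CPU out of deeper idle states for the whole boot.
The first line of the cyclictest output above ("/dev/cpu_dma_latency set
to 0us") shows that cyclictest already makes the runtime equivalent
request through the PM QoS device /dev/cpu_dma_latency, which the kernel
honours only while the file descriptor stays open. A minimal Python
sketch of holding such a request around one's own test, assuming root
access; the helper name is hypothetical:

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def cpu_dma_latency(max_latency_us=0, path="/dev/cpu_dma_latency"):
    """Hold a CPU latency (PM QoS) request while the with-block runs.

    The request is dropped when the descriptor is closed, so the descriptor
    is kept open until the block exits.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Write the limit as a binary 32-bit integer, in microseconds.
        os.write(fd, struct.pack("i", max_latency_us))
        yield fd
    finally:
        os.close(fd)
```

Usage would look like `with cpu_dma_latency(0): run_test()`, where
run_test is a placeholder for whatever measurement is being run.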
I'm pretty sure the problem is linked to the hardware. But if you have
an idea of parameters I can use with cyclictest to get better results,
please let me know!
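Mike's point is that cyclictest started with its defaults runs as a
SCHED_OTHER task, while -p95 makes it a real-time (SCHED_FIFO) task. The
rough Python sketch below imitates the measurement loop: sleep until
evenly spaced deadlines and record how late each wakeup is, optionally
after switching to SCHED_FIFO (needs root or CAP_SYS_NICE). It is only
an approximation: cyclictest is a C program that sleeps on absolute
deadlines, whereas time.sleep takes a relative duration and the
interpreter adds its own overhead, so the figures are not comparable
with real cyclictest numbers.

```python
import os
import time


def measure_wakeups(interval_us=1000, loops=1000, rt_priority=None):
    """Return the wakeup latency of each loop iteration, in microseconds.

    If rt_priority is given, switch the process to SCHED_FIFO first;
    otherwise it stays SCHED_OTHER.
    """
    if rt_priority is not None:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
    interval_ns = interval_us * 1000
    deadline = time.monotonic_ns()
    latencies_us = []
    for _ in range(loops):
        deadline += interval_ns
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # How far past the intended deadline we actually woke up.
        latencies_us.append((time.monotonic_ns() - deadline) / 1000)
    return latencies_us
```

For example, `max(measure_wakeups(1000, 10000, rt_priority=95))` gives
the worst case seen at real-time priority 95.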
Thank you!
Raphaël
* Re: Latency problem
2011-10-27 16:03 Latency problem Raphaël Beamonte
[not found] ` <1319741477.6197.26.camel@marge.simson.net>
@ 2011-10-27 21:15 ` Carsten Emde
1 sibling, 0 replies; 7+ messages in thread
From: Carsten Emde @ 2011-10-27 21:15 UTC (permalink / raw)
To: Raphaël Beamonte; +Cc: linux-rt-users
Raphaël,
> [..]
>> root@USBnux:/home/xaf/RT/rt-tests# ./hwlatdetect --threshold=1 --report=/home/xaf/hwlat_report
>> hwlatdetect: test duration 120 seconds
>> parameters:
>> Latency threshold: 1us
>>
>> Sample window: 1000000us
>> Sample width: 500000us
>> Non-sampling period: 500000us
>> Output File: /home/xaf/hwlat_report
>> Starting test
>> test finished
>> Max Latency: 38136us
>> Samples recorded: 45
>> Samples exceeding threshold: 45
>> sample data written to /home/xaf/hwlat_report
>
> and the content of the hwlat_report file :
>
>> 1319575592.0802426354 19061
>> 1319575707.0402420761 19036
> [..]
> I really have no idea from where the problem could be. And I need your
> lights to help me.
The hwlatdetect application detects system management interrupts (SMIs).
Such interrupts are installed by the BIOS and are used to manage various
things such as battery management, overheat protection and emulation of
legacy devices (e.g. IDE, PS/2 etc.). You may try to check your BIOS
settings and disable all these or similar features. Then run hwlatdetect
again. If it still reports SMIs, the next thing to do depends on the
promises that you got when you bought the board:
1. It was advertised as an industrial PC to be used in control systems
and to run a real-time operating system:
Return the board to the manufacturer along with the printed output of
hwlatdetect and ask for a fix or your money back.
2. It is a standard server PC board without any promises as to its
suitability in a real-time system:
There probably is nothing you can do. Try to send the output of
hwlatdetect to the BIOS manufacturer and nicely ask for a fixed BIOS,
but I am not very optimistic that you will get one.
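As far as we understand the kernel side of hwlatdetect, it spins on a
CPU with interrupts disabled, reading a timestamp back to back for the
"sample width" (500000us above) of each "sample window" (1000000us), and
records any gap between two consecutive reads larger than the threshold.
With interrupts off, little besides an SMI can stall the CPU that long.
The Python sketch below only illustrates that gap-detection idea; it runs
in user space with interrupts enabled, so ordinary interrupts and
preemption will show up as gaps too. Function names are hypothetical.

```python
import time


def sample_max_gap(width_s=0.5, threshold_us=1):
    """Spin for width_s seconds reading the clock back to back.

    Return the largest gap between two consecutive reads in microseconds,
    or 0 if it does not exceed threshold_us.
    """
    end = time.monotonic_ns() + int(width_s * 1e9)
    last = time.monotonic_ns()
    max_gap_ns = 0
    while last < end:
        now = time.monotonic_ns()
        max_gap_ns = max(max_gap_ns, now - last)
        last = now
    max_gap_us = max_gap_ns / 1000
    return max_gap_us if max_gap_us > threshold_us else 0


def run_detector(duration_s=120, window_s=1.0, width_s=0.5, threshold_us=1):
    """Sample for width_s out of every window_s; collect (time, gap_us)."""
    samples = []
    t_end = time.monotonic() + duration_s
    while time.monotonic() < t_end:
        gap = sample_max_gap(width_s, threshold_us)
        if gap:
            samples.append((time.time(), gap))
        # Non-sampling part of the window.
        time.sleep(max(0.0, window_s - width_s))
    return samples
```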
Hope this helps,
-Carsten.
* Re: Latency problem
2011-10-27 19:03 ` Raphaël Beamonte
@ 2011-10-28 4:54 ` Mike Galbraith
2011-11-01 21:24 ` Sankara Muthukrishnan
0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2011-10-28 4:54 UTC (permalink / raw)
To: Raphaël Beamonte; +Cc: linux-rt-users
On Thu, 2011-10-27 at 15:03 -0400, Raphaël Beamonte wrote:
> Moreover, the use of hwlatdetect seems to confirm that the problem is
> from hardware.
Yeah, I skimmed over your message too quickly, thinking nsecs instead
of usecs. Probably SMIs.
-Mike
* Re: Latency problem
2011-10-28 4:54 ` Mike Galbraith
@ 2011-11-01 21:24 ` Sankara Muthukrishnan
[not found] ` <CAE_Gge2hbE0Shg4NrNvYswCDHhgAEwZsSUACuM5Tsmt8CqCP=g@mail.gmail.com>
0 siblings, 1 reply; 7+ messages in thread
From: Sankara Muthukrishnan @ 2011-11-01 21:24 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Raphaël Beamonte, linux-rt-users
2011/10/27 Mike Galbraith <efault@gmx.de>:
> On Thu, 2011-10-27 at 15:03 -0400, Raphaël Beamonte wrote:
>
>> Moreover, the use of hwlatdetect seems to confirm that the problem is
>> from hardware.
>
> Yeah, skimmed over your message too quick, thinking nsecs instead of
> usecs. Probably SMIs.
>
> -Mike
>
One way to find out whether the latency spikes are due to SMIs (System
Management Interrupts) is to read the MSR that keeps a count of SMIs:
read it at the start of the test program and again at the end, and see
whether the count has increased. Unfortunately, it is a model-specific
register, and I am not sure whether AMD supports it. I don't think all
Intel processors support it either. I know that the Nehalem-based Intel
Core i7 Arrandale processor supports counting SMIs through MSR 0x34. I
have not found it documented in any Intel documentation, but I found it
in some reference BIOS source. I have tested it with legacy USB
interrupts and found it to work. RDMSR is a privileged (ring 0)
instruction, though.
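From user space on Linux, the usual route to RDMSR is the msr driver
(modprobe msr, then root access), which exposes /dev/cpu/N/msr and uses
the file offset as the MSR number; the rdmsr tool from msr-tools reads
the same device. A minimal Python sketch of the before/after comparison
described above, assuming MSR 0x34 is the counter on the CPU in question
and reading CPU 0 only (helper names are hypothetical):

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # as described above; CPU-model specific


def read_msr(cpu, reg, path_fmt="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR via the msr driver (file offset = MSR number)."""
    fd = os.open(path_fmt.format(cpu=cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {reg:#x} on CPU {cpu}")
    return struct.unpack("<Q", data)[0]


def smi_delta(run, cpu=0, reg=MSR_SMI_COUNT, path_fmt="/dev/cpu/{cpu}/msr"):
    """Read the counter, call run(), read it again; return the increase."""
    before = read_msr(cpu, reg, path_fmt)
    run()
    after = read_msr(cpu, reg, path_fmt)
    return after - before
```

A nonzero smi_delta(lambda: time.sleep(60)) (with time imported) while
the latency spikes appear would support the SMI explanation.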
* Re: Latency problem
[not found] ` <CAE_Gge2hbE0Shg4NrNvYswCDHhgAEwZsSUACuM5Tsmt8CqCP=g@mail.gmail.com>
@ 2011-11-01 22:12 ` Raphaël Beamonte
[not found] ` <CAFQPvXcyLpisRJ11=-5w7tSwN+A+oKYc_ExGCYSZxMov1-zHxg@mail.gmail.com>
0 siblings, 1 reply; 7+ messages in thread
From: Raphaël Beamonte @ 2011-11-01 22:12 UTC (permalink / raw)
To: linux-rt-users
Thank you very much for your answers.
> The hwlatdetect application detects system management interrupts
> (SMIs). Such interrupts are installed by the BIOS and are used to manage
> various things such as battery management, overheat protection and
> emulation of legacy devices (e.g. IDE, PS/2 etc.). You may try to check
> your BIOS settings and disable all these or similar features.
In fact, after disabling almost all features in the BIOS, the problem
is still there but somewhat reduced (there is only about 20ms of
latency now; I think it's linked to the fact that the thermal controller
is disabled). But I didn't find any reference to SMIs in the BIOS,
nor in the motherboard documentation (Supermicro X7DAL). The
computer was not bought for real-time work, so if it doesn't work I
think I'll switch to a computer that showed good latency during the
tests.
> One way to find out whether the spikes in the latency are due to
> SMI(System Management Interrupt) is to read the MSR register that
> keeps the count of SMIs. So, read the MSR in the start of the test
> program and the end of the test and see whether the count has
> increased. Unfortunately, it is model-specific-register and I am not
> sure whether AMD supports it. I don't think all Intel processors
> support it either. I know that Nehalem based Intel Core-i7 arrandale
> processor supports counting SMI thru MSR 0x34 and I have not found it
> documented in any Intel documentation but found in some reference BIOS
> source. I have tested it with legacy USB interrupts and found it to
> work. RDMSR is a privilege-0 instruction though.
0x34 is not working for me; it doesn't return anything (just that it
can't read it):
> root@station9:/home/xaf/RT# rdmsr 0x34
> rdmsr: CPU 0 cannot read MSR 0x00000034
The CPUs I use here are two Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, and
I am unable to find any documentation about the MSRs to know which
register I must read to see the number of SMIs. I'll investigate
more on this...
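The "cannot read MSR" message alone does not say why the read failed.
As far as we know, the Linux msr driver returns EIO when the RDMSR
faults (typically because the register is not implemented on that CPU),
and msr-tools prints this message for that case; a missing msr module
or insufficient privileges fail earlier, when the device is opened. A
small Python sketch that tells these cases apart (the helper name is
hypothetical):

```python
import errno
import os


def probe_msr(cpu, reg, path_fmt="/dev/cpu/{cpu}/msr"):
    """Classify whether an MSR can be read through the Linux msr driver."""
    try:
        fd = os.open(path_fmt.format(cpu=cpu), os.O_RDONLY)
    except FileNotFoundError:
        return "no msr device (is the msr module loaded?)"
    except PermissionError:
        return "permission denied (needs root / CAP_SYS_RAWIO)"
    try:
        data = os.pread(fd, 8, reg)
    except OSError as e:
        if e.errno == errno.EIO:
            return "MSR not implemented on this CPU (EIO)"
        raise
    finally:
        os.close(fd)
    return "ok" if len(data) == 8 else "short read"
```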
Thanks again!
* Re: Latency problem
[not found] ` <CAFQPvXcyLpisRJ11=-5w7tSwN+A+oKYc_ExGCYSZxMov1-zHxg@mail.gmail.com>
@ 2011-11-02 16:28 ` Sankara Muthukrishnan
0 siblings, 0 replies; 7+ messages in thread
From: Sankara Muthukrishnan @ 2011-11-02 16:28 UTC (permalink / raw)
To: Raphaël Beamonte; +Cc: linux-rt-users
2011/11/2 Sankara Muthukrishnan <sankara.m@gmail.com>:
>
> 2011/11/1 Raphaël Beamonte <raphael.beamonte@gmail.com>
>> [..]
>
If you are building your own kernel, you can try to disable some SMI
sources, even if the BIOS does not have an option for that. [Note:
most BIOSes provide more configuration options only in their
development version and not in the release version.] However, you need
to be cautious about which SMIs you turn off and what the consequences
of turning them off are. Again, I don't know which chipset you are
using. To give you an idea, for SMIs on ICH9, search for the GLB_SMI_EN
and SMI_LOCK bits in
http://www.intel.com/assets/pdf/datasheet/316972.pdf
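To make the two bits concrete: GLB_SMI_EN is the global enable in the
chipset's SMI_EN register, and once SMI_LOCK is set, writes to
GLB_SMI_EN have no effect, so individual sources are the only thing left
to turn off. The Python sketch below only decodes and computes register
values; it never touches hardware. The bit positions are assumptions
recalled from ICH-family datasheets and must be checked against the
datasheet of the actual chipset (the lspci output earlier in this thread
shows a 631xESB/632xESB, not an ICH9).

```python
# ASSUMED bit positions (ICH-family SMI_EN); verify for your chipset.
SMI_EN_BITS = {
    "GLB_SMI_EN": 0,
    "LEGACY_USB_EN": 3,
    "APMC_EN": 5,
    "TCO_EN": 13,
    "PERIODIC_EN": 14,
    "LEGACY_USB2_EN": 17,
}


def enabled_sources(smi_en):
    """Names of the known SMI_EN bits that are set in a 32-bit value."""
    return [name for name, bit in SMI_EN_BITS.items() if smi_en & (1 << bit)]


def plan_disable(smi_en, names, smi_lock):
    """Return smi_en with the named bits cleared (computation only).

    With SMI_LOCK set, writes to GLB_SMI_EN are ignored by the hardware,
    so a request to clear it is rejected here instead of silently failing.
    """
    if smi_lock and "GLB_SMI_EN" in names:
        raise ValueError("SMI_LOCK is set: GLB_SMI_EN cannot be changed")
    new = smi_en
    for name in names:
        new &= ~(1 << SMI_EN_BITS[name])
    return new & 0xFFFFFFFF
```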
Regarding the MSR for SMIs, as I said, it is specific to CPU
architectures and models. I tried to find where I got that information
for Arrandale processors: it is at http://biosbits.org/download/. You
may have to find out, or ask the manufacturer, whether an SMI counter
is supported and which MSR it is. Good luck.
end of thread, other threads:[~2011-11-02 16:28 UTC | newest]
Thread overview: 7+ messages
2011-10-27 16:03 Latency problem Raphaël Beamonte
[not found] ` <1319741477.6197.26.camel@marge.simson.net>
2011-10-27 19:03 ` Raphaël Beamonte
2011-10-28 4:54 ` Mike Galbraith
2011-11-01 21:24 ` Sankara Muthukrishnan
[not found] ` <CAE_Gge2hbE0Shg4NrNvYswCDHhgAEwZsSUACuM5Tsmt8CqCP=g@mail.gmail.com>
2011-11-01 22:12 ` Raphaël Beamonte
[not found] ` <CAFQPvXcyLpisRJ11=-5w7tSwN+A+oKYc_ExGCYSZxMov1-zHxg@mail.gmail.com>
2011-11-02 16:28 ` Sankara Muthukrishnan
2011-10-27 21:15 ` Carsten Emde