* [Xenomai] IO-APIC latencies
@ 2012-09-17  6:30 Gilles Chanteperdrix
  2012-09-17  7:43 ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17  6:30 UTC (permalink / raw)
  To: Xenomai


Hi,

Looking at x86 latencies, I found that what was taking a long time on my
Atom was masking the fasteoi interrupts at the IO-APIC level. So I
experimented with an idea: masking at the LAPIC level instead of the
IO-APIC, by using the "task priority" register. This seems to improve
latencies on my Atom:

http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png

This implies splitting the LAPIC vectors into a high-priority set and a
low-priority set. The final implementation would use ipipe_enable_irqdesc
to detect a high-priority domain, and change the vector at that time.
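
To make the masking idea concrete, here is a minimal user-space model
(Python, not kernel code) of how the LAPIC task priority register gates
vectors by priority class. It assumes the architectural rule that a
vector's priority class is its upper four bits, and that a fixed
interrupt is delivered only if its class is strictly above the class
held in the TPR (in-service interrupts are ignored here). The 0x80 split
and all function names are illustrative assumptions, not the I-pipe
implementation.

```python
# User-space model of LAPIC task-priority masking; illustrative only.

SPLIT_VECTOR = 0x80  # assumed boundary: low-priority vectors sit below it


def priority_class(vector: int) -> int:
    """Priority class of an interrupt vector: its upper four bits."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector >> 4


def is_deliverable(vector: int, tpr: int) -> bool:
    """True if the LAPIC would deliver `vector` with task priority `tpr`.

    Simplification: in-service interrupts are ignored, so only the TPR
    class (bits 7:4) acts as the threshold.
    """
    return priority_class(vector) > priority_class(tpr)


def tpr_masking_low_priority() -> int:
    """TPR value that holds back every vector below SPLIT_VECTOR."""
    return ((SPLIT_VECTOR >> 4) - 1) << 4
```

With the threshold computed this way, every vector below the split is
held back by the LAPIC while vectors at or above it still get through,
and writing 0 to the TPR lets the low-priority set through again. No
IO-APIC register is touched in either direction.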

This also improves the latencies on my old PIII with a VIA chipset, but
it generates spurious interrupts (I do not know whether that really
matters, as handling a spurious interrupt is still faster than masking an
IO-APIC interrupt). The spurious interrupts in that case are a
documented behaviour of the LAPIC.

Is there any interest in pursuing this idea? Or are x86 systems with a
slow IO-APIC the exception rather than the rule, or does having to split
the vector space appear too great a restriction?

Regards.

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  6:30 [Xenomai] IO-APIC latencies Gilles Chanteperdrix
@ 2012-09-17  7:43 ` Jan Kiszka
  2012-09-17  8:07   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17  7:43 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
> 
> Hi,
> 
> looking at x86 latencies, I found that what was taking long on my atom
> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
> priority" register. This seems to improve latencies on my atom:
> 
> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
> 
> This implies splitting the LAPIC vectors in a high priority and low
> priority sets, the final implementation would use ipipe_enable_irqdesc
> to detect a high priority domain, and change the vector at that time.
> 
> This also improves the latencies on my old PIII with a VIA chipset, but
> it generates spurious interrupts (I do not know if it really is a
> matter, as handling a spurious interrupt is still faster than masking an
> IO-APIC interrupt), the spurious interrupts in that case are a
> documented behaviour of the LAPIC.
> 
> Is there any interest in pursuing this idea, or are x86 with slow
> IO-APIC the exception more than the rule, or having to split the vector
> space appears too great a restriction?

Line-based interrupts are legacy, of decreasing relevance for PCI
devices - likely what we are primarily interested in here - due to MSI.
So I tend to say "don't worry", especially as fiddling with vector
allocations will require yet another round of invasive changes to the
IRQ subsystem of Linux.

Jan



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  7:43 ` Jan Kiszka
@ 2012-09-17  8:07   ` Gilles Chanteperdrix
  2012-09-17  8:18     ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17  8:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 09:43 AM, Jan Kiszka wrote:
> Line-based interrupts are legacy, of decreasing relevance for PCI
> devices - likely what we are primarily interested in here - due to MSI.

Even if I enable MSI, the kernel still uses these irqs for the
peripherals integrated into the chipset, such as the USB HCI, or ATA
driver (IOW, non-PCI devices).

atom login: root
# cat /proc/interrupts
           CPU0       CPU1
  0:         41          0   IO-APIC-edge      timer
  4:         39          0   IO-APIC-edge      serial
  9:          0          0   IO-APIC-fasteoi   acpi
 14:          0          0   IO-APIC-edge      ata_piix
 15:          0          0   IO-APIC-edge      ata_piix
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
 23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 43:       2704          0   PCI-MSI-edge      eth0
 44:        249          0   PCI-MSI-edge      snd_hda_intel
NMI:          0          0   Non-maskable interrupts
LOC:        661        644   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       1582       2225   Rescheduling interrupts
CAL:         26         48   Function call interrupts
TLB:         10         19   TLB shootdowns
ERR:          0
MIS:          0

I do not think peripherals integrated into chipsets can really be
considered "legacy". And they tend to be used in the field...
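
For anyone wanting to check their own hardware, here is a small sketch
(Python; the helper name is hypothetical) that lists the IRQs using a
given flow type from the text of /proc/interrupts. It assumes the column
layout shown above: IRQ number, one counter per CPU, then the chip/flow
name and the handlers. Other kernel versions may format the type column
differently.

```python
def irqs_by_type(text, irq_type="IO-APIC-fasteoi"):
    """Map IRQ number -> handler names for lines whose type is `irq_type`."""
    result = {}
    ncpus = None
    for line in text.splitlines():
        tokens = line.split()
        if ncpus is None:
            # Skip everything up to the "CPU0 CPU1 ..." header line.
            if tokens and all(t.startswith("CPU") for t in tokens):
                ncpus = len(tokens)
            continue
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # NMI, LOC, ERR, ... are not device IRQ lines
        fields = rest.split()
        # After the per-CPU counters comes the type, then the handlers.
        if len(fields) > ncpus and fields[ncpus] == irq_type:
            result[int(label)] = " ".join(fields[ncpus + 1:])
    return result
```

Calling it on the contents of /proc/interrupts (for example
irqs_by_type(open("/proc/interrupts").read())) returns only the
fasteoi lines, which are the ones affected by the IO-APIC masking cost.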

> So I tend to say "don't worry", specifically as fiddling with vector
> allocations will require yet another round of invasive changes to the
> IRQ subsystem of Linux.

The changes would be minimally invasive: we would reuse functions that
already exist (clear_irq_vector and assign_irq_vector).

-- 
					    Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  8:07   ` Gilles Chanteperdrix
@ 2012-09-17  8:18     ` Jan Kiszka
  2012-09-17  8:32       ` Gilles Chanteperdrix
  2012-09-17 12:21       ` Gilles Chanteperdrix
  0 siblings, 2 replies; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17  8:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
> Even if I enable MSI, the kernel still uses these irqs for the 
> peripherals integrated to the chipset, such as the USB HCI, or ATA 
> driver (IOW, non PCI devices). 

Those are all PCI as well. And modern chipsets include variants of them
with MSI(-X) support.

> I do not think peripherals integrated to chipsets can really be
> considered "legacy". And they tend to be used in the field...

The good news is that, even on your low-end atom, you can avoid those
latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
core and the RT on the other. That's getting easier and easier due to
the inflation of cores.
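
As a rough sketch of that CPU assignment: Linux exposes per-IRQ affinity
as a hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity. The helpers
below (Python; the names are hypothetical) build such a mask and write
it. Writing requires root, some IRQs cannot be moved, and the sketch
assumes fewer than 32 CPUs so that a single hex word is enough. Placing
the RT load on the other core is a separate step not shown here.

```python
from pathlib import Path


def cpu_mask(cpus) -> str:
    """Hex bitmask string with one bit set per CPU number in `cpus`."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq: int, cpus, proc_root: str = "/proc") -> None:
    """Restrict `irq` to `cpus` by writing its smp_affinity file."""
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity"
    path.write_text(cpu_mask(cpus) + "\n")
```

For the dual-core box above, set_irq_affinity(23, [0]) would keep the
USB interrupt on CPU0, leaving CPU1 free of that IRQ load.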

> 
>> So I tend to say "don't worry", specifically as fiddling with vector
>> allocations will require yet another round of invasive changes to the
>> IRQ subsystem of Linux.
> 
> The changes would be minimally invasive, we would reuse the functions
> already existing (clear_irq_vector and assign_irq_vector).
> 

You will have to rearrange vector assignment and mask those vectors on
all CPUs, possibly complicated by affinity changes. That's worrying me
as well. But I'm also open to discussing a prototype.

Jan


* Re: [Xenomai] IO-APIC latencies
  2012-09-17  8:18     ` Jan Kiszka
@ 2012-09-17  8:32       ` Gilles Chanteperdrix
  2012-09-17  9:07         ` Jan Kiszka
  2012-09-17 12:21       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17  8:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 10:18 AM, Jan Kiszka wrote:
> The good news is that, even on your low-end atom, you can avoid those
> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
> core and the RT on the other. That's getting easier and easier due to
> the inflation of cores.

What if you want to use RTUSB for instance?

> You will have to rearrange vector assignment and mask those vectors on
> all CPUs, possibly complicated by affinity changes. That's worrying me
> as well. But I'm also open for discussing a prototype.

You do not need to mask anything. The idea is that assign_irq_vector
would take an additional argument indicating whether we want a high or
low vector. The affinity change would use the current vector value to
pass the right argument to assign_irq_vector (if I am not mistaken,
affinity changes already use assign_irq_vector).
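
A minimal user-space model of that allocation policy (Python, not the
kernel's assign_irq_vector): the caller asks for a low or a high vector,
and a re-assignment, as on an affinity change, derives the range from
the vector currently in use. The range bounds, the 0x80 split and all
names are assumptions made for illustration.

```python
LOW_RANGE = range(0x20, 0x80)   # assumed: vectors left to Linux
HIGH_RANGE = range(0x80, 0xF0)  # assumed: vectors for the high-priority domain


class VectorAllocator:
    """Toy vector table with a low/high priority split."""

    def __init__(self):
        self.owner = {}  # vector -> irq

    def assign(self, irq: int, high: bool) -> int:
        """Give `irq` the first free vector in the requested range."""
        for vector in (HIGH_RANGE if high else LOW_RANGE):
            if vector not in self.owner:
                self.owner[vector] = irq
                return vector
        raise RuntimeError("no free vector in requested range")

    def clear(self, vector: int) -> None:
        """Release `vector` if it is in use."""
        self.owner.pop(vector, None)

    def reassign(self, irq: int, current_vector: int) -> int:
        """Move `irq` to a new vector, staying in its current range."""
        high = current_vector in HIGH_RANGE
        new_vector = self.assign(irq, high)
        self.clear(current_vector)
        return new_vector
```

The point of reassign() is that the caller never has to remember the
priority of an IRQ separately: the vector value itself encodes it.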

-- 
					    Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  8:32       ` Gilles Chanteperdrix
@ 2012-09-17  9:07         ` Jan Kiszka
  2012-09-17  9:29           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17  9:07 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
> What if you want to use RTUSB for instance?

Then I will likely not worry about a few microseconds of additional
latency due to IO-APIC accesses.

> You do not need to mask anything. The idea is that assign_irq_vector
> would take an additional argument indicating whether we want a high or
> low vector, the affinity change would use the current vector value to
> pass the right argument to assign_irq_vector (if I am not wrong,
> affinity changes already use assign_irq_vector).

Again, I'm open to reassessing this based on a working prototype. I just
have a bad feeling regarding it.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  9:07         ` Jan Kiszka
@ 2012-09-17  9:29           ` Gilles Chanteperdrix
  2012-09-17  9:42             ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17  9:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 11:07 AM, Jan Kiszka wrote:
> Then I will likely not worry about a few micros of additional latency
> due to IO-APIC accesses.

On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
takes 10us in UP, and 20us in SMP (with the tracer on).


> Again, I'm open to reassessing this based on a working prototype. I just
> have a bad feeling regarding it.

What I have done so far is limit the vectors used by Linux to below
0x80, which is pretty easy: it is simply achieved by moving the
"system vectors" down. My original question was whether to go for the
full implementation or not. A prototype requires modifying the vector
assignment just as the final implementation would, and even if the
principle looks easy, it will probably take some time to get right.

So, I am going to reformulate the question:
Are there any users of Xenomai on x86 who:
- use hardware with IO-APIC-fasteoi irqs
- care about gaining between 10us and 20us in latency?

-- 
					    Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  9:29           ` Gilles Chanteperdrix
@ 2012-09-17  9:42             ` Jan Kiszka
  2012-09-17 10:00               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17  9:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>
>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>
>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>
>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>> documented behaviour of the LAPIC.
>>>>>>>
>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>> space appears too great a restriction?
>>>>>>
>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>
>>>>> Even if I enable MSI, the kernel still uses these irqs for the 
>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA 
>>>>> driver (IOW, non PCI devices). 
>>>>
>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>> with MSI(-X) support.
>>>>
>>>>>
>>>>> atom login: root                                                                  
>>>>> # cat /proc/interrupts                                                            
>>>>>            CPU0       CPU1                                                        
>>>>>   0:         41          0   IO-APIC-edge      timer                              
>>>>>   4:         39          0   IO-APIC-edge      serial                             
>>>>>   9:          0          0   IO-APIC-fasteoi   acpi                               
>>>>>  14:          0          0   IO-APIC-edge      ata_piix                           
>>>>>  15:          0          0   IO-APIC-edge      ata_piix                           
>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5                      
>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4                      
>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3            
>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2       
>>>>>  43:       2704          0   PCI-MSI-edge      eth0                               
>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel                      
>>>>> NMI:          0          0   Non-maskable interrupts                              
>>>>> LOC:        661        644   Local timer interrupts                               
>>>>> SPU:          0          0   Spurious interrupts                                  
>>>>> PMI:          0          0   Performance monitoring interrupts                    
>>>>> IWI:          0          0   IRQ work interrupts                                  
>>>>> RTR:          0          0   APIC ICR read retries                                
>>>>> RES:       1582       2225   Rescheduling interrupts                              
>>>>> CAL:         26         48   Function call interrupts                             
>>>>> TLB:         10         19   TLB shootdowns                                       
>>>>> ERR:          0                                                                   
>>>>> MIS:          0                                                                   
>>>>>
>>>>> I do not think peripherals integrated to chipsets can really be
>>>>> considered "legacy". And they tend to be used in the field...
>>>>
>>>> The good news is that, even on your low-end atom, you can avoid those
>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>> core and the RT on the other. That's getting easier and easier due to
>>>> the inflation of cores.
>>>
>>> What if you want to use RTUSB for instance?
>>
>> Then I will likely not worry about a few micros of additional latency
>> due to IO-APIC accesses.
> 
> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
> takes 10us in UP, and 20us in SMP (with the tracer on).

...and on more appropriate chipsets? I bet the Atom is (once again) off
here.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17  9:42             ` Jan Kiszka
@ 2012-09-17 10:00               ` Gilles Chanteperdrix
  2012-09-17 10:39                 ` Henri Roosen
  2012-09-17 12:12                 ` Richard Cochran
  0 siblings, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 10:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 11:42 AM, Jan Kiszka wrote:
> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>
>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>
>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>
>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>
>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>> space appears too great a restriction?
>>>>>>>
>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>
>>>>>> Even if I enable MSI, the kernel still uses these irqs for the 
>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA 
>>>>>> driver (IOW, non PCI devices). 
>>>>>
>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>> with MSI(-X) support.
>>>>>
>>>>>>
>>>>>> atom login: root                                                                  
>>>>>> # cat /proc/interrupts                                                            
>>>>>>            CPU0       CPU1                                                        
>>>>>>   0:         41          0   IO-APIC-edge      timer                              
>>>>>>   4:         39          0   IO-APIC-edge      serial                             
>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi                               
>>>>>>  14:          0          0   IO-APIC-edge      ata_piix                           
>>>>>>  15:          0          0   IO-APIC-edge      ata_piix                           
>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5                      
>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4                      
>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3            
>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2       
>>>>>>  43:       2704          0   PCI-MSI-edge      eth0                               
>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel                      
>>>>>> NMI:          0          0   Non-maskable interrupts                              
>>>>>> LOC:        661        644   Local timer interrupts                               
>>>>>> SPU:          0          0   Spurious interrupts                                  
>>>>>> PMI:          0          0   Performance monitoring interrupts                    
>>>>>> IWI:          0          0   IRQ work interrupts                                  
>>>>>> RTR:          0          0   APIC ICR read retries                                
>>>>>> RES:       1582       2225   Rescheduling interrupts                              
>>>>>> CAL:         26         48   Function call interrupts                             
>>>>>> TLB:         10         19   TLB shootdowns                                       
>>>>>> ERR:          0                                                                   
>>>>>> MIS:          0                                                                   
>>>>>>
>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>
>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>> the inflation of cores.
>>>>
>>>> What if you want to use RTUSB for instance?
>>>
>>> Then I will likely not worry about a few micros of additional latency
>>> due to IO-APIC accesses.
>>
>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>> takes 10us in UP, and 20us in SMP (with the tracer on).
> 
> ...and on more appropriate chipsets? I bet the Atom is (once again) off
> here.

I do not know; would you care to share your traces with us? I only run
Xenomai on atom (which I am not sure does not qualify as "modern", since
new atoms still seem to be produced), geode (ok, this one is definitely
dead, but there seem to be people still running xenomai on them), and an
old pentium III with an old VIA686 chipset, where masking the IO-APIC is
even slower than acking the i8259.

Anyway, IO-APIC register accesses do not look designed for speed: the
IO-APIC uses an indirect access scheme that seems designed more to save
space in the processor address mapping, and to be configured once and
for all when enabling/disabling an interrupt, not at each and every
interrupt.
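
As a rough illustration of that indirect scheme, the toy model below
counts the MMIO accesses needed to mask one pin: the register index is
written to IOREGSEL, then the data goes through IOWIN. The offsets, the
redirection table base and the mask bit follow the usual IO-APIC layout;
the ToyIoApic class and mask_pin helper are hypothetical names, and real
kernel code may cache values or add a read-back, so take the count as
indicative only.

```python
IOREGSEL = 0x00     # index register offset in the IO-APIC MMIO window
IOWIN = 0x10        # data window offset
REDIR_BASE = 0x10   # index of the first redirection table register
MASK_BIT = 1 << 16  # mask bit in the low word of a redirection entry


class ToyIoApic:
    """Toy IO-APIC: every register is reached through IOREGSEL/IOWIN."""

    def __init__(self, pins=24):
        # Two 32-bit registers (low and high word) per interrupt pin.
        self.regs = {REDIR_BASE + i: 0 for i in range(2 * pins)}
        self.selected = 0
        self.mmio_accesses = 0

    def mmio_write(self, offset, value):
        self.mmio_accesses += 1
        if offset == IOREGSEL:
            self.selected = value
        elif offset == IOWIN:
            self.regs[self.selected] = value
        else:
            raise ValueError("unmodelled offset")

    def mmio_read(self, offset):
        self.mmio_accesses += 1
        if offset == IOREGSEL:
            return self.selected
        if offset == IOWIN:
            return self.regs[self.selected]
        raise ValueError("unmodelled offset")


def mask_pin(apic, pin):
    """Mask one pin: select its low word, read it, write it back masked."""
    apic.mmio_write(IOREGSEL, REDIR_BASE + 2 * pin)
    low = apic.mmio_read(IOWIN)
    apic.mmio_write(IOWIN, low | MASK_BIT)
```

In this model, masking a single pin costs three uncached accesses to the
chipset, and this happens on every fasteoi interrupt, whereas raising
the task priority is a single write to the local APIC.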

The point is: people may want to use Xenomai on atoms. We do not really
know on what kind of x86 people run xenomai; knowing that would help us
direct our efforts.

-- 
					    Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 10:00               ` Gilles Chanteperdrix
@ 2012-09-17 10:39                 ` Henri Roosen
  2012-09-17 11:14                   ` Gilles Chanteperdrix
  2012-09-17 12:12                 ` Richard Cochran
  1 sibling, 1 reply; 40+ messages in thread
From: Henri Roosen @ 2012-09-17 10:39 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>
>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>
>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>
>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>
>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>> space appears too great a restriction?
>>>>>>>>
>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>
>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>> driver (IOW, non PCI devices).
>>>>>>
>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>> with MSI(-X) support.
>>>>>>
>>>>>>>
>>>>>>> atom login: root
>>>>>>> # cat /proc/interrupts
>>>>>>>            CPU0       CPU1
>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>> ERR:          0
>>>>>>> MIS:          0
>>>>>>>
>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>
>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>> the inflation of cores.
>>>>>
>>>>> What if you want to use RTUSB for instance?
>>>>
>>>> Then I will likely not worry about a few micros of additional latency
>>>> due to IO-APIC accesses.
>>>
>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>
>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>> here.
>
> I do not know, do you care for sharing your traces with us? I only run
> Xenomai on atom (which I am not sure do not qualify as "modern", new
> atoms seem to be produced), geode (ok, this one is definitely dead, but
> there seem to be people still running xenomai on them), and an old
> pentium III with an old VIA686 chipset, where masking the IO-APIC is
> even slower than acking the i8259.
>
> Anyway, the IO-APIC registers accesses does not look designed for speed:
> it has an indirect scheme that seem more designed to save space in the
> processor mapping and to be configured once and for all when
> enabling/disabling interrupt, not at each and every interrupt.
>
> The point is: people may want to use Xenomai on atoms. We do not really
> know on what kind of x86 people run xenomai, knowing that would help us
> directing our efforts.

We are currently investigating whether we can use Atoms for our
future products. We have to stick to the x86 architecture and our
products should work without big cooling fans. We are currently running
tests on an Atom D2700 (which I know is EOL, but for research purposes
should give us a good indication).

A 20us latency gain is a lot and would be very welcome in our system!

>
> --
>                                             Gilles.
>
> _______________________________________________
> Xenomai mailing list
> Xenomai@xenomai.org
> http://www.xenomai.org/mailman/listinfo/xenomai



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 10:39                 ` Henri Roosen
@ 2012-09-17 11:14                   ` Gilles Chanteperdrix
  2012-09-17 12:15                     ` Henri Roosen
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 11:14 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Jan Kiszka, Xenomai

On 09/17/2012 12:39 PM, Henri Roosen wrote:

> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>
>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>
>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>
>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>
>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>
>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>
>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>
>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>> with MSI(-X) support.
>>>>>>>
>>>>>>>>
>>>>>>>> atom login: root
>>>>>>>> # cat /proc/interrupts
>>>>>>>>            CPU0       CPU1
>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>> ERR:          0
>>>>>>>> MIS:          0
>>>>>>>>
>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>
>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>> the inflation of cores.
>>>>>>
>>>>>> What if you want to use RTUSB for instance?
>>>>>
>>>>> Then I will likely not worry about a few micros of additional latency
>>>>> due to IO-APIC accesses.
>>>>
>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>
>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>> here.
>>
>> I do not know, do you care for sharing your traces with us? I only run
>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>> there seem to be people still running xenomai on them), and an old
>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>> even slower than acking the i8259.
>>
>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>> it has an indirect scheme that seem more designed to save space in the
>> processor mapping and to be configured once and for all when
>> enabling/disabling interrupt, not at each and every interrupt.
>>
>> The point is: people may want to use Xenomai on atoms. We do not really
>> know on what kind of x86 people run xenomai, knowing that would help us
>> directing our efforts.
> 
> We are currently investigating whether we can use Atom's for our
> future products. We have to stick to the x86 architecture and our
> products should work without big cooling fans. Currently running tests
> on Atom D2700 (which I know is EOL, but for research purposes should
> give us a good indication).
> 
> A 20us latency gain is a lot and would be very welcome in our system!


If you enable CONFIG_MSI, do you still see any IO-APIC-fasteoi entries
in /proc/interrupts?

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 10:00               ` Gilles Chanteperdrix
  2012-09-17 10:39                 ` Henri Roosen
@ 2012-09-17 12:12                 ` Richard Cochran
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Cochran @ 2012-09-17 12:12 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Mon, Sep 17, 2012 at 12:00:01PM +0200, Gilles Chanteperdrix wrote:
> 
> The point is: people may want to use Xenomai on atoms. We do not really
> know on what kind of x86 people run xenomai, knowing that would help us
> directing our efforts.

FWIW, I was once involved in a project where we looked at atom and
considered running xenomai on it. I would think that improving worst
case latency on atom by 10 or 20 usec would be well worth the effort.

Thanks,
Richard




* Re: [Xenomai] IO-APIC latencies
  2012-09-17 11:14                   ` Gilles Chanteperdrix
@ 2012-09-17 12:15                     ` Henri Roosen
  2012-09-17 12:27                       ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Henri Roosen @ 2012-09-17 12:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>
>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>
>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>
>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>
>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>
>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>
>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>
>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>
>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>> with MSI(-X) support.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> atom login: root
>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>            CPU0       CPU1
>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>> ERR:          0
>>>>>>>>> MIS:          0
>>>>>>>>>
>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>
>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>> the inflation of cores.
>>>>>>>
>>>>>>> What if you want to use RTUSB for instance?
>>>>>>
>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>> due to IO-APIC accesses.
>>>>>
>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>
>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>> here.
>>>
>>> I do not know, do you care for sharing your traces with us? I only run
>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>> there seem to be people still running xenomai on them), and an old
>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>> even slower than acking the i8259.
>>>
>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>> it has an indirect scheme that seem more designed to save space in the
>>> processor mapping and to be configured once and for all when
>>> enabling/disabling interrupt, not at each and every interrupt.
>>>
>>> The point is: people may want to use Xenomai on atoms. We do not really
>>> know on what kind of x86 people run xenomai, knowing that would help us
>>> directing our efforts.
>>
>> We are currently investigating whether we can use Atom's for our
>> future products. We have to stick to the x86 architecture and our
>> products should work without big cooling fans. Currently running tests
>> on Atom D2700 (which I know is EOL, but for research purposes should
>> give us a good indication).
>>
>> A 20us latency gain is a lot and would be very welcome in our system!
>
>
> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
> /proc/interrupts?
>

The kernel config has no CONFIG_MSI, but instead:
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y

There is still IO-APIC-fasteoi in /proc/interrupts:

# cat /proc/interrupts
           CPU0       CPU1
  0:        250          0   IO-APIC-edge      timer
  4:         71          0   IO-APIC-edge      serial
  7:         29          0   IO-APIC-edge
  8:          0          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
 23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 40:        940          0   PCI-MSI-edge      eth0
 41:         21          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
NMI:          0          0   Non-maskable interrupts
LOC:      29559      25129   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:         20          0   Rescheduling interrupts
CAL:          0          8   Function call interrupts
TLB:          9          5   TLB shootdowns
ERR:         74
MIS:          0
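
A check like this can be scripted. Below is a minimal sketch, assuming the /proc/interrupts layout shown in this thread (one count column per CPU, then a chip name such as "IO-APIC-fasteoi"); other kernel versions print the chip and flow type differently. The function name fasteoi_irqs is made up for illustration.

```python
def fasteoi_irqs(text):
    """Return (irq, total_count, devices) for each IO-APIC-fasteoi line.

    `text` is the content of /proc/interrupts in the layout quoted above.
    """
    rows = []
    for line in text.splitlines():
        if "IO-APIC-fasteoi" not in line:
            continue
        # "23:  5440  0  IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2"
        head, _, rest = line.partition(":")
        fields = rest.split()
        idx = fields.index("IO-APIC-fasteoi")
        counts = [int(c) for c in fields[:idx]]  # one counter per CPU
        devices = " ".join(fields[idx + 1:])     # handlers sharing the line
        rows.append((int(head), sum(counts), devices))
    return rows
```

On a live system the input would be the text read from /proc/interrupts; any row returned is an interrupt that still goes through the IO-APIC mask/unmask path.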

> --
>                                                                 Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17  8:18     ` Jan Kiszka
  2012-09-17  8:32       ` Gilles Chanteperdrix
@ 2012-09-17 12:21       ` Gilles Chanteperdrix
  2012-09-17 12:27         ` Jan Kiszka
  1 sibling, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 12:21 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 10:18 AM, Jan Kiszka wrote:
> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>
>>>> Hi,
>>>>
>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>> priority" register. This seems to improve latencies on my atom:
>>>>
>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>
>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>> to detect a high priority domain, and change the vector at that time.
>>>>
>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>> it generates spurious interrupts (I do not know if it really is a
>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>> documented behaviour of the LAPIC.
>>>>
>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>> space appears too great a restriction?
>>>
>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>> devices - likely what we are primarily interesting in here - due to MSI.
>>
>> Even if I enable MSI, the kernel still uses these irqs for the 
>> peripherals integrated to the chipset, such as the USB HCI, or ATA 
>> driver (IOW, non PCI devices). 
> 
> Those are all PCI as well. And modern chipsets include variants of them
> with MSI(-X) support.

Here is what I get on my workstation:
$ grep fasteoi /proc/interrupts 
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 19:          0          0          0          0   IO-APIC-fasteoi   ahci
 23:  280150305          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6

It is a Sandy Bridge processor, less than 2 years old.
What exactly do you mean by "modern"?

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 12:21       ` Gilles Chanteperdrix
@ 2012-09-17 12:27         ` Jan Kiszka
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 12:27 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 14:21, Gilles Chanteperdrix wrote:
> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>
>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>
>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>
>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>> documented behaviour of the LAPIC.
>>>>>
>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>> space appears too great a restriction?
>>>>
>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>
>>> Even if I enable MSI, the kernel still uses these irqs for the 
>>> peripherals integrated to the chipset, such as the USB HCI, or ATA 
>>> driver (IOW, non PCI devices). 
>>
>> Those are all PCI as well. And modern chipsets include variants of them
>> with MSI(-X) support.
> 
> Here is what I get on my workstation:
> $ grep fasteoi /proc/interrupts 
>   9:          0          0          0          0   IO-APIC-fasteoi   acpi
>  19:          0          0          0          0   IO-APIC-fasteoi   ahci
>  23:  280150305          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
> 
> It is a Sandy Bridge processor, less than 2 years old.
> What exactly do you mean by "modern"?

Mine is more than 2 years old and has, for example, an MSI-capable AHCI.

Also interesting is the IO-APIC access delay you see on this platform.
We'll try to measure it here as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 12:15                     ` Henri Roosen
@ 2012-09-17 12:27                       ` Jan Kiszka
  2012-09-17 13:46                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 12:27 UTC (permalink / raw)
  To: Henri Roosen; +Cc: Xenomai

On 2012-09-17 14:15, Henri Roosen wrote:
> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>
>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>
>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>
>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>
>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>
>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>
>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>
>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>> with MSI(-X) support.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> atom login: root
>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>> ERR:          0
>>>>>>>>>> MIS:          0
>>>>>>>>>>
>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>
>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>> the inflation of cores.
>>>>>>>>
>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>
>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>> due to IO-APIC accesses.
>>>>>>
>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>
>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>> here.
>>>>
>>>> I do not know, do you care for sharing your traces with us? I only run
>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>> there seem to be people still running xenomai on them), and an old
>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>> even slower than acking the i8259.
>>>>
>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>> it has an indirect scheme that seem more designed to save space in the
>>>> processor mapping and to be configured once and for all when
>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>
>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>> directing our efforts.
>>>
>>> We are currently investigating whether we can use Atom's for our
>>> future products. We have to stick to the x86 architecture and our
>>> products should work without big cooling fans. Currently running tests
>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>> give us a good indication).
>>>
>>> A 20us latency gain is a lot and would be very welcome in our system!
>>
>>
>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>> /proc/interrupts?
>>
> 
> The kernel config has no CONFIG_MSI, but instead:
> CONFIG_ARCH_SUPPORTS_MSI=y
> CONFIG_PCI_MSI=y
> 
> There is still IO-APIC-fasteoi in /proc/interrupts:
> 
> # cat /proc/interrupts
>            CPU0       CPU1
>   0:        250          0   IO-APIC-edge      timer
>   4:         71          0   IO-APIC-edge      serial
>   7:         29          0   IO-APIC-edge
>   8:          0          0   IO-APIC-edge      rtc0
>   9:          0          0   IO-APIC-fasteoi   acpi
>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>  40:        940          0   PCI-MSI-edge      eth0
>  41:         21          0   PCI-MSI-edge      xhci_hcd
>  42:          0          0   PCI-MSI-edge      xhci_hcd
>  43:          0          0   PCI-MSI-edge      xhci_hcd
> NMI:          0          0   Non-maskable interrupts
> LOC:      29559      25129   Local timer interrupts
> SPU:          0          0   Spurious interrupts
> PMI:          0          0   Performance monitoring interrupts
> IWI:          0          0   IRQ work interrupts
> RTR:          0          0   APIC ICR read retries
> RES:         20          0   Rescheduling interrupts
> CAL:          0          8   Function call interrupts
> TLB:          9          5   TLB shootdowns
> ERR:         74
> MIS:          0

Unless you are short on CPU resources: boot with isolcpus=1 and keep the
RT load on the isolated CPU. At the very least, bind all Linux IRQs to
one CPU. That's independent of any potential low-level optimizations.
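
The IRQ-binding half of this can be sketched as follows. It assumes the standard /proc/irq/<n>/smp_affinity interface, which takes a hexadecimal CPU bitmask; the helper names affinity_mask and bind_irq are hypothetical, and the single-word mask only covers systems with up to 32 CPUs.

```python
def affinity_mask(cpus):
    """Hex bitmask string for a set of CPU numbers (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def bind_irq(irq, cpus, proc_root="/proc"):
    """Write the mask for one IRQ.

    Needs root, and the kernel may refuse the write for some IRQs
    (for instance per-CPU or non-migratable ones).
    """
    with open(f"{proc_root}/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus))
```

With isolcpus=1 on the two-core Atom above, bind_irq(23, [0]) would keep the ehci/uhci line on CPU0. Note that isolcpus by itself only keeps the scheduler off the isolated CPU; the IRQ affinities still have to be set separately.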

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 12:27                       ` Jan Kiszka
@ 2012-09-17 13:46                         ` Gilles Chanteperdrix
  2012-09-17 13:54                           ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 13:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 02:27 PM, Jan Kiszka wrote:
> On 2012-09-17 14:15, Henri Roosen wrote:
>> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>>
>>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>>
>>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>>
>>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>>
>>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>>
>>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>>> with MSI(-X) support.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> atom login: root
>>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>>> ERR:          0
>>>>>>>>>>> MIS:          0
>>>>>>>>>>>
>>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>>
>>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>>> the inflation of cores.
>>>>>>>>>
>>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>>
>>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>>> due to IO-APIC accesses.
>>>>>>>
>>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>>
>>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>>> here.
>>>>>
>>>>> I do not know, do you care for sharing your traces with us? I only run
>>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>>> there seem to be people still running xenomai on them), and an old
>>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>>> even slower than acking the i8259.
>>>>>
>>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>>> it has an indirect scheme that seem more designed to save space in the
>>>>> processor mapping and to be configured once and for all when
>>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>>
>>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>>> directing our efforts.
>>>>
>>>> We are currently investigating whether we can use Atom's for our
>>>> future products. We have to stick to the x86 architecture and our
>>>> products should work without big cooling fans. Currently running tests
>>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>>> give us a good indication).
>>>>
>>>> A 20us latency gain is a lot and would be very welcome in our system!
>>>
>>>
>>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>>> /proc/interrupts?
>>>
>>
>> The kernel config has no CONFIG_MSI, but instead:
>> CONFIG_ARCH_SUPPORTS_MSI=y
>> CONFIG_PCI_MSI=y
>>
>> There is still IO-APIC-fasteoi in /proc/interrupts:
>>
>> # cat /proc/interrupts
>>            CPU0       CPU1
>>   0:        250          0   IO-APIC-edge      timer
>>   4:         71          0   IO-APIC-edge      serial
>>   7:         29          0   IO-APIC-edge
>>   8:          0          0   IO-APIC-edge      rtc0
>>   9:          0          0   IO-APIC-fasteoi   acpi
>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>  40:        940          0   PCI-MSI-edge      eth0
>>  41:         21          0   PCI-MSI-edge      xhci_hcd
>>  42:          0          0   PCI-MSI-edge      xhci_hcd
>>  43:          0          0   PCI-MSI-edge      xhci_hcd
>> NMI:          0          0   Non-maskable interrupts
>> LOC:      29559      25129   Local timer interrupts
>> SPU:          0          0   Spurious interrupts
>> PMI:          0          0   Performance monitoring interrupts
>> IWI:          0          0   IRQ work interrupts
>> RTR:          0          0   APIC ICR read retries
>> RES:         20          0   Rescheduling interrupts
>> CAL:          0          8   Function call interrupts
>> TLB:          9          5   TLB shootdowns
>> ERR:         74
>> MIS:          0
> 
> Unless you are short on CPU resources: isolcpus=1. At least bind all
> Linux IRQs to one CPU. That's independent of any potential low-level
> optimizations.

The advantage of the LAPIC-level masking with elevated priority that I
propose is that, for most APICs, the IO-APIC will dynamically forward
the interrupts to the CPUs that are not currently running with elevated
priority (that is what the dest_LowestPrio constant means).
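
To make the mechanism concrete, here is a deliberately simplified model, not the I-pipe code. It assumes the usual LAPIC rule that bits 7:4 of a vector are its priority class and that a vector is only dispatched when its class is higher than the class in the task priority register; the in-service register is ignored, and the tie-break in lowest_priority_target is an arbitrary choice of this sketch, since real arbitration differs between chipsets.

```python
def priority_class(value):
    """Priority class = bits 7:4 of a vector or of the TPR."""
    return (value >> 4) & 0xF


def accepted(vector, tpr):
    """True if a CPU whose task priority register is `tpr` would
    dispatch `vector` (in-service interrupts are not modelled)."""
    return priority_class(vector) > priority_class(tpr)


def lowest_priority_target(tprs):
    """Index of the CPU a lowest-priority-mode interrupt would go to:
    the one with the lowest TPR class (ties: lowest index)."""
    return min(range(len(tprs)), key=lambda cpu: priority_class(tprs[cpu]))
```

In this model, raising the TPR of the CPU running real-time code to 0x80 blocks every vector below 0x90 on that CPU, which is why the vector space has to be split into a low set for Linux and a high set for the real-time domain.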

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 13:46                         ` Gilles Chanteperdrix
@ 2012-09-17 13:54                           ` Jan Kiszka
  2012-09-17 14:02                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 13:54 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 15:46, Gilles Chanteperdrix wrote:
> On 09/17/2012 02:27 PM, Jan Kiszka wrote:
>> On 2012-09-17 14:15, Henri Roosen wrote:
>>> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>>>
>>>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>>>
>>>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>>>
>>>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>>>> with MSI(-X) support.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> atom login: root
>>>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>>>> ERR:          0
>>>>>>>>>>>> MIS:          0
>>>>>>>>>>>>
>>>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>>>
>>>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>>>> the inflation of cores.
>>>>>>>>>>
>>>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>>>
>>>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>>>> due to IO-APIC accesses.
>>>>>>>>
>>>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>>>
>>>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>>>> here.
>>>>>>
>>>>>> I do not know, do you care for sharing your traces with us? I only run
>>>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>>>> there seem to be people still running xenomai on them), and an old
>>>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>>>> even slower than acking the i8259.
>>>>>>
>>>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>>>> it has an indirect scheme that seem more designed to save space in the
>>>>>> processor mapping and to be configured once and for all when
>>>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>>>
>>>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>>>> directing our efforts.
>>>>>
>>>>> We are currently investigating whether we can use Atom's for our
>>>>> future products. We have to stick to the x86 architecture and our
>>>>> products should work without big cooling fans. Currently running tests
>>>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>>>> give us a good indication).
>>>>>
>>>>> A 20us latency gain is a lot and would be very welcome in our system!
>>>>
>>>>
>>>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>>>> /proc/interrupts?
>>>>
>>>
>>> The kernel config has no CONFIG_MSI, but instead:
>>> CONFIG_ARCH_SUPPORTS_MSI=y
>>> CONFIG_PCI_MSI=y
>>>
>>> There is still IO-APIC-fasteoi in /proc/interrupts:
>>>
>>> # cat /proc/interrupts
>>>            CPU0       CPU1
>>>   0:        250          0   IO-APIC-edge      timer
>>>   4:         71          0   IO-APIC-edge      serial
>>>   7:         29          0   IO-APIC-edge
>>>   8:          0          0   IO-APIC-edge      rtc0
>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>  40:        940          0   PCI-MSI-edge      eth0
>>>  41:         21          0   PCI-MSI-edge      xhci_hcd
>>>  42:          0          0   PCI-MSI-edge      xhci_hcd
>>>  43:          0          0   PCI-MSI-edge      xhci_hcd
>>> NMI:          0          0   Non-maskable interrupts
>>> LOC:      29559      25129   Local timer interrupts
>>> SPU:          0          0   Spurious interrupts
>>> PMI:          0          0   Performance monitoring interrupts
>>> IWI:          0          0   IRQ work interrupts
>>> RTR:          0          0   APIC ICR read retries
>>> RES:         20          0   Rescheduling interrupts
>>> CAL:          0          8   Function call interrupts
>>> TLB:          9          5   TLB shootdowns
>>> ERR:         74
>>> MIS:          0
>>
>> Unless you are short on CPU resources: isolcpus=1. At least bind all
>> Linux IRQs to one CPU. That's independent of any potential low-level
>> optimizations.
> 
> The advantage of the masking at LAPIC using elevated priority I propose
> is that for most APICs, the IO-APIC will forward the interrupts to the
> cpus not currently running with elevated priority (that is what the
> dest_LowestPrio constant means). Dynamically.

And the advantage of isolcpus is that it avoids any kind of disturbances
due to dynamics, thus provides the best latency.
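
As a rough illustration of binding Linux IRQs to one CPU: Linux exposes a
per-IRQ hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity. The sketch
below only builds the path and the mask strings. The helper names are
hypothetical, and it assumes fewer than 32 CPUs.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format the hex CPU bitmask selecting a single CPU.
 * Assumes cpu < 32 so the mask fits in one 32-bit word.
 * Returns 0 on success, -1 on bad input or a too-short buffer. */
int irq_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    int n;

    if (cpu >= 32)
        return -1;
    n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: build the procfs path holding the affinity mask
 * of a given IRQ number. Returns 0 on success, -1 on a too-short buffer. */
int irq_affinity_path(unsigned int irq, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For IRQ 23 and CPU 0 this gives the path /proc/irq/23/smp_affinity and the
mask 1. Writing that mask to that file as root asks the kernel to route the
IRQ to CPU 0 only.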

Also, I'm not sure what effort will be required to handle cases where
Linux or some RT driver decides to keep an IRQ masked for a longer
period. That would block everything below that level and is surely not
what we want.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 13:54                           ` Jan Kiszka
@ 2012-09-17 14:02                             ` Gilles Chanteperdrix
  2012-09-17 14:35                               ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 14:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 03:54 PM, Jan Kiszka wrote:
> On 2012-09-17 15:46, Gilles Chanteperdrix wrote:
>> On 09/17/2012 02:27 PM, Jan Kiszka wrote:
>>> On 2012-09-17 14:15, Henri Roosen wrote:
>>>> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>>>>
>>>>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>>>>
>>>>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>>>>> with MSI(-X) support.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> atom login: root
>>>>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>>>>> ERR:          0
>>>>>>>>>>>>> MIS:          0
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>>>>
>>>>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>>>>> the inflation of cores.
>>>>>>>>>>>
>>>>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>>>>
>>>>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>>>>> due to IO-APIC accesses.
>>>>>>>>>
>>>>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>>>>
>>>>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>>>>> here.
>>>>>>>
>>>>>>> I do not know, do you care for sharing your traces with us? I only run
>>>>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>>>>> there seem to be people still running xenomai on them), and an old
>>>>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>>>>> even slower than acking the i8259.
>>>>>>>
>>>>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>>>>> it has an indirect scheme that seem more designed to save space in the
>>>>>>> processor mapping and to be configured once and for all when
>>>>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>>>>
>>>>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>>>>> directing our efforts.
>>>>>>
>>>>>> We are currently investigating whether we can use Atom's for our
>>>>>> future products. We have to stick to the x86 architecture and our
>>>>>> products should work without big cooling fans. Currently running tests
>>>>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>>>>> give us a good indication).
>>>>>>
>>>>>> A 20us latency gain is a lot and would be very welcome in our system!
>>>>>
>>>>>
>>>>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>>>>> /proc/interrupts?
>>>>>
>>>>
>>>> The kernel config has no CONFIG_MSI, but instead:
>>>> CONFIG_ARCH_SUPPORTS_MSI=y
>>>> CONFIG_PCI_MSI=y
>>>>
>>>> There is still IO-APIC-fasteoi in /proc/interrupts:
>>>>
>>>> # cat /proc/interrupts
>>>>            CPU0       CPU1
>>>>   0:        250          0   IO-APIC-edge      timer
>>>>   4:         71          0   IO-APIC-edge      serial
>>>>   7:         29          0   IO-APIC-edge
>>>>   8:          0          0   IO-APIC-edge      rtc0
>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>  40:        940          0   PCI-MSI-edge      eth0
>>>>  41:         21          0   PCI-MSI-edge      xhci_hcd
>>>>  42:          0          0   PCI-MSI-edge      xhci_hcd
>>>>  43:          0          0   PCI-MSI-edge      xhci_hcd
>>>> NMI:          0          0   Non-maskable interrupts
>>>> LOC:      29559      25129   Local timer interrupts
>>>> SPU:          0          0   Spurious interrupts
>>>> PMI:          0          0   Performance monitoring interrupts
>>>> IWI:          0          0   IRQ work interrupts
>>>> RTR:          0          0   APIC ICR read retries
>>>> RES:         20          0   Rescheduling interrupts
>>>> CAL:          0          8   Function call interrupts
>>>> TLB:          9          5   TLB shootdowns
>>>> ERR:         74
>>>> MIS:          0
>>>
>>> Unless you are short on CPU resources: isolcpus=1. At least bind all
>>> Linux IRQs to one CPU. That's independent of any potential low-level
>>> optimizations.
>>
>> The advantage of the masking at LAPIC using elevated priority I propose
>> is that for most APICs, the IO-APIC will forward the interrupts to the
>> cpus not currently running with elevated priority (that is what the
>> dest_LowestPrio constant means). Dynamically.
> 
> And the advantage of isolcpus is that it avoids any kind of disturbances
> due to dynamics, thus provides the best latency.

And the worst scalability. But I agree that on machines where the cache
is not shared between cores (which I believe is not the case for the
atom), not sending the irq to the same core every time is detrimental to
non-real-time performance.

> 
> Also, I'm not sure what effort will be required to handle cases where
> Linux or some RT driver decides to keep an IRQ masked for a longer
> period. That would block everything below that level and is surely not
> what we want.

It is only the masking done by the I-pipe (aka
desc->ipipe_ack/desc->ipipe_end) which uses this method. The real masking
is still done by masking at IO-APIC level.
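
A minimal model of this TPR-based masking, to make the priority rule
concrete. On the LAPIC, the priority class of a vector is its upper four
bits, and a fixed interrupt is held pending while its class is not greater
than the class written to the task priority register. The sketch ignores
in-service interrupts (the full processor-priority computation), and the
split point and all names are hypothetical, not taken from the I-pipe code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical split: vectors whose class is above this one form the
 * high priority (real-time) set, all others the low priority set. */
#define LOW_PRIO_MAX_CLASS 0x7u

/* Priority class of a vector or of a TPR value: bits 7:4. */
static inline unsigned int prio_class(uint8_t v)
{
    return v >> 4;
}

/* True if the LAPIC would deliver the vector for the given TPR value.
 * In-service interrupts are deliberately left out of this model. */
bool lapic_delivers(uint8_t vector, uint8_t tpr)
{
    return prio_class(vector) > prio_class(tpr);
}

/* TPR value that holds back the whole low priority set while still
 * letting the high priority vectors through. */
uint8_t tpr_mask_low_prio(void)
{
    return (uint8_t)(LOW_PRIO_MAX_CLASS << 4);
}
```

With the TPR set to tpr_mask_low_prio(), a low priority vector such as 0x51
stays pending while a high priority vector such as 0x81 is still delivered.
Writing 0 to the TPR lets both through again. It also shows why a TPR left
raised holds back every vector at or below that class, not just one IRQ.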

-- 
					    Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 14:02                             ` Gilles Chanteperdrix
@ 2012-09-17 14:35                               ` Jan Kiszka
  2012-09-17 17:46                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 14:35 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 16:02, Gilles Chanteperdrix wrote:
> On 09/17/2012 03:54 PM, Jan Kiszka wrote:
>> On 2012-09-17 15:46, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 02:27 PM, Jan Kiszka wrote:
>>>> On 2012-09-17 14:15, Henri Roosen wrote:
>>>>> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>>>>>
>>>>>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>>>>>> with MSI(-X) support.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> atom login: root
>>>>>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>>>>>> ERR:          0
>>>>>>>>>>>>>> MIS:          0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>>>>>
>>>>>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>>>>>> the inflation of cores.
>>>>>>>>>>>>
>>>>>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>>>>>
>>>>>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>>>>>> due to IO-APIC accesses.
>>>>>>>>>>
>>>>>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>>>>>
>>>>>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>>>>>> here.
>>>>>>>>
>>>>>>>> I do not know, do you care for sharing your traces with us? I only run
>>>>>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>>>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>>>>>> there seem to be people still running xenomai on them), and an old
>>>>>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>>>>>> even slower than acking the i8259.
>>>>>>>>
>>>>>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>>>>>> it has an indirect scheme that seem more designed to save space in the
>>>>>>>> processor mapping and to be configured once and for all when
>>>>>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>>>>>
>>>>>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>>>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>>>>>> directing our efforts.
>>>>>>>
>>>>>>> We are currently investigating whether we can use Atom's for our
>>>>>>> future products. We have to stick to the x86 architecture and our
>>>>>>> products should work without big cooling fans. Currently running tests
>>>>>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>>>>>> give us a good indication).
>>>>>>>
>>>>>>> A 20us latency gain is a lot and would be very welcome in our system!
>>>>>>
>>>>>>
>>>>>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>>>>>> /proc/interrupts?
>>>>>>
>>>>>
>>>>> The kernel config has no CONFIG_MSI, but instead:
>>>>> CONFIG_ARCH_SUPPORTS_MSI=y
>>>>> CONFIG_PCI_MSI=y
>>>>>
>>>>> There is still IO-APIC-fasteoi in /proc/interrupts:
>>>>>
>>>>> # cat /proc/interrupts
>>>>>            CPU0       CPU1
>>>>>   0:        250          0   IO-APIC-edge      timer
>>>>>   4:         71          0   IO-APIC-edge      serial
>>>>>   7:         29          0   IO-APIC-edge
>>>>>   8:          0          0   IO-APIC-edge      rtc0
>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>  40:        940          0   PCI-MSI-edge      eth0
>>>>>  41:         21          0   PCI-MSI-edge      xhci_hcd
>>>>>  42:          0          0   PCI-MSI-edge      xhci_hcd
>>>>>  43:          0          0   PCI-MSI-edge      xhci_hcd
>>>>> NMI:          0          0   Non-maskable interrupts
>>>>> LOC:      29559      25129   Local timer interrupts
>>>>> SPU:          0          0   Spurious interrupts
>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>> IWI:          0          0   IRQ work interrupts
>>>>> RTR:          0          0   APIC ICR read retries
>>>>> RES:         20          0   Rescheduling interrupts
>>>>> CAL:          0          8   Function call interrupts
>>>>> TLB:          9          5   TLB shootdowns
>>>>> ERR:         74
>>>>> MIS:          0
>>>>
>>>> Unless you are short on CPU resources: isolcpus=1. At least bind all
>>>> Linux IRQs to one CPU. That's independent of any potential low-level
>>>> optimizations.
>>>
>>> The advantage of the masking at LAPIC using elevated priority I propose
>>> is that for most APICs, the IO-APIC will forward the interrupts to the
>>> cpus not currently running with elevated priority (that is what the
>>> dest_LowestPrio constant means). Dynamically.
>>
>> And the advantage of isolcpus is that it avoids any kind of disturbances
>> due to dynamics, thus provides the best latency.
> 
> And the worst scalability.

There is no free lunch.

> But I agree that on machines where the cache is
> not shared between cores (which I believe is not the case for the atom),
> not sending the irq to the same core every time is detrimental to
> non-real-time performance.
> 
>>
>> Also, I'm not sure what effort will be required to handle cases where
>> Linux or some RT driver decides to keep an IRQ masked for a longer
>> period. That would block everything below that level and is surely not
>> what we want.
> 
> It is only the masking done by the I-pipe (aka
> desc->ipipe_ack/desc->ipipe_end) which uses this method. The real masking
> is still done by masking at IO-APIC level.

Then you need to prevent the (mis-)use of XN_ISR_NOENABLE.

And when is the end executed for Linux IRQs? Do you want to migrate from
TPR-based masking to standard IO-APIC masking when switching to
the Linux domain? But that will not avoid the IO-APIC access latency,
just reshuffle it.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 14:35                               ` Jan Kiszka
@ 2012-09-17 17:46                                 ` Gilles Chanteperdrix
  2012-09-17 18:05                                   ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 17:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 04:35 PM, Jan Kiszka wrote:

> On 2012-09-17 16:02, Gilles Chanteperdrix wrote:
>> On 09/17/2012 03:54 PM, Jan Kiszka wrote:
>>> On 2012-09-17 15:46, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 02:27 PM, Jan Kiszka wrote:
>>>>> On 2012-09-17 14:15, Henri Roosen wrote:
>>>>>> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>>>>>>
>>>>>>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my atom
>>>>>>>>>>>>>>>>> was masking the fasteoi interrupts at IO-APIC level. So, I experimented
>>>>>>>>>>>>>>>>> an idea: masking at LAPIC level instead of IO-APIC, by using the "task
>>>>>>>>>>>>>>>>> priority" register. This seems to improve latencies on my atom:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This implies splitting the LAPIC vectors in a high priority and low
>>>>>>>>>>>>>>>>> priority sets, the final implementation would use ipipe_enable_irqdesc
>>>>>>>>>>>>>>>>> to detect a high priority domain, and change the vector at that time.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset, but
>>>>>>>>>>>>>>>>> it generates spurious interrupts (I do not know if it really is a
>>>>>>>>>>>>>>>>> matter, as handling a spurious interrupt is still faster than masking an
>>>>>>>>>>>>>>>>> IO-APIC interrupt), the spurious interrupts in that case are a
>>>>>>>>>>>>>>>>> documented behaviour of the LAPIC.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>>>>>>>> IO-APIC the exception more than the rule, or having to split the vector
>>>>>>>>>>>>>>>>> space appears too great a restriction?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>>>>>>> devices - likely what we are primarily interesting in here - due to MSI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>>>>>>> peripherals integrated to the chipset, such as the USB HCI, or ATA
>>>>>>>>>>>>>>> driver (IOW, non PCI devices).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>>>>>>>> with MSI(-X) support.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> atom login: root
>>>>>>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>>>>>>> ERR:          0
>>>>>>>>>>>>>>> MIS:          0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>>>>>>> the inflation of cores.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>>>>>>
>>>>>>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>>>>>>> due to IO-APIC accesses.
>>>>>>>>>>>
>>>>>>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>>>>>>
>>>>>>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>>>>>>> here.
>>>>>>>>>
>>>>>>>>> I do not know, do you care for sharing your traces with us? I only run
>>>>>>>>> Xenomai on atom (which I am not sure do not qualify as "modern", new
>>>>>>>>> atoms seem to be produced), geode (ok, this one is definitely dead, but
>>>>>>>>> there seem to be people still running xenomai on them), and an old
>>>>>>>>> pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>>>>>>> even slower than acking the i8259.
>>>>>>>>>
>>>>>>>>> Anyway, the IO-APIC registers accesses does not look designed for speed:
>>>>>>>>> it has an indirect scheme that seem more designed to save space in the
>>>>>>>>> processor mapping and to be configured once and for all when
>>>>>>>>> enabling/disabling interrupt, not at each and every interrupt.
>>>>>>>>>
>>>>>>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>>>>>>> know on what kind of x86 people run xenomai, knowing that would help us
>>>>>>>>> directing our efforts.
>>>>>>>>
>>>>>>>> We are currently investigating whether we can use Atom's for our
>>>>>>>> future products. We have to stick to the x86 architecture and our
>>>>>>>> products should work without big cooling fans. Currently running tests
>>>>>>>> on Atom D2700 (which I know is EOL, but for research purposes should
>>>>>>>> give us a good indication).
>>>>>>>>
>>>>>>>> A 20us latency gain is a lot and would be very welcome in our system!
>>>>>>>
>>>>>>>
>>>>>>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>>>>>>> /proc/interrupts?
>>>>>>>
>>>>>>
>>>>>> The kernel config has no CONFIG_MSI, but instead:
>>>>>> CONFIG_ARCH_SUPPORTS_MSI=y
>>>>>> CONFIG_PCI_MSI=y
>>>>>>
>>>>>> There is still IO-APIC-fasteoi in /proc/interrupts:
>>>>>>
>>>>>> # cat /proc/interrupts
>>>>>>            CPU0       CPU1
>>>>>>   0:        250          0   IO-APIC-edge      timer
>>>>>>   4:         71          0   IO-APIC-edge      serial
>>>>>>   7:         29          0   IO-APIC-edge
>>>>>>   8:          0          0   IO-APIC-edge      rtc0
>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>  40:        940          0   PCI-MSI-edge      eth0
>>>>>>  41:         21          0   PCI-MSI-edge      xhci_hcd
>>>>>>  42:          0          0   PCI-MSI-edge      xhci_hcd
>>>>>>  43:          0          0   PCI-MSI-edge      xhci_hcd
>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>> LOC:      29559      25129   Local timer interrupts
>>>>>> SPU:          0          0   Spurious interrupts
>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>> RES:         20          0   Rescheduling interrupts
>>>>>> CAL:          0          8   Function call interrupts
>>>>>> TLB:          9          5   TLB shootdowns
>>>>>> ERR:         74
>>>>>> MIS:          0
>>>>>
>>>>> Unless you are short on CPU resources: isolcpus=1. At least bind all
>>>>> Linux IRQs to one CPU. That's independent of any potential low-level
>>>>> optimizations.
>>>>
>>>> The advantage of the masking at LAPIC using elevated priority I propose
>>>> is that for most APICs, the IO-APIC will forward the interrupts to the
>>>> cpus not currently running with elevated priority (that is what the
>>>> dest_LowestPrio constant means). Dynamically.
>>>
>>> And the advantage of isolcpus is that it avoids any kind of disturbances
>>> due to dynamics, thus provides the best latency.
>>
>> And the worst scalability.
> 
> There is no free lunch.
> 
>> But I agree on machines where the cache is
>> not shared between cores (which I believe is not the case of atom), the
>> fact to not send the irq to the same core every time is detrimental to
>> no real-time performances.
>>
>>>
>>> Also, I'm not sure how what efforts will be required to handle cases
>>> Linux or some RT driver decides to keep an IRQ masked for a longer
>>> period. That would block everything below that level and is surely not
>>> what we want.
>>
>> It is only the masking done by the I-pipe (aka
>> desc->ipipe_ack/desc->ipipe_end) which use this method. The real masking
>> is still done by masking at IO-APIC level.
> 
> Then you need to prevent the (mis-)use of XN_ISR_NOENABLE.


ipipe_end is a nop when called from primary domain, yes, but this is not
very different from edge irqs. Also, fasteoi irqs become a bit like MSI:
in the same way as we cannot mask MSI from primary domain, we should not
mask IO-APIC fasteoi irqs, because the cost is prohibitive. If we
can live with MSI without masking them in primary mode, I guess we can
do the same with fasteoi irqs.

> 
> And when is the end executed for Linux IRQs? Do you want to migrate from
> TPR-based based masking to standard IO-APIC masking when switching to
> the Linux domain?


The end for Linux irqs is executed in handle_fasteoi_irq, as usual,
through the ->irq_release callback. This irq_release callback restores
the LAPIC priority to the low priority state. The current implementation
of the irq_hold and irq_release callbacks is:

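/* Raise the LAPIC task priority: vectors up to 0x7f are held off. */
void ipipe_mute_pic(void)
{
#if 0
	/* Bit 0 of the per-cpu flag records a mute requested here. */
	int *mutedp = __this_cpu_ptr(&__ipipe_pic_muted);
	int muted = *mutedp;
	*mutedp = muted | 1;
	/* Only touch the TPR if nobody had raised it before. */
	if (muted == 0)
		apic_write(APIC_TASKPRI, 0x70);
#else
	apic_write(APIC_TASKPRI, 0x70);
	__this_cpu_write(__ipipe_pic_muted, 1);
#endif
}

/* Lower the LAPIC task priority again: all vectors may be delivered. */
void ipipe_unmute_pic(void)
{
#if 0
	/* Clear bit 0; lower the TPR only if no fasteoi irq is still held. */
	int *mutedp = __this_cpu_ptr(&__ipipe_pic_muted);
	int muted = *mutedp & ~1;
	*mutedp = muted;
	if (muted == 0)
		apic_write(APIC_TASKPRI, 0);
#else
	apic_write(APIC_TASKPRI, 0);
	__this_cpu_write(__ipipe_pic_muted, 0);
#endif
}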

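/* irq_hold: mask by raising the TPR instead of masking at the IO-APIC. */
static void hold_ioapic_irq(struct irq_data *data)
{
#if 0
	/* Bit 1 of the per-cpu flag records a held fasteoi irq. */
	unsigned cpu = ipipe_processor_id();
	int *mutedp = &per_cpu(__ipipe_pic_muted, cpu);
	int muted = *mutedp;
	*mutedp = muted | 2;
	if (muted == 0)
		apic_write(APIC_TASKPRI, 0x70);
#else
	/* Nothing to do if ipipe_mute_pic() already raised the TPR. */
	if (__this_cpu_read(__ipipe_pic_muted) == 0)
		apic_write(APIC_TASKPRI, 0x70);
#endif
	ack_apic_level(data);
}

/* irq_release: lower the TPR once Linux has handled the interrupt. */
static void release_ioapic_irq(struct irq_data *data)
{
	unsigned long flags = hard_local_irq_save();
#if 0
	/* Clear bit 1; lower the TPR only if ipipe_mute_pic() is not active. */
	unsigned cpu = ipipe_processor_id();
	int *mutedp = &per_cpu(__ipipe_pic_muted, cpu);
	int muted = *mutedp & ~2;
	*mutedp = muted;
	if (muted == 0)
		apic_write(APIC_TASKPRI, 0);
#else
	/* Keep the TPR raised while ipipe_mute_pic() is in effect. */
	if (__this_cpu_read(__ipipe_pic_muted) == 0)
		apic_write(APIC_TASKPRI, 0);
#endif
	hard_local_irq_restore(flags);
}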

Both implementations work, but the one in #if 0 seems to have a higher
overhead, though it is more correct. It is still not completely correct:
what we would want is to restore the TPR only once all the pending
Linux irqs have been handled, to avoid retriggering some of them.
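
To make the bookkeeping of the #if 0 variant easier to follow, here is a
minimal userspace model of it. This is only a sketch under assumptions:
a single cpu, a plain variable standing in for the APIC_TASKPRI
register, and hypothetical names (model_mute, model_unmute, MUTE_PIC,
MUTE_FASTEOI) that do not exist in the I-pipe patch. It only shows that
the TPR is written on the first mute and on the last unmute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit values mirror the "| 1" and "| 2" above. */
enum { MUTE_PIC = 1, MUTE_FASTEOI = 2 };
enum { TPR_LOW = 0x00, TPR_HIGH = 0x70 };

static unsigned int model_muted;      /* models __ipipe_pic_muted, one cpu */
static uint8_t model_tpr;             /* models the last APIC_TASKPRI value */
static unsigned int model_tpr_writes; /* counts the modelled register writes */

static void model_apic_write_tpr(uint8_t val)
{
	model_tpr = val;
	model_tpr_writes++;
}

/* Record a mute reason; raise the TPR only if it was not raised yet. */
static void model_mute(unsigned int reason)
{
	unsigned int muted = model_muted;

	assert(reason == MUTE_PIC || reason == MUTE_FASTEOI);
	model_muted = muted | reason;
	if (muted == 0)
		model_apic_write_tpr(TPR_HIGH);
}

/* Drop a mute reason; lower the TPR only when no reason is left. */
static void model_unmute(unsigned int reason)
{
	unsigned int muted = model_muted & ~reason;

	assert(reason == MUTE_PIC || reason == MUTE_FASTEOI);
	model_muted = muted;
	if (muted == 0)
		model_apic_write_tpr(TPR_LOW);
}
```

With this model, holding a fasteoi irq, then muting the PIC, then
releasing the irq leaves the TPR raised until the PIC is unmuted too,
which is the behaviour the simpler #else variant only approximates.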

So, we NEVER use the IO-APIC masking, that is the point, except when
really masking.

> But that will not avoid the IO-APIC access latency,
> just reshuffle it.


The IO-APIC access latency only happens when someone masks an IO-APIC
irq, which should be rare (but I agree, maybe with network drivers we
have an issue here).

What I did was a quick test to see whether we gain something, and it
seems we do. I do not claim to have covered all the details.

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 17:46                                 ` Gilles Chanteperdrix
@ 2012-09-17 18:05                                   ` Jan Kiszka
  2012-09-17 18:08                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:05 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
> ipipe_end is a nop when called from primary domain, yes, but this is not
> very different from edge irqs. Also, fasteoi become a bit like MSI: in
> the same way as we can not mask MSI from primary domain, we should not
> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
> can live with MSI without masking them in primary mode, I guess we can
> do the same with fasteoi irqs.

MSIs are edge triggered, fasteois are still level-based. They require
masking at the point where you defer them: that is what we do, and Linux
may even extend the masking beyond that. If you mask them by raising the
task priority, you have to keep it raised until Linux finally handled
the IRQ. Or you decide to mask it at IO-APIC level again. If you keep
the TPR raised, you will block more than what Linux wants to block.
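
As a rough illustration of "blocking more than what Linux wants", here
is a small sketch of the LAPIC priority filter. It is a simplification
under stated assumptions: the processor priority is taken equal to the
TPR (no interrupt in service), only fixed-delivery vectors are
considered, and the function names are hypothetical. A vector is
accepted when its priority class (bits 7:4) is strictly higher than the
class in the TPR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified filter: PPR == TPR, i.e. nothing is currently in service. */
static int lapic_accepts(uint8_t vector, uint8_t tpr)
{
	return (vector >> 4) > (tpr >> 4);
}

/* Count how many of the given vectors are held off by this TPR value. */
static size_t count_blocked(const uint8_t *vectors, size_t n, uint8_t tpr)
{
	size_t i, blocked = 0;

	assert(vectors != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!lapic_accepts(vectors[i], tpr))
			blocked++;
	return blocked;
}
```

With a TPR of 0x70, deferring a single level-triggered irq on, say,
vector 0x41 also holds off every other vector up to 0x7f on that cpu,
whereas IO-APIC masking would affect that one line only.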

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:05                                   ` Jan Kiszka
@ 2012-09-17 18:08                                     ` Gilles Chanteperdrix
  2012-09-17 18:12                                       ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:05 PM, Jan Kiszka wrote:

> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>> ipipe_end is a nop when called from primary domain, yes, but this is not
>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>> the same way as we can not mask MSI from primary domain, we should not
>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>> can live with MSI without masking them in primary mode, I guess we can
>> do the same with fasteoi irqs.
> 
> MSIs are edge triggered, fasteois are still level-based. They require
> masking at the point you defer them - what we do and what Linux may even
> extend beyond that. If you mask them by raising the task priority, you
> have to keep it raised until Linux finally handled the IRQ.


Yes.

> Or you
> decide to mask it at IO-APIC level again.


We do not want that.

> If you keep the TPR raised,
> you will block more than what Linux wants to block.


The point is that if the TPR stays raised, it means that the primary
domain has preempted Linux, so we want it to stay that way. Otherwise
the TPR gets lowered when Linux has handled the interrupt.

A weekend of testing made me sure of one thing: it works. I assure you.


> 
> Jan
> 



-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:08                                     ` Gilles Chanteperdrix
@ 2012-09-17 18:12                                       ` Jan Kiszka
  2012-09-17 18:13                                         ` Gilles Chanteperdrix
                                                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:12 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>> the same way as we can not mask MSI from primary domain, we should not
>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>> can live with MSI without masking them in primary mode, I guess we can
>>> do the same with fasteoi irqs.
>>
>> MSIs are edge triggered, fasteois are still level-based. They require
>> masking at the point you defer them - what we do and what Linux may even
>> extend beyond that. If you mask them by raising the task priority, you
>> have to keep it raised until Linux finally handled the IRQ.
> 
> 
> Yes.
> 
>> Or you
>> decide to mask it at IO-APIC level again.
> 
> 
> We do not want that.
> 
>> If you keep the TPR raised,
>> you will block more than what Linux wants to block.
> 
> 
> The point is that if the TPR keeps raised, it means that primary domain
> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
> gets lowered when Linux has handled the interrupt.
> 
> A week-end of testing made me sure of one thing: it works. I assure you.

Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
you face threaded IRQs - I assure you.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:12                                       ` Jan Kiszka
@ 2012-09-17 18:13                                         ` Gilles Chanteperdrix
  2012-09-17 18:15                                         ` Jan Kiszka
  2012-09-17 18:15                                         ` Gilles Chanteperdrix
  2 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:12 PM, Jan Kiszka wrote:

> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>> the same way as we can not mask MSI from primary domain, we should not
>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>> can live with MSI without masking them in primary mode, I guess we can
>>>> do the same with fasteoi irqs.
>>>
>>> MSIs are edge triggered, fasteois are still level-based. They require
>>> masking at the point you defer them - what we do and what Linux may even
>>> extend beyond that. If you mask them by raising the task priority, you
>>> have to keep it raised until Linux finally handled the IRQ.
>>
>>
>> Yes.
>>
>>> Or you
>>> decide to mask it at IO-APIC level again.
>>
>>
>> We do not want that.
>>
>>> If you keep the TPR raised,
>>> you will block more than what Linux wants to block.
>>
>>
>> The point is that if the TPR keeps raised, it means that primary domain
>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>> gets lowered when Linux has handled the interrupt.
>>
>> A week-end of testing made me sure of one thing: it works. I assure you.
> 
> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
> you face threaded IRQs - I assure you.


As I said, I have not covered all the cases. I just made it work with my
setup, but I do not see why we could not get it working for IRQF_ONESHOT.


-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:12                                       ` Jan Kiszka
  2012-09-17 18:13                                         ` Gilles Chanteperdrix
@ 2012-09-17 18:15                                         ` Jan Kiszka
  2012-09-17 18:16                                           ` Gilles Chanteperdrix
  2012-09-17 18:18                                           ` Gilles Chanteperdrix
  2012-09-17 18:15                                         ` Gilles Chanteperdrix
  2 siblings, 2 replies; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 20:12, Jan Kiszka wrote:
> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>> the same way as we can not mask MSI from primary domain, we should not
>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>> can live with MSI without masking them in primary mode, I guess we can
>>>> do the same with fasteoi irqs.
>>>
>>> MSIs are edge triggered, fasteois are still level-based. They require
>>> masking at the point you defer them - what we do and what Linux may even
>>> extend beyond that. If you mask them by raising the task priority, you
>>> have to keep it raised until Linux finally handled the IRQ.
>>
>>
>> Yes.
>>
>>> Or you
>>> decide to mask it at IO-APIC level again.
>>
>>
>> We do not want that.
>>
>>> If you keep the TPR raised,
>>> you will block more than what Linux wants to block.
>>
>>
>> The point is that if the TPR keeps raised, it means that primary domain
>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>> gets lowered when Linux has handled the interrupt.
>>
>> A week-end of testing made me sure of one thing: it works. I assure you.
> 
> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
> you face threaded IRQs - I assure you.

Well, it may work (if the mask/unmask callbacks work as they do natively)
but the benefit is gone: masking at IO-APIC level will be done again.
Given that threaded IRQs are becoming increasingly popular, it will also
be hard to avoid them in common setups.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:12                                       ` Jan Kiszka
  2012-09-17 18:13                                         ` Gilles Chanteperdrix
  2012-09-17 18:15                                         ` Jan Kiszka
@ 2012-09-17 18:15                                         ` Gilles Chanteperdrix
  2 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:15 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:12 PM, Jan Kiszka wrote:

> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>> the same way as we can not mask MSI from primary domain, we should not
>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>> can live with MSI without masking them in primary mode, I guess we can
>>>> do the same with fasteoi irqs.
>>>
>>> MSIs are edge triggered, fasteois are still level-based. They require
>>> masking at the point you defer them - what we do and what Linux may even
>>> extend beyond that. If you mask them by raising the task priority, you
>>> have to keep it raised until Linux finally handled the IRQ.
>>
>>
>> Yes.
>>
>>> Or you
>>> decide to mask it at IO-APIC level again.
>>
>>
>> We do not want that.
>>
>>> If you keep the TPR raised,
>>> you will block more than what Linux wants to block.
>>
>>
>> The point is that if the TPR keeps raised, it means that primary domain
>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>> gets lowered when Linux has handled the interrupt.
>>
>> A week-end of testing made me sure of one thing: it works. I assure you.
> 
> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
> you face threaded IRQs - I assure you.


I am not sure IRQF_ONESHOT works with CONFIG_IPIPE, anyway:

#ifdef CONFIG_IPIPE
	/* XXX: IRQCHIP_EOI_IF_HANDLED is ignored. */
	if (desc->irq_data.chip->irq_release)
		desc->irq_data.chip->irq_release(&desc->irq_data);
out_eoi:
#else  /* !CONFIG_IPIPE */
	if (desc->istate & IRQS_ONESHOT)
		cond_unmask_irq(desc);

out_eoi:
	desc->irq_data.chip->irq_eoi(&desc->irq_data);
#endif	/* !CONFIG_IPIPE */




> 
> Jan
> 



-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:15                                         ` Jan Kiszka
@ 2012-09-17 18:16                                           ` Gilles Chanteperdrix
  2012-09-17 18:18                                             ` Jan Kiszka
  2012-09-17 18:18                                           ` Gilles Chanteperdrix
  1 sibling, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:16 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:15 PM, Jan Kiszka wrote:

> On 2012-09-17 20:12, Jan Kiszka wrote:
>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>> do the same with fasteoi irqs.
>>>>
>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>> masking at the point you defer them - what we do and what Linux may even
>>>> extend beyond that. If you mask them by raising the task priority, you
>>>> have to keep it raised until Linux finally handled the IRQ.
>>>
>>>
>>> Yes.
>>>
>>>> Or you
>>>> decide to mask it at IO-APIC level again.
>>>
>>>
>>> We do not want that.
>>>
>>>> If you keep the TPR raised,
>>>> you will block more than what Linux wants to block.
>>>
>>>
>>> The point is that if the TPR keeps raised, it means that primary domain
>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>> gets lowered when Linux has handled the interrupt.
>>>
>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>
>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>> you face threaded IRQs - I assure you.
> 
> Well, it may work (if mask/unmask callbacks work as native) but the
> benefit is gone: masking at IO-APIC level will be done again. Given that
> threaded IRQs become increasingly popular, it will also be hard to avoid
> them in common setups.


Do they really make sense on low-end platforms such as Atom? We can make
this new masking a compilation option.

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:15                                         ` Jan Kiszka
  2012-09-17 18:16                                           ` Gilles Chanteperdrix
@ 2012-09-17 18:18                                           ` Gilles Chanteperdrix
  2012-09-17 18:22                                             ` Gilles Chanteperdrix
  2012-09-17 18:29                                             ` Jan Kiszka
  1 sibling, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:18 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:15 PM, Jan Kiszka wrote:

> On 2012-09-17 20:12, Jan Kiszka wrote:
>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>> do the same with fasteoi irqs.
>>>>
>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>> masking at the point you defer them - what we do and what Linux may even
>>>> extend beyond that. If you mask them by raising the task priority, you
>>>> have to keep it raised until Linux finally handled the IRQ.
>>>
>>>
>>> Yes.
>>>
>>>> Or you
>>>> decide to mask it at IO-APIC level again.
>>>
>>>
>>> We do not want that.
>>>
>>>> If you keep the TPR raised,
>>>> you will block more than what Linux wants to block.
>>>
>>>
>>> The point is that if the TPR keeps raised, it means that primary domain
>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>> gets lowered when Linux has handled the interrupt.
>>>
>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>
>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>> you face threaded IRQs - I assure you.
> 
> Well, it may work (if mask/unmask callbacks work as native) but the
> benefit is gone: masking at IO-APIC level will be done again. Given that
> threaded IRQs become increasingly popular, it will also be hard to avoid
> them in common setups.


The thing is, if we no longer use the IO-APIC spinlock from primary
domain, we may not have to turn it into an ipipe_spinlock, and may be
able to preempt the IO-APIC masking.

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:16                                           ` Gilles Chanteperdrix
@ 2012-09-17 18:18                                             ` Jan Kiszka
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 20:16, Gilles Chanteperdrix wrote:
> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>> do the same with fasteoi irqs.
>>>>>
>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>
>>>>
>>>> Yes.
>>>>
>>>>> Or you
>>>>> decide to mask it at IO-APIC level again.
>>>>
>>>>
>>>> We do not want that.
>>>>
>>>>> If you keep the TPR raised,
>>>>> you will block more than what Linux wants to block.
>>>>
>>>>
>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>> gets lowered when Linux has handled the interrupt.
>>>>
>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>
>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>> you face threaded IRQs - I assure you.
>>
>> Well, it may work (if mask/unmask callbacks work as native) but the
>> benefit is gone: masking at IO-APIC level will be done again. Given that
>> threaded IRQs become increasingly popular, it will also be hard to avoid
>> them in common setups.
> 
> 
> Do they really make sense on low end platforms such as atom? We can make
> this new masking a compilation option.

If your Linux driver is built around having thread context in its IRQ
handler, you have lost. And there is no option to disable this feature
anyway, only to control forced threading.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:18                                           ` Gilles Chanteperdrix
@ 2012-09-17 18:22                                             ` Gilles Chanteperdrix
  2012-09-17 18:29                                             ` Jan Kiszka
  1 sibling, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:18 PM, Gilles Chanteperdrix wrote:

> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>> do the same with fasteoi irqs.
>>>>>
>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>
>>>>
>>>> Yes.
>>>>
>>>>> Or you
>>>>> decide to mask it at IO-APIC level again.
>>>>
>>>>
>>>> We do not want that.
>>>>
>>>>> If you keep the TPR raised,
>>>>> you will block more than what Linux wants to block.
>>>>
>>>>
>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>> gets lowered when Linux has handled the interrupt.
>>>>
>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>
>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>> you face threaded IRQs - I assure you.
>>
>> Well, it may work (if mask/unmask callbacks work as native) but the
>> benefit is gone: masking at IO-APIC level will be done again. Given that
>> threaded IRQs become increasingly popular, it will also be hard to avoid
>> them in common setups.
> 
> 
> The thing is, if we no longer use the IO-APIC spinlock from primary
> domain, we may not have to turn it into an ipipe_spinlock, and may be
> able to preempt the IO-APIC masking.
> 


No, we need it in ack_apic_level in the IO-APIC erratum case.

-- 
                                                                Gilles.



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:18                                           ` Gilles Chanteperdrix
  2012-09-17 18:22                                             ` Gilles Chanteperdrix
@ 2012-09-17 18:29                                             ` Jan Kiszka
  2012-09-17 18:37                                               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:29 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>> do the same with fasteoi irqs.
>>>>>
>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>
>>>>
>>>> Yes.
>>>>
>>>>> Or you
>>>>> decide to mask it at IO-APIC level again.
>>>>
>>>>
>>>> We do not want that.
>>>>
>>>>> If you keep the TPR raised,
>>>>> you will block more than what Linux wants to block.
>>>>
>>>>
>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>> gets lowered when Linux has handled the interrupt.
>>>>
>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>
>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>> you face threaded IRQs - I assure you.
>>
>> Well, it may work (if mask/unmask callbacks work as native) but the
>> benefit is gone: masking at IO-APIC level will be done again. Given that
>> threaded IRQs become increasingly popular, it will also be hard to avoid
>> them in common setups.
> 
> 
> The thing is, if we no longer use the IO-APIC spinlock from primary
> domain, we may not have to turn it into an ipipe_spinlock, and may be
> able to preempt the IO-APIC masking.

That might be true - but is the latency related to the lock or the
hardware access? In the latter case, you will still stall the CPU on it
and have to isolate the load on a non-RT CPU again.

BTW, the task priority for the RT domain is quite an important parameter.
If you put it too low, Linux can run out of vectors. If you put it too
high, the same may happen to Xenomai - on bigger boxes.
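
To put numbers on that trade-off, here is a small sketch that counts
how many vectors fall on each side of a given TPR value. The
assumptions: vectors 0x00-0x1f are reserved for exceptions, the
threshold is a whole priority class (low nibble zero), and vectors that
Linux reserves for its own system use (IPIs, local timer and so on) are
not subtracted. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FIRST_USABLE_VECTOR = 0x20, LAST_VECTOR = 0xff };

struct vector_split {
	unsigned int low;  /* held off when the TPR is raised: Linux side */
	unsigned int high; /* still delivered when the TPR is raised: RT side */
};

/* Split the usable vector space around the priority class in tpr. */
static struct vector_split split_vectors(uint8_t tpr)
{
	struct vector_split s = { 0, 0 };
	unsigned int v;

	assert((tpr & 0x0f) == 0); /* only the priority class is modelled */
	for (v = FIRST_USABLE_VECTOR; v <= LAST_VECTOR; v++) {
		if ((v >> 4) > (unsigned int)(tpr >> 4))
			s.high++;
		else
			s.low++;
	}
	return s;
}
```

Under these assumptions, the 0x70 used in the test code leaves 96
vectors below the threshold and 128 above it, 0x30 leaves only 32 for
the low-priority side, and 0xe0 leaves only 16 for the high-priority
side.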

On the other hand, it may be useful as a complete mask for Linux, i.e.
when applied unconditionally on entry to the RT domain and released on
exit or fasteoi ending. Have you already played with that model? That
would have a value beyond IO-APICs.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:29                                             ` Jan Kiszka
@ 2012-09-17 18:37                                               ` Gilles Chanteperdrix
  2012-09-17 18:54                                                 ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 18:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:29 PM, Jan Kiszka wrote:

> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>
>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>> do the same with fasteoi irqs.
>>>>>>
>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Or you
>>>>>> decide to mask it at IO-APIC level again.
>>>>>
>>>>>
>>>>> We do not want that.
>>>>>
>>>>>> If you keep the TPR raised,
>>>>>> you will block more than what Linux wants to block.
>>>>>
>>>>>
>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>> gets lowered when Linux has handled the interrupt.
>>>>>
>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>
>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>> you face threaded IRQs - I assure you.
>>>
>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>> them in common setups.
>>
>>
>> The thing is, if we no longer use the IO-APIC spinlock from primary
>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>> able to preempt the IO-APIC masking.
> 
> That might be true - but is the latency related to the lock or the
> hardware access? In the latter case, you will still stall the CPU on it
> and have to isolate the load on a non-RT CPU again.
> 
> BTW, the task priority for the RT domain is a quite important parameter.
> If you put it too low, Linux can run out of vectors. If you put it too
> high, the same may happen to Xenomai - on bigger boxes.


Yes, and there are only 16 levels. But Xenomai does not need too many levels.

> 
> On the other hand, it may be useful as a complete mask for Linux, ie.
> when applied unconditionally on entry to the RT domain and released on
> exit or fasteoi ending. Already played with that model? That would have
> a value beyond IO-APICs.


Yes, that is exactly what I did. We call this "PIC muting" on other
architectures; I simply implemented it on x86 with the APIC TPR (that is
the second curve in the plot posted at the beginning of this thread).
But it does not solve the problem when the APIC timer interrupt happens
while the I-pipe is busy acking an IO-APIC fasteoi. Which is why I
implemented the second solution (third curve in the graph), which is
exactly what you describe and corresponds to the code I posted a few
messages ago.
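
For readers following along, the muting model can be sketched roughly as
below. This is only an illustration, not the patch posted earlier in the
thread: fake_tpr stands in for the real LAPIC task priority register, and
RT_TPR_THRESHOLD, rt_domain_enter(), rt_domain_exit() and
vector_deliverable() are hypothetical names. It assumes the usual LAPIC
rule that a vector is delivered only if its priority class (vector >> 4)
is strictly above the class held in the TPR, and it ignores in-service
interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: vectors in classes 0..13 are left to Linux. */
#define RT_TPR_THRESHOLD 0xd0

/* Stand-in for the LAPIC TPR; real code would write the APIC register. */
static uint8_t fake_tpr;
/* Nesting depth of RT domain entries on this CPU. */
static unsigned int rt_depth;

/* Mute the low priority (Linux) vectors when the RT domain starts running. */
static inline void rt_domain_enter(void)
{
	if (rt_depth++ == 0)
		fake_tpr = RT_TPR_THRESHOLD;
}

/* Unmute once the outermost RT activity is done. */
static inline void rt_domain_exit(void)
{
	assert(rt_depth > 0);
	if (--rt_depth == 0)
		fake_tpr = 0;
}

/* A vector gets through only if its class is strictly above the TPR class. */
static inline bool vector_deliverable(uint8_t vector)
{
	return (vector >> 4) > (fake_tpr >> 4);
}
```

With this threshold, a vector such as 0x31 is held back between
rt_domain_enter() and the matching rt_domain_exit(), while 0xe1 still
gets through.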

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:37                                               ` Gilles Chanteperdrix
@ 2012-09-17 18:54                                                 ` Jan Kiszka
  2012-09-17 21:50                                                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-17 18:54 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>
>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>> do the same with fasteoi irqs.
>>>>>>>
>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> Or you
>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>
>>>>>>
>>>>>> We do not want that.
>>>>>>
>>>>>>> If you keep the TPR raised,
>>>>>>> you will block more than what Linux wants to block.
>>>>>>
>>>>>>
>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>
>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>
>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>> you face threaded IRQs - I assure you.
>>>>
>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>> them in common setups.
>>>
>>>
>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>> able to preempt the IO-APIC masking.
>>
>> That might be true - but is the latency related to the lock or the
>> hardware access? In the latter case, you will still stall the CPU on it
>> and have to isolate the load on a non-RT CPU again.
>>
>> BTW, the task priority for the RT domain is a quite important parameter.
>> If you put it too low, Linux can run out of vectors. If you put it too
>> high, the same may happen to Xenomai - on bigger boxes.
> 
> 
> Yes, and there are only 16 levels. But Xenomai does not need too many levels.

Who is telling you this? It's part of the system setup. And that may
lean toward RT or toward non-RT. This level should be adjusted according
to the current allocation of Linux and the RT domain for a particular
CPU, not hard-coded or compile-time defined.

> 
>>
>> On the other hand, it may be useful as a complete mask for Linux, ie.
>> when applied unconditionally on entry to the RT domain and released on
>> exit or fasteoi ending. Already played with that model? That would have
>> a value beyond IO-APICs.
> 
> 
> Yes, that is exactly what I did. We call this "PIC muting" on other
> architectures; I simply implemented it on x86 with the APIC TPR (that is
> the second curve in the plot posted at the beginning of this thread).
> But it does not solve the problem when the APIC timer interrupt happens
> while the I-pipe is busy acking an IO-APIC fasteoi. Which is why I
> implemented the second solution (third curve in the graph), which is
> exactly what you describe and corresponds to the code I posted a few
> messages ago.

I would still refrain from directing non-RT IRQs to RT CPUs. But if
IO-APIC latency reductions come as a byproduct of Linux IRQ muting,
then it might be worth the additional maintenance pain (shuffling
vectors around will cause pain, be assured).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 18:54                                                 ` Jan Kiszka
@ 2012-09-17 21:50                                                   ` Gilles Chanteperdrix
  2012-09-18  8:48                                                     ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-17 21:50 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/17/2012 08:54 PM, Jan Kiszka wrote:

> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>
>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>
>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> Or you
>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>
>>>>>>>
>>>>>>> We do not want that.
>>>>>>>
>>>>>>>> If you keep the TPR raised,
>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>
>>>>>>>
>>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>
>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>
>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>> you face threaded IRQs - I assure you.
>>>>>
>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>> them in common setups.
>>>>
>>>>
>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>> able to preempt the IO-APIC masking.
>>>
>>> That might be true - but is the latency related to the lock or the
>>> hardware access? In the latter case, you will still stall the CPU on it
>>> and have to isolate the load on a non-RT CPU again.
>>>
>>> BTW, the task priority for the RT domain is a quite important parameter.
>>> If you put it too low, Linux can run out of vectors. If you put it too
>>> high, the same may happen to Xenomai - on bigger boxes.
>>
>>
>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
> 
> Who is telling you this? It's part of the system setup. And that may
> lean toward RT or toward non-RT. This level should be adjusted according
> to the current allocation of Linux and the RT domain for a particular
> CPU, not hard-coded or compile-time defined.


In theory, I agree. In practice, let's be crazy: assume someone would
want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
vectors that a single level provides, so we can probably get along with
2 levels. Or we can use a kernel parameter.
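
To make the arithmetic concrete, here is a small sketch of how vectors map
to priority levels. It assumes the LAPIC convention that the level is the
upper four bits of the vector, and that the TPR holds back every vector
whose level is less than or equal to the TPR level (in-service interrupts
are ignored). The helper names are made up for this illustration, and the
count of 11 irqs assumes one irq for the CAN driver and one per RTnet
board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS        256
#define NR_LEVELS         16
#define VECTORS_PER_LEVEL (NR_VECTORS / NR_LEVELS)

static_assert(VECTORS_PER_LEVEL == 16, "16 vectors share one level");

/* Priority level of a vector: its upper four bits. */
static inline unsigned int vector_level(uint8_t vector)
{
	return vector >> 4;
}

/* True if a vector is held back by the given TPR value. */
static inline bool masked_by_tpr(uint8_t vector, uint8_t tpr)
{
	return vector_level(vector) <= vector_level(tpr);
}

/* Number of levels needed to give each of nr_irqs its own vector. */
static inline unsigned int levels_needed(unsigned int nr_irqs)
{
	return (nr_irqs + VECTORS_PER_LEVEL - 1) / VECTORS_PER_LEVEL;
}
```

With 4 + 2 + 1 + 4 = 11 irqs, levels_needed(11) is 1, so a second level
would only be spare capacity in that scenario.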

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-17 21:50                                                   ` Gilles Chanteperdrix
@ 2012-09-18  8:48                                                     ` Jan Kiszka
  2012-09-18  9:06                                                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-18  8:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
> 
>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>
>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>
>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>>
>>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> Or you
>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>
>>>>>>>>
>>>>>>>> We do not want that.
>>>>>>>>
>>>>>>>>> If you keep the TPR raised,
>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>
>>>>>>>>
>>>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>>
>>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>>
>>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>>> you face threaded IRQs - I assure you.
>>>>>>
>>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>>> them in common setups.
>>>>>
>>>>>
>>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>>> able to preempt the IO-APIC masking.
>>>>
>>>> That might be true - but is the latency related to the lock or the
>>>> hardware access? In the latter case, you will still stall the CPU on it
>>>> and have to isolate the load on a non-RT CPU again.
>>>>
>>>> BTW, the task priority for the RT domain is a quite important parameter.
>>>> If you put it too low, Linux can run out of vectors. If you put it too
>>>> high, the same may happen to Xenomai - on bigger boxes.
>>>
>>>
>>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
>>
>> Who is telling you this? It's part of the system setup. And that may
>> lean toward RT or toward non-RT. This level should be adjusted according
>> to the current allocation of Linux and the RT domain for a particular
>> CPU, not hard-coded or compile-time defined.
> 
> 
> In theory, I agree. In practice, let's be crazy: assume someone would
> want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
> RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
> vectors that a single level provides, so we can probably get along with
> 2 levels. Or we can use a kernel parameter.

Linux - and so should we do - allocates separate levels first as that
provides better performance for external interrupts (need to look up the
precise reason, should be documented in the x86 code). Only if levels
are used up, interrupts will share them. Out of the 16 we have, about
3-4 should already be occupied by exception and system vectors. And, if
you look at today's NICs e.g., you get around 3 vectors per interface at
least. I have a more or less ordinary one here (single port, no SR-IOV)
with 8(!) per port. So interrupt vector shortage is not that far away.
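
The "separate levels first" policy can be sketched as follows. This is a
simplified illustration of the idea, not the kernel's allocator: the two
boundary constants are assumptions chosen for the example, and the struct
and function names are hypothetical. Consecutive allocations step up by
16, so each lands in a new level, and only after a full sweep do vectors
start sharing levels.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boundaries: below are exceptions, from 0xef up are system vectors. */
#define FIRST_EXTERNAL_VECTOR 0x20
#define FIRST_SYSTEM_VECTOR   0xef

static_assert(FIRST_EXTERNAL_VECTOR % 16 == 0, "sweep starts on a level boundary");

struct vector_allocator {
	uint8_t next;   /* vector handed out by the next allocation */
	uint8_t offset; /* position inside each level for the current sweep */
};

static inline void vector_allocator_init(struct vector_allocator *va)
{
	va->offset = 0;
	va->next = FIRST_EXTERNAL_VECTOR;
}

/* Returns the allocated vector, or -1 once all external vectors are used. */
static inline int vector_alloc(struct vector_allocator *va)
{
	int vector;

	if (va->offset >= 16)
		return -1;

	vector = va->next;
	if (vector + 16 >= FIRST_SYSTEM_VECTOR) {
		/* Sweep done: restart at the lowest level, one slot further. */
		va->offset++;
		va->next = FIRST_EXTERNAL_VECTOR + va->offset;
	} else {
		/* Move up one level, same slot. */
		va->next = vector + 16;
	}
	return vector;
}
```

Under these assumptions the first 13 allocations each get a level of
their own (0x20, 0x30, ... 0xe0), and the 14th (0x21) is the first one
that has to share.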

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-18  8:48                                                     ` Jan Kiszka
@ 2012-09-18  9:06                                                       ` Gilles Chanteperdrix
  2012-09-18  9:12                                                         ` Gilles Chanteperdrix
  2012-09-18  9:30                                                         ` Jan Kiszka
  0 siblings, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-18  9:06 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/18/2012 10:48 AM, Jan Kiszka wrote:
> On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
>> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
>>
>>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>>
>>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>>
>>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>>>
>>>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> Or you
>>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We do not want that.
>>>>>>>>>
>>>>>>>>>> If you keep the TPR raised,
>>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>>>
>>>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>>>
>>>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>>>> you face threaded IRQs - I assure you.
>>>>>>>
>>>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>>>> them in common setups.
>>>>>>
>>>>>>
>>>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>>>> able to preempt the IO-APIC masking.
>>>>>
>>>>> That might be true - but is the latency related to the lock or the
>>>>> hardware access? In the latter case, you will still stall the CPU on it
>>>>> and have to isolate the load on a non-RT CPU again.
>>>>>
>>>>> BTW, the task priority for the RT domain is a quite important parameter.
>>>>> If you put it too low, Linux can run out of vectors. If you put it too
>>>>> high, the same may happen to Xenomai - on bigger boxes.
>>>>
>>>>
>>>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
>>>
>>> Who is telling you this? It's part of the system setup. And that may
>>> lean toward RT or toward non-RT. This level should be adjusted according
>>> to the current allocation of Linux and the RT domain for a particular
>>> CPU, not hard-coded or compile-time defined.
>>
>>
>> In theory, I agree. In practice, let's be crazy: assume someone would
>> want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
>> RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
>> vectors that a single level provides, so we can probably get along with
>> 2 levels. Or we can use a kernel parameter.
> 
> Linux - and so should we do - allocates separate levels first as that
> provides better performance for external interrupts (need to look up the
> precise reason, should be documented in the x86 code). Only if levels
> are used up, interrupts will share them.

I have seen this code, and I wondered if it was not, in fact, only
useful when the irq flow handlers were re-enabling irqs (that is, before
the removal of IRQF_DISABLED), but I am really not sure.

Also, some additional results on my atom:
the IO-APIC is on the I/O controller hub, which is... an ICH4 if I read lspci
and the datasheets correctly. And what is more, its registers are
accessed through the (slow) LPC bus, the ISA bus replacement. It is
probably the reason why it is so slow.

And last but not least, it is not really a multi-core processor, it has
hyper-threading. Booting the processor in UP mode yields a much more
reasonable latency of 23us (still using the TPR), whereas the usual
latency was around 30us (running the test now, will have results at
noon). So the real gain of using the TPR is in fact much lower than
what was originally announced. Basically, it seems that with
hyper-threading, everything is doubled.

http://sisyphus.hd.free.fr/core-3.4-latencies/atom.png

> Out of the 16 we have, about
> 3-4 should already be occupied by exception and system vectors. And, if
> you look at today's NICs e.g., you get around 3 vectors per interface at
> least. I have a more or less ordinary one here (single port, no SR-IOV)
> with 8(!) per port. So interrupt vector shortage is not that far away.

MSI vectors, or legacy vectors? Here we generate PCIe peripherals using
FPGAs, and such peripherals can only declare one legacy interrupt,
whereas they can declare several MSI vectors. Though I do not know if
this is a limitation of the PCIe FPGA IP or of the PCIe standard.

Regards.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-18  9:06                                                       ` Gilles Chanteperdrix
@ 2012-09-18  9:12                                                         ` Gilles Chanteperdrix
  2012-09-18  9:30                                                         ` Jan Kiszka
  1 sibling, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-18  9:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/18/2012 11:06 AM, Gilles Chanteperdrix wrote:
> On 09/18/2012 10:48 AM, Jan Kiszka wrote:
>> On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>>>
>>>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>>>
>>>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>>>>
>>>>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> Or you
>>>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We do not want that.
>>>>>>>>>>
>>>>>>>>>>> If you keep the TPR raised,
>>>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>>>>
>>>>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>>>>
>>>>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>>>>> you face threaded IRQs - I assure you.
>>>>>>>>
>>>>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>>>>> them in common setups.
>>>>>>>
>>>>>>>
>>>>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>>>>> able to preempt the IO-APIC masking.
>>>>>>
>>>>>> That might be true - but is the latency related to the lock or the
>>>>>> hardware access? In the latter case, you will still stall the CPU on it
>>>>>> and have to isolate the load on a non-RT CPU again.
>>>>>>
>>>>>> BTW, the task priority for the RT domain is a quite important parameter.
>>>>>> If you put it too low, Linux can run out of vectors. If you put it too
>>>>>> high, the same may happen to Xenomai - on bigger boxes.
>>>>>
>>>>>
>>>>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
>>>>
>>>> Who is telling you this? It's part of the system setup. And that may
>>>> lean toward RT or toward non-RT. This level should be adjusted according
>>>> to the current allocation of Linux and the RT domain for a particular
>>>> CPU, not hard-coded or compile-time defined.
>>>
>>>
>>> In theory, I agree. In practice, let's be crazy: assume someone would
>>> want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
>>> RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
>>> vectors that a single level provides, so we can probably get along with
>>> 2 levels. Or we can use a kernel parameter.
>>
>> Linux - and so should we do - allocates separate levels first as that
>> provides better performance for external interrupts (need to look up the
>> precise reason, should be documented in the x86 code). Only if levels
>> are used up, interrupts will share them.
> 
> I have seen this code, and I wondered if it was not, in fact, only
> useful when the irq flow handlers were re-enabling irqs (that is, before
> the removal of IRQF_DISABLED), but I am really not sure.
> 
> Also, some additional results on my atom:
> the IO-APIC is on IO controller HUB, which is... an ICH4 if I read lspci
> and the datasheets correctly. And what is more, its registers are
> accessed through the (slow) LPC bus, the ISA bus replacement. It is
> probably the reason why it is so slow.
> 
> And last but not least, it is not really a multi-core processor, it has
> hyper-threading. Booting the processor in UP mode yields a much more
> reasonable latency of 23us (still using the TPR), whereas the usual
> latency was around 30us (running the test now, will have results at
> noon). So the real gain of using the TPR is in fact much lower than
> what was originally announced. Basically, it seems that with
> hyper-threading, everything is doubled.
> 
> http://sisyphus.hd.free.fr/core-3.4-latencies/atom.png

The results are ready actually, so, the net gain is 6.5us over 30us,
that is around 20%.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Xenomai] IO-APIC latencies
  2012-09-18  9:06                                                       ` Gilles Chanteperdrix
  2012-09-18  9:12                                                         ` Gilles Chanteperdrix
@ 2012-09-18  9:30                                                         ` Jan Kiszka
  2012-09-18  9:36                                                           ` Gilles Chanteperdrix
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2012-09-18  9:30 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2012-09-18 11:06, Gilles Chanteperdrix wrote:
> On 09/18/2012 10:48 AM, Jan Kiszka wrote:
>> On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>>>
>>>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>>>
>>>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>>>>
>>>>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> Or you
>>>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We do not want that.
>>>>>>>>>>
>>>>>>>>>>> If you keep the TPR raised,
>>>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The point is that if the TPR keeps raised, it means that primary domain
>>>>>>>>>> has preempted Linux, so, we want it to keep that way. Otherwise the TPR
>>>>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>>>>
>>>>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>>>>
>>>>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>>>>> you face threaded IRQs - I assure you.
>>>>>>>>
>>>>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>>>>> them in common setups.
>>>>>>>
>>>>>>>
>>>>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>>>>> able to preempt the IO-APIC masking.
>>>>>>
>>>>>> That might be true - but is the latency related to the lock or the
>>>>>> hardware access? In the latter case, you will still stall the CPU on it
>>>>>> and have to isolate the load on a non-RT CPU again.
>>>>>>
>>>>>> BTW, the task priority for the RT domain is a quite important parameter.
>>>>>> If you put it too low, Linux can run out of vectors. If you put it too
>>>>>> high, the same may happen to Xenomai - on bigger boxes.
>>>>>
>>>>>
>>>>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
>>>>
>>>> Who is telling you this? It's part of the system setup. And that may
>>>> lean toward RT or toward non-RT. This level should be adjusted according
>>>> to the current allocation of Linux and the RT domain for a particular
>>>> CPU, not hard-coded or compile-time defined.
>>>
>>>
>>> In theory, I agree; in practice, let's be crazy: assume someone would
>>> want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
>>> RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
>>> vectors that a single level provides, so, we can probably get along with
>>> 2 levels. Or we can use a kernel parameter.
>>
>> Linux - and so should we do - allocates separate levels first as that
>> provides better performance for external interrupts (need to look up the
>> precise reason, should be documented in the x86 code). Only if levels
>> are used up, interrupts will share them.
> 
> I have seen this code, and I wondered if it was not, in fact, only
> useful when the irq flow handlers were re-enabling irqs (that is, before
> the removal of IRQF_DISABLED), but I am really not sure.

This pattern is still present with IRQF_ONESHOT, aka threaded IRQs.

> 
> Also, some additional results on my atom:
> the IO-APIC is on the I/O controller hub, which is... an ICH4 if I read lspci
> and the datasheets correctly. And what is more, its registers are
> accessed through the (slow) LPC bus, the ISA bus replacement. It is
> probably the reason why it is so slow.

Yes, I was expecting some architectural limitation like this.

> 
> And last but not least, it is not really a multi-core processor, it has
> hyper-threading. Booting the processor in UP mode yields a much more
> reasonable latency of 23us (still using the TPR), whereas the usual
> latency was around 30us (running the test now, will have results at
> noon), so, the real gain of using the TPR is in fact much lower than
> what was originally announced. Basically, it seems with hyper-threading,
> everything is doubled.

True, hyper-threading doesn't help with latencies in this range.

> 
> http://sisyphus.hd.free.fr/core-3.4-latencies/atom.png
> 
>> Out of the 16 we have, about
>> 3-4 should already be occupied by exception and system vectors. And, if
>> you look at today's NICs e.g., you get around 3 vectors per interface at
>> least. I have a more or less ordinary one here (single port, no SR-IOV)
>> with 8(!) per port. So interrupt vector shortage is not that far away.
> 
> MSI vectors, or legacy vectors?

MSI-X, not legacy MSI. Linux only supports one legacy MSI vector per
device due to the need to allocate multiple vectors in a consecutive
range - hard to achieve on x86 across all CPUs.
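
The consecutive-range requirement follows from how multi-message MSI works: the device is programmed with a single message data value and encodes the message number in its low-order bits, so a block of 2^n vectors has to start on a 2^n boundary. Below is a minimal sketch of that constraint in C. The helper names are hypothetical and are not part of any Linux API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: multi-message MSI can only request a
 * power-of-two number of messages, from 1 up to 32. */
static bool msi_count_valid(unsigned count)
{
    return count >= 1 && count <= 32 && (count & (count - 1)) == 0;
}

/* Hypothetical helper: a block of 'count' vectors starting at 'base'
 * is usable only if 'base' is aligned to 'count', because the device
 * overwrites the low-order bits of the message data. */
static bool msi_block_usable(uint8_t base, unsigned count)
{
    return msi_count_valid(count) && (base & (count - 1)) == 0;
}

/* Vector raised on the CPU when the device sends message 'msg'. */
static uint8_t msi_vector(uint8_t base, unsigned count, unsigned msg)
{
    assert(msi_block_usable(base, count));
    assert(msg < count);
    return (uint8_t)(base | msg);
}
```

For example, an 8-message block can start at vector 0x40 but not at 0x44. MSI-X has no such constraint because each table entry carries its own address and data.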

> Here we generate PCIe peripherals using
> FPGAs, and such peripherals can only declare one legacy interrupt,
> whereas they can declare several MSI vectors. Though I do not know if
> this is a limitation of the PCIe FPGA IP or of the PCIe standard.

PCIe recommends MSI-X, IIRC. Legacy INTx and MSI are still allowed and
can also be found in the field.
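
The vector budget discussed in this thread comes from the LAPIC priority scheme: the priority class of an interrupt is bits 7:4 of its vector, giving 16 classes of 16 vectors each, and the TPR holds back every vector whose class is less than or equal to the TPR class. Below is a rough sketch of that arithmetic in C. The function names and the idea of a single "split class" separating Linux from the RT domain are assumptions made for illustration, not the I-pipe implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector (or of a TPR value): bits 7:4. */
static unsigned prio_class(uint8_t value)
{
    return value >> 4;
}

/* An interrupt is held pending while its class is less than or
 * equal to the class currently programmed in the TPR. */
static bool tpr_blocks(uint8_t tpr, uint8_t vector)
{
    return prio_class(vector) <= prio_class(tpr);
}

/* Hypothetical helper: number of vectors left for the RT domain if
 * every class strictly above 'split_class' is reserved for it. This
 * ignores the system vectors that also live in the top classes. */
static unsigned rt_vector_budget(unsigned split_class)
{
    assert(split_class < 16);
    return (15 - split_class) * 16;
}
```

With a split at class 13, for instance, raising the TPR to 0xd0 holds back vector 0xd5 but still lets 0xe1 through, and the RT domain has at most 32 vectors to share with whatever system vectors sit in classes 14 and 15.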

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] IO-APIC latencies
  2012-09-18  9:30                                                         ` Jan Kiszka
@ 2012-09-18  9:36                                                           ` Gilles Chanteperdrix
  0 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-18  9:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On 09/18/2012 11:30 AM, Jan Kiszka wrote:
> On 2012-09-18 11:06, Gilles Chanteperdrix wrote:
>> On 09/18/2012 10:48 AM, Jan Kiszka wrote:
>>> On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>>>>
>>>>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>>>>
>>>>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> ipipe_end is a nop when called from primary domain, yes, but this is not
>>>>>>>>>>>>> very different from edge irqs. Also, fasteoi become a bit like MSI: in
>>>>>>>>>>>>> the same way as we can not mask MSI from primary domain, we should not
>>>>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is too prohibitive. If we
>>>>>>>>>>>>> can live with MSI without masking them in primary mode, I guess we can
>>>>>>>>>>>>> do the same with fasteoi irqs.
>>>>>>>>>>>>
>>>>>>>>>>>> MSIs are edge triggered, fasteois are still level-based. They require
>>>>>>>>>>>> masking at the point you defer them - what we do and what Linux may even
>>>>>>>>>>>> extend beyond that. If you mask them by raising the task priority, you
>>>>>>>>>>>> have to keep it raised until Linux finally handled the IRQ.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> Or you
>>>>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We do not want that.
>>>>>>>>>>>
>>>>>>>>>>>> If you keep the TPR raised,
>>>>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The point is that if the TPR stays raised, it means that the primary
>>>>>>>>>>> domain has preempted Linux, so we want it to stay that way. Otherwise the TPR
>>>>>>>>>>> gets lowered when Linux has handled the interrupt.
>>>>>>>>>>>
>>>>>>>>>>> A week-end of testing made me sure of one thing: it works. I assure you.
>>>>>>>>>>
>>>>>>>>>> Probably, in the absence of IRQF_ONESHOT Linux interrupts. No longer if
>>>>>>>>>> you face threaded IRQs - I assure you.
>>>>>>>>>
>>>>>>>>> Well, it may work (if mask/unmask callbacks work as native) but the
>>>>>>>>> benefit is gone: masking at IO-APIC level will be done again. Given that
>>>>>>>>> threaded IRQs become increasingly popular, it will also be hard to avoid
>>>>>>>>> them in common setups.
>>>>>>>>
>>>>>>>>
>>>>>>>> The thing is, if we no longer use the IO-APIC spinlock from primary
>>>>>>>> domain, we may not have to turn it into an ipipe_spinlock, and may be
>>>>>>>> able to preempt the IO-APIC masking.
>>>>>>>
>>>>>>> That might be true - but is the latency related to the lock or the
>>>>>>> hardware access? In the latter case, you will still stall the CPU on it
>>>>>>> and have to isolate the load on a non-RT CPU again.
>>>>>>>
>>>>>>> BTW, the task priority for the RT domain is a quite important parameter.
>>>>>>> If you put it too low, Linux can run out of vectors. If you put it too
>>>>>>> high, the same may happen to Xenomai - on bigger boxes.
>>>>>>
>>>>>>
>>>>>> Yes, and there are only 16 levels. But Xenomai does not need too many levels.
>>>>>
>>>>> Who is telling you this? It's part of the system setup. And that may
>>>>> lean toward RT or toward non-RT. This level should be adjusted according
>>>>> to the current allocation of Linux and the RT domain for a particular
>>>>> CPU, not hard-coded or compile-time defined.
>>>>
>>>>
>>>> In theory, I agree; in practice, let's be crazy: assume someone would
>>>> want an RT serial driver with 4 irqs, an RT USB driver with 2 irqs, an
>>>> RT CAN driver, and say, 4 RTnet boards. That is still less than the 16
>>>> vectors that a single level provides, so, we can probably get along with
>>>> 2 levels. Or we can use a kernel parameter.
>>>
>>> Linux - and so should we do - allocates separate levels first as that
>>> provides better performance for external interrupts (need to look up the
>>> precise reason, should be documented in the x86 code). Only if levels
>>> are used up, interrupts will share them.
>>
>> I have seen this code, and I wondered if it was not, in fact, only
>> useful when the irq flow handlers were re-enabling irqs (that is, before
>> the removal of IRQF_DISABLED), but I am really not sure.
> 
> This pattern is still present with IRQF_ONESHOT, aka threaded IRQs.

No, from what I understand, it is different: with threaded IRQs, the
flow handler masks the irq and then sends the EOI. So, the LAPIC does not nest.

If you re-enable the hardware interrupts before sending the EOI, you
cause the LAPIC to nest.

But I am not sure.
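
The distinction can be illustrated with a toy model of the LAPIC in-service logic: the processor priority class is the maximum of the TPR class and the class of the highest vector still in service, and a new interrupt is only delivered if its class is strictly higher. Below is a minimal sketch in C. The structure and function names are hypothetical, and the model ignores the IRR and the CPU interrupt flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy LAPIC state: TPR plus one in-service flag per vector. */
struct lapic_model {
    uint8_t tpr;
    bool in_service[256]; /* accepted, EOI not yet sent */
};

static unsigned vec_class(uint8_t vector)
{
    return vector >> 4;
}

/* Processor priority class: max of the TPR class and the class of
 * the highest in-service vector. */
static unsigned ppr_class(const struct lapic_model *l)
{
    unsigned cls = vec_class(l->tpr);

    for (int v = 255; v >= 0; v--) {
        if (l->in_service[v]) {
            if (vec_class((uint8_t)v) > cls)
                cls = vec_class((uint8_t)v);
            break;
        }
    }
    return cls;
}

/* A vector can be delivered only if its class beats the PPR class. */
static bool can_deliver(const struct lapic_model *l, uint8_t vector)
{
    return vec_class(vector) > ppr_class(l);
}

/* The CPU accepts the vector: it becomes in-service until EOI. */
static void lapic_accept(struct lapic_model *l, uint8_t vector)
{
    assert(can_deliver(l, vector));
    l->in_service[vector] = true;
}

/* EOI clears the highest in-service vector. */
static void lapic_eoi(struct lapic_model *l)
{
    for (int v = 255; v >= 0; v--) {
        if (l->in_service[v]) {
            l->in_service[v] = false;
            return;
        }
    }
}
```

In this model, while vector 0x51 is in service, 0x61 (a higher class) can still be delivered and would nest if the CPU re-enabled interrupts, whereas 0x52 (same class) is held back. Once the EOI has been sent, nothing is in service any more and only the TPR limits delivery, which matches the case where the flow handler masks the line and sends the EOI before the thread runs.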


-- 
					    Gilles.



end of thread, other threads:[~2012-09-18  9:36 UTC | newest]

Thread overview: 40+ messages
2012-09-17  6:30 [Xenomai] IO-APIC latencies Gilles Chanteperdrix
2012-09-17  7:43 ` Jan Kiszka
2012-09-17  8:07   ` Gilles Chanteperdrix
2012-09-17  8:18     ` Jan Kiszka
2012-09-17  8:32       ` Gilles Chanteperdrix
2012-09-17  9:07         ` Jan Kiszka
2012-09-17  9:29           ` Gilles Chanteperdrix
2012-09-17  9:42             ` Jan Kiszka
2012-09-17 10:00               ` Gilles Chanteperdrix
2012-09-17 10:39                 ` Henri Roosen
2012-09-17 11:14                   ` Gilles Chanteperdrix
2012-09-17 12:15                     ` Henri Roosen
2012-09-17 12:27                       ` Jan Kiszka
2012-09-17 13:46                         ` Gilles Chanteperdrix
2012-09-17 13:54                           ` Jan Kiszka
2012-09-17 14:02                             ` Gilles Chanteperdrix
2012-09-17 14:35                               ` Jan Kiszka
2012-09-17 17:46                                 ` Gilles Chanteperdrix
2012-09-17 18:05                                   ` Jan Kiszka
2012-09-17 18:08                                     ` Gilles Chanteperdrix
2012-09-17 18:12                                       ` Jan Kiszka
2012-09-17 18:13                                         ` Gilles Chanteperdrix
2012-09-17 18:15                                         ` Jan Kiszka
2012-09-17 18:16                                           ` Gilles Chanteperdrix
2012-09-17 18:18                                             ` Jan Kiszka
2012-09-17 18:18                                           ` Gilles Chanteperdrix
2012-09-17 18:22                                             ` Gilles Chanteperdrix
2012-09-17 18:29                                             ` Jan Kiszka
2012-09-17 18:37                                               ` Gilles Chanteperdrix
2012-09-17 18:54                                                 ` Jan Kiszka
2012-09-17 21:50                                                   ` Gilles Chanteperdrix
2012-09-18  8:48                                                     ` Jan Kiszka
2012-09-18  9:06                                                       ` Gilles Chanteperdrix
2012-09-18  9:12                                                         ` Gilles Chanteperdrix
2012-09-18  9:30                                                         ` Jan Kiszka
2012-09-18  9:36                                                           ` Gilles Chanteperdrix
2012-09-17 18:15                                         ` Gilles Chanteperdrix
2012-09-17 12:12                 ` Richard Cochran
2012-09-17 12:21       ` Gilles Chanteperdrix
2012-09-17 12:27         ` Jan Kiszka
