From: Andrew Cooper <andrew.cooper3@citrix.com>
To: xen-devel@lists.xensource.com
Subject: Re: Problems with MSI interrupts
Date: Wed, 3 Aug 2011 13:05:02 +0100
Message-ID: <4E39396E.1000205@citrix.com>
In-Reply-To: <4E393632.4020300@citrix.com>



On 03/08/11 12:51, Andrew Cooper wrote:
> Hello,
>
> I am currently investigating an issue with MSI allocation/deallocation
> which appears to be an MSI resource leak in Xen.  This is XenServer 6.0
> based on Xen 4.1.1, with no changesets I can see affecting the relevant
> Xen codepaths.
>
> The box in question is a NetScaler SDX box with 24 logical cores (2
> Nehalem sockets, 6 cores each, hyperthreading), 96GB RAM, and 4 dual-port
> Intel 10G ixgbe cards (plus two SSL 'Xcelerator' cards, but I have
> disabled these for debugging purposes).  Each of the 8 NIC ports exports
> 40 virtual functions.  There are 40 (identical) VMs which have 1 VF from
> each NIC passed through to them, giving each VM 8 VFs.  Each VF itself
> uses 3 MSI-X interrupts.  Therefore, for all VMs to be working
> correctly, there are 3 irqs per VF x 8 VFs per VM x 40 VMs = 960 MSI-X
> interrupts.
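
The interrupt budget above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and the figures are the ones quoted in the message.

```python
# Figures taken from the description above; the names are illustrative only.
NIC_PORTS = 8        # 4 dual-port ixgbe cards
VFS_PER_PORT = 40    # virtual functions exported by each port
NUM_VMS = 40
VFS_PER_VM = 8       # one VF from each NIC port
IRQS_PER_VF = 3      # MSI-X interrupts used by each VF

vfs_exported = NIC_PORTS * VFS_PER_PORT
vfs_assigned = NUM_VMS * VFS_PER_VM
total_msix = IRQS_PER_VF * VFS_PER_VM * NUM_VMS

print(f"VFs exported: {vfs_exported}, VFs assigned: {vfs_assigned}")
print(f"MSI-X interrupts needed: {total_msix}")
```

Every exported VF is assigned to some VM, so the full 960 interrupts are needed whenever all 40 VMs are running.
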
>
> The symptoms are: Reboot the VMs a couple of times, and eventually Xen
> says "(XEN) ../physdev.c:140: domXXX: can't create irq for msi!".  After
> adding extra debugging, I found that the call to create_irq() was returning
> -ENOSPC.  At the point at which create_irq() was failing, there were
> huge numbers of irqs listed with the debugkeys 'i' with a descriptor
> affinity mask of all cpus, which I believe is interfering with the
> calculations in __assign_irq_vector().
>
> I suspected that this might be because of scheduling under load swapping
> VCPUs across PCPUs, resulting in the irq descriptor being written into
> all PCPU IDTs.  As a result, I pinned each VM to a specific PCPU in the
> hope that this would go away.
>
> When starting each VM individually, the problem appears to go away. 
> However, when starting all VMs at once, there are still some irqs with
> an affinity mask of all CPUs.
>
> Specifically, one case is this:  (I added extra debugging to put
> irq_cfg->cpu_mask into the 'i' debugkeys)
>
> (XEN)    IRQ: 845 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:7e type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 55(----),
> (XEN)    IRQ: 846 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:86 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 54(----),
> (XEN)    IRQ: 847 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:96 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 53(----),
> (XEN)    IRQ: 848 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:be type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 52(----),
> (XEN)    IRQ: 849 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:c6 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 51(----),
> (XEN)    IRQ: 850 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ce type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 50(----),
> (XEN)    IRQ: 851 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b7 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 49(----),
> (XEN)    IRQ: 852 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:cf type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 48(----),
> (XEN)    IRQ: 853 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d7 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 47(----),
> (XEN)    IRQ: 854 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d9 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 46(----),
> (XEN)    IRQ: 855 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:22 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 45(----),
> (XEN)    IRQ: 856 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:2a type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 44(----),
> (XEN)    IRQ: 857 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:3c type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 43(----),
> (XEN)    IRQ: 858 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:4c type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 42(----),
> (XEN)    IRQ: 859 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:54 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 41(----),
> (XEN)    IRQ: 860 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b5 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 40(----),
> (XEN)    IRQ: 861 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ae type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 39(----),
> (XEN)    IRQ: 862 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:de type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 38(----),
> (XEN)    IRQ: 863 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:55 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 37(----),
> (XEN)    IRQ: 864 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:9d type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 36(----),
> (XEN)    IRQ: 865 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:46 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 35(----),
> (XEN)    IRQ: 866 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:a6 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 34(----),
> (XEN)    IRQ: 867 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:5f type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 33(----),
> (XEN)    IRQ: 868 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:7f type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 32(----),
>
> This shows all irqs for dom34.  The descriptors have full affinity, but
> each irq_cfg has a cpu_mask naming a single processor: bit 4 or bit 5
> (0x10 or 0x20), i.e. pcpu 4 or 5.
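
To read the masks in the dump, here is a minimal decoder. It assumes the masks are printed as 32-bit hex words separated by commas, most significant word first; `cpus_in_mask` is a hypothetical helper name, not something from Xen.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers set in a comma-separated hex cpumask.

    Assumes 8-hex-digit words with the most significant word first, so
    dropping the commas yields a single hexadecimal number.
    """
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


desc_aff = cpus_in_mask("ffffffff,ffffffff,ffffffff,ffffffff")
cfg_aff_845 = cpus_in_mask("00000000,00000000,00000000,00000010")
cfg_aff_846 = cpus_in_mask("00000000,00000000,00000000,00000020")

print(len(desc_aff), "CPUs in desc_aff")
print("IRQ 845 cfg_aff:", cfg_aff_845)
print("IRQ 846 cfg_aff:", cfg_aff_846)
```

Under that assumption, desc_aff covers every bit of the 128-bit mask, while each cfg_aff names exactly one pcpu.
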
>
> The domain dump for dom34 is
> (XEN) General information for domain 34:
> (XEN)     refcnt=3 dying=0 nr_pages=131065 xenheap_pages=8 dirty_cpus={}
> max_pages=133376
> (XEN)     handle=97ef6eef-69c2-024c-1bbb-a150ca668691 vm_assist=00000000
> (XEN)     paging assistance: hap refcounts translate external
> (XEN) Rangesets belonging to domain 34:
> (XEN)     I/O Ports  { }
> (XEN)     Interrupts { 32-55 }
> (XEN)     I/O Memory { f9f00-f9f03, fa001-fa003, fa19c-fa19f,
> fa29d-fa29f, fa39c-fa39f, fa49d-fa49f, fa59c-fa59f, fa69d-fa69f,
> fa79c-fa79f, fa89d-fa89f, fa99c-fa99f, faa9d-faa9f, fab9c-fab9f,
> fac9d-fac9f, fad9c-fad9f, fae9d-fae9f }
> (XEN) Memory pages belonging to domain 34:
> (XEN)     DomPage list too long to display
> (XEN)     P2M entry stats:
> (XEN)      L1:     1590 entries, 6512640 bytes
> (XEN)      L2:      253 entries, 530579456 bytes
> (XEN)     PoD entries=0 cachesize=0 superpages=0
> (XEN)     XenPage 00000000001146e1: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146e0: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146df: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146de: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000000bdc0e: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 0000000000114592: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458f: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458c: caf=c000000000000001,
> taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 34:
> (XEN)     VCPU0: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
> (XEN)     VCPU1: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={3} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
>
> This shows that the domain is actually pinned to pcpu 3.
>
> Am I mis-interpreting the information, or does this indicate that the
> scheduler (credit) is not obeying the cpu_affinity?  The virtual
> functions seem to be passing network traffic correctly so I would assume
> that interrupts are getting where they are supposed to be going.
>
>
> Another question which may or may not be related.  irq_cfg has a vector
> and a cpu_mask.  From this, I assume that the same interrupt must occupy
> the same IDT entry for every pcpu it might be received on.  Is there an
> architectural reason why this should be the case, or is it just the way
> Xen is coded?
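
To picture why a wide mask would matter if one vector has to be free on every pcpu in the mask, here is a toy model. It is not Xen's __assign_irq_vector(); the function names, the 4-CPU machine and the 4-vector range are all made up for illustration.

```python
def assign_vector(used, cpu_mask, first=0x20, last=0x23):
    """Toy allocator: pick the lowest vector free on every CPU in cpu_mask.

    used maps cpu -> set of vectors already taken on that CPU.
    Returns the vector, or None (standing in for -ENOSPC) if none fits.
    """
    for vector in range(first, last + 1):
        if all(vector not in used[cpu] for cpu in cpu_mask):
            for cpu in cpu_mask:
                used[cpu].add(vector)
            return vector
    return None


def count_allocations(masks, num_cpus=4):
    """Count how many requests succeed on a fresh toy machine."""
    used = {cpu: set() for cpu in range(num_cpus)}
    return sum(assign_vector(used, mask) is not None for mask in masks)


all_cpus = [0, 1, 2, 3]
wide = count_allocations([all_cpus] * 16)                  # every irq on all CPUs
narrow = count_allocations([[i % 4] for i in range(16)])   # one CPU per irq

print("all-CPU masks:", wide, "of 16 requests satisfied")
print("single-CPU masks:", narrow, "of 16 requests satisfied")
```

In this model an irq bound to all CPUs consumes a slot in every per-CPU table, so the total capacity drops from (vectors x CPUs) to just (vectors). That would be consistent with create_irq() running out of space well before 960 interrupts, though the model says nothing about whether the constraint is architectural.
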
>
> (Also, it seems that <asm/irq.h> and <xen/irq.h> both define struct
> irq_cfg, and while one is strictly an extension of the other, there
> appear to be no guards around them, meaning that sizeof(irq_cfg) depends
> on which header file you include.  I don't know if this is relevant or
> not, but it strikes me that code confused about which definition it is
> using could be computing on junk if it expects the longer irq_cfg
> and actually gets the shorter one.)
Correction - I wasn't reading the source closely enough.  There are
#ifdef __ia64__ guards around this.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
