From mboxrd@z Thu Jan 1 00:00:00 1970 From: Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH 8/8] KVM: x86: simplify kvm_apic_map Date: Fri, 30 Jan 2015 17:57:47 +0100 Message-ID: <20150130165747.GD27414@potion.redhat.com> References: <1422568135-28402-1-git-send-email-rkrcmar@redhat.com> <1422568135-28402-9-git-send-email-rkrcmar@redhat.com> <54CB4C5C.6090706@redhat.com> <20150130151442.GC27414@potion.redhat.com> <54CBA1F9.6070206@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Nadav Amit , Gleb Natapov To: Paolo Bonzini Return-path: Content-Disposition: inline In-Reply-To: <54CBA1F9.6070206@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: kvm.vger.kernel.org 2015-01-30 16:23+0100, Paolo Bonzini: > On 30/01/2015 16:14, Radim Kr=C4=8Dm=C3=A1=C5=99 wrote: > > > > + case KVM_APIC_MODE_XAPIC_FLAT: > > > > + *cid =3D 0; > > > > + *lid =3D ldr & 0xff; > > > > + return true; > > > > + case KVM_APIC_MODE_XAPIC_CLUSTER: > > > > + *cid =3D (ldr >> 4) & 0xf; > > > > + *lid =3D ldr & 0xf; > > > > + return true; > > > > + case KVM_APIC_MODE_X2APIC: > > > > + *cid =3D ldr >> 16; > > > > + *lid =3D ldr & 0xffff; > > > > + return true; > > > > + } > >=20 > >> > lid_bits =3D mode; > >> > cid_bits =3D mode & (16 | 4); > >> > lid_mask =3D (1 << lid_bits) - 1; > >> > cid_mask =3D (1 << cid_bits) - 1; > >> >=20 > >> > *cid =3D (ldr >> lid_bits) & cid_mask; > >> > *lid =3D ldr & lid_mask; > > Would jump predictor fail on the switch? Or is size of the code th= at > > important? This code is shorter, but is going to execute far more > > operations, so I think it would be slower ... (And harder to read.) >=20 > Considering the additional comparisons for the switch, I don't think > it's going to execute far more operations... (A quick assembly comparison [1].) As optimizations go, we could drop the "&" on cid as "cid < 16" is checked later, so mode=3D4 practically does nothing ... Not the best fo= r future bugs, but still pretty safe -- only x2APIC can set a value that would require the "&" and it can't have valid XAPIC mode, so u32 still protects us enough. *cid =3D ldr >> mode; *lid =3D ldr & ((1 << mode) - 1); Which is probably faster a solution where we don't shuffle switch cases to jump to x2APIC first. A comparison is at the very bottom [2]. Would that be ok in v2? --- 1: To make it easier, it wasn't inlined, sorry. There are four parts, first one compares the whole functions and three subsequent compare each switch case, with common head/tail omitted. (We don't care about performance when the map is invalid.= ) The optimized version is always first and next to it is the old one= =2E 0000000000001700 : (ALI from now on) 1700: callq 1705 1700: callq 1705 1705: push %rbp 1705: movzbl 0x10(%rdi),%eax 1706: movzbl 0x10(%rdi),%r8d 1709: push %rbp 170b: mov %rcx,%r10 170a: mov %rsp,%rbp 170e: xor %eax,%eax 170d: cmp $0x8,%al 1710: mov %rsp,%rbp 170f: je 1758 1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al 1717: test %r8d,%ecx 1713: je 1740 171a: jne 1752 1715: cmp $0x4,%al 171c: mov %r8d,%edi 1717: je 1720 171f: mov $0x1,%eax 1719: xor %eax,%eax 1724: mov %esi,%r8d 171b: pop %rbp 1727: mov %edi,%ecx 171c: retq =20 1729: mov %eax,%r9d 171d: nopl (%rax) 172c: shr %cl,%r8d 1720: mov %esi,%eax 172f: and $0x14,%ecx 1722: and $0xf,%esi 1732: shl %cl,%r9d 1725: shr $0x4,%eax 1735: mov %edi,%ecx 1728: and $0xf,%eax 1737: shl %cl,%eax 172b: mov %ax,(%rdx) 1739: sub $0x1,%r9d 172e: mov $0x1,%eax 173d: sub $0x1,%eax 1733: mov %si,(%rcx) 1740: and %r9d,%r8d 1736: pop %rbp 1743: and %esi,%eax 1737: retq =20 1745: mov %r8w,(%rdx) 1738: nopl 0x0(%rax,%rax,1) 1749: mov %ax,(%r10) 173f:=09 174d: mov $0x1,%eax 1740: mov %esi,%eax 1752: pop %rbp 1742: shr $0x10,%eax 1753: retq 1745: mov %ax,(%rdx) 1748: mov $0x1,%eax 174d: mov %si,(%rcx) 1750: pop %rbp 1751: retq =20 1752: nopw 0x0(%rax,%rax,1) 1758: xor %eax,%eax 175a: and $0xff,%si 175f: mov %ax,(%rdx) 1762: mov $0x1,%eax 1767: mov %si,(%rcx) 176a: pop %rbp 176b: retq =20 170b: mov %rcx,%r10 170a: mov %rsp,%rbp 170e: xor %eax,%eax 170d: cmp $0x8,%al 1710: mov %rsp,%rbp 170f: je 1758 1713: lea -0x1(%r8),%ecx [jump] 1717: test %r8d,%ecx 1758: xor %eax,%eax 171a: jne 1752 175a: and $0xff,%si 171c: mov %r8d,%edi 175f: mov %ax,(%rdx) 171f: mov $0x1,%eax 1762: mov $0x1,%eax 1724: mov %esi,%r8d 1767: mov %si,(%rcx) 1727: mov %edi,%ecx =20 1729: mov %eax,%r9d =20 172c: shr %cl,%r8d =20 172f: and $0x14,%ecx =20 1732: shl %cl,%r9d =20 1735: mov %edi,%ecx =20 1737: shl %cl,%eax =20 1739: sub $0x1,%r9d =20 173d: sub $0x1,%eax =20 1740: and %r9d,%r8d =20 1743: and %esi,%eax =20 1745: mov %r8w,(%rdx) =20 1749: mov %ax,(%r10) =20 174d: mov $0x1,%eax =20 170b: mov %rcx,%r10 170a: mov %rsp,%rbp 170e: xor %eax,%eax 170d: cmp $0x8,%al 1710: mov %rsp,%rbp 170f: je 1758 1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al 1717: test %r8d,%ecx 1713: je 1740 171a: jne 1752 [jump] 171c: mov %r8d,%edi 1740: mov %esi,%eax 171f: mov $0x1,%eax 1742: shr $0x10,%eax 1724: mov %esi,%r8d 1745: mov %ax,(%rdx) 1727: mov %edi,%ecx 1748: mov $0x1,%eax 1729: mov %eax,%r9d 174d: mov %si,(%rcx) 172c: shr %cl,%r8d =20 172f: and $0x14,%ecx =20 1732: shl %cl,%r9d =20 1735: mov %edi,%ecx =20 1737: shl %cl,%eax =20 1739: sub $0x1,%r9d =20 173d: sub $0x1,%eax =20 1740: and %r9d,%r8d =20 1743: and %esi,%eax =20 1745: mov %r8w,(%rdx) =20 1749: mov %ax,(%r10) =20 174d: mov $0x1,%eax =20 170b: mov %rcx,%r10 170a: mov %rsp,%rbp =20 170e: xor %eax,%eax 170d: cmp $0x8,%al 1710: mov %rsp,%rbp 170f: je 1758 1713: lea -0x1(%r8),%ecx 1711: cmp $0x10,%al 1717: test %r8d,%ecx 1713: je 1740 171a: jne 1752 1715: cmp $0x4,%al 171c: mov %r8d,%edi 1717: je 1720 171f: mov $0x1,%eax [jump] 1724: mov %esi,%r8d 1720: mov %esi,%eax 1727: mov %edi,%ecx 1722: and $0xf,%esi 1729: mov %eax,%r9d 1725: shr $0x4,%eax 172c: shr %cl,%r8d 1728: and $0xf,%eax 172f: and $0x14,%ecx 172b: mov %ax,(%rdx) 1732: shl %cl,%r9d 172e: mov $0x1,%eax 1735: mov %edi,%ecx 1733: mov %si,(%rcx) 1737: shl %cl,%eax =20 1739: sub $0x1,%r9d =20 173d: sub $0x1,%eax =20 1740: and %r9d,%r8d =20 1743: and %esi,%eax =20 1745: mov %r8w,(%rdx) =20 1749: mov %ax,(%r10) =20 174d: mov $0x1,%eax =20 2: A comparison of aggressively optimized ALI with the worst case scenario, where x2APIC is the third jump. 16e5: mov %rcx,%r9 170a: mov %rsp,%rbp =20 16ed: xor %eax,%eax 170d: cmp $0x8,%al 16ef: mov %rsp,%rbp 170f: je 1758 16f2: lea -0x1(%rcx),%r8d 1711: cmp $0x10,%al 16f6: test %ecx,%r8d 1713: je 1740 16f9: jne 1717 1715: cmp $0x4,%al 16fb: mov %esi,%eax 1717: je 1720 16fd: shr %cl,%eax [jump] 16ff: mov %ax,(%rdx) 1720: mov %esi,%eax 1702: mov $0x1,%eax 1722: and $0xf,%esi 1707: shl %cl,%eax 1725: shr $0x4,%eax 1709: sub $0x1,%eax 1728: and $0xf,%eax 170c: and %esi,%eax 172b: mov %ax,(%rdx) 170e: mov %ax,(%r9) 172e: mov $0x1,%eax 1712: mov $0x1,%eax 1733: mov %si,(%rcx)