diff for duplicates of <1373501728.8183.239@snotra> diff --git a/a/1.txt b/N1/1.txt index 95b843e..64d44a1 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,87 +1,87 @@ On 07/10/2013 05:50:01 PM, Alexander Graf wrote: -> +>=20 > On 10.07.2013, at 20:42, Scott Wood wrote: -> +>=20 > > On 07/10/2013 05:15:09 AM, Alexander Graf wrote: > >> On 10.07.2013, at 02:06, Scott Wood wrote: > >> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote: > >> >> On 09.07.2013, at 20:46, Scott Wood wrote: -> >> >> > I suspect that tlbsx is faster, or at worst similar. And -> unlike comparing tlbsx to lwepx (not counting a fix for the threading -> problem), we don't already have code to search the guest TLB, so +> >> >> > I suspect that tlbsx is faster, or at worst similar. And =20 +> unlike comparing tlbsx to lwepx (not counting a fix for the threading =20 +> problem), we don't already have code to search the guest TLB, so =20 > testing would be more work. -> >> >> We have code to walk the guest TLB for TLB misses. This really +> >> >> We have code to walk the guest TLB for TLB misses. This really =20 > is just the TLB miss search without host TLB injection. -> >> >> So let's say we're using the shadow TLB. The guest always has -> its say 64 TLB entries that it can count on - we never evict anything -> by accident, because we store all of the 64 entries in our guest TLB -> cache. When the guest faults at an address, the first thing we do is +> >> >> So let's say we're using the shadow TLB. The guest always has =20 +> its say 64 TLB entries that it can count on - we never evict anything =20 +> by accident, because we store all of the 64 entries in our guest TLB =20 +> cache. When the guest faults at an address, the first thing we do is =20 > we check the cache whether we have that page already mapped. -> >> >> However, with this method we now have 2 enumeration methods for -> guest TLB searches. We have the tlbsx one which searches the host TLB -> and we have our guest TLB cache. 
The guest TLB cache might still -> contain an entry for an address that we already invalidated on the +> >> >> However, with this method we now have 2 enumeration methods for =20 +> guest TLB searches. We have the tlbsx one which searches the host TLB =20 +> and we have our guest TLB cache. The guest TLB cache might still =20 +> contain an entry for an address that we already invalidated on the =20 > host. Would that impose a problem? -> >> >> I guess not because we're swizzling the exit code around to -> instead be an instruction miss which means we restore the TLB entry -> into our host's TLB so that when we resume, we land here and the +> >> >> I guess not because we're swizzling the exit code around to =20 +> instead be an instruction miss which means we restore the TLB entry =20 +> into our host's TLB so that when we resume, we land here and the =20 > tlbsx hits. But it feels backwards. > >> > -> >> > Any better way? Searching the guest TLB won't work for the LRAT -> case, so we'd need to have this logic around anyway. We shouldn't -> add a second codepath unless it's a clear performance gain -- and -> again, I suspect it would be the opposite, especially if the entry is -> not in TLB0 or in one of the first few entries searched in TLB1. The +> >> > Any better way? Searching the guest TLB won't work for the LRAT =20 +> case, so we'd need to have this logic around anyway. We shouldn't =20 +> add a second codepath unless it's a clear performance gain -- and =20 +> again, I suspect it would be the opposite, especially if the entry is =20 +> not in TLB0 or in one of the first few entries searched in TLB1. The =20 > tlbsx miss case is not what we should optimize for. > >> Hrm. -> >> So let's redesign this thing theoretically. We would have an exit -> that requires an instruction fetch. We would override -> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can -> fail because it can't find the TLB entry in the host TLB. 
When it -> fails, we have to abort the emulation and resume the guest at the +> >> So let's redesign this thing theoretically. We would have an exit =20 +> that requires an instruction fetch. We would override =20 +> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can =20 +> fail because it can't find the TLB entry in the host TLB. When it =20 +> fails, we have to abort the emulation and resume the guest at the =20 > same IP. -> >> Now the guest gets the TLB miss, we populate, go back into the -> guest. The guest hits the emulation failure again. We go back to -> kvmppc_ld_inst() which succeeds this time and we can emulate the +> >> Now the guest gets the TLB miss, we populate, go back into the =20 +> guest. The guest hits the emulation failure again. We go back to =20 +> kvmppc_ld_inst() which succeeds this time and we can emulate the =20 > instruction. > > -> > That's pretty much what this patch does, except that it goes -> immediately to the TLB miss code rather than having the extra -> round-trip back to the guest. Is there any benefit from adding that -> extra round-trip? Rewriting the exit type instead doesn't seem that +> > That's pretty much what this patch does, except that it goes =20 +> immediately to the TLB miss code rather than having the extra =20 +> round-trip back to the guest. Is there any benefit from adding that =20 +> extra round-trip? Rewriting the exit type instead doesn't seem that =20 > bad... -> -> It's pretty bad. I want to have code that is easy to follow - and I -> don't care whether the very rare case of a TLB entry getting evicted -> by a random other thread right when we execute the exit path is +>=20 +> It's pretty bad. I want to have code that is easy to follow - and I =20 +> don't care whether the very rare case of a TLB entry getting evicted =20 +> by a random other thread right when we execute the exit path is =20 > slower by a few percent if we get cleaner code for that. 
-I guess I just don't see how this is so much harder to follow than -returning to guest. I find it harder to follow the flow when there are -more round trips to the guest involved. "Treat this as an ITLB miss" -is simpler than, "Let this fail, and make sure we retry the trapping +I guess I just don't see how this is so much harder to follow than =20 +returning to guest. I find it harder to follow the flow when there are =20 +more round trips to the guest involved. "Treat this as an ITLB miss" =20 +is simpler than, "Let this fail, and make sure we retry the trapping =20 instruction on failure. Then, an ITLB miss will happen." -Also note that making kvmppc_get_last_inst() able to fail means -updating several existing callsites, both for the change in function +Also note that making kvmppc_get_last_inst() able to fail means =20 +updating several existing callsites, both for the change in function =20 signature and to actually handle failures. -I don't care that deeply either way, it just doesn't seem obviously +I don't care that deeply either way, it just doesn't seem obviously =20 better. -> >> I think this works. Just make sure that the gateway to the -> instruction fetch is kvmppc_get_last_inst() and make that failable. -> Then the difference between looking for the TLB entry in the host's +> >> I think this works. Just make sure that the gateway to the =20 +> instruction fetch is kvmppc_get_last_inst() and make that failable. =20 +> Then the difference between looking for the TLB entry in the host's =20 > TLB or in the guest's TLB cache is hopefully negligible. > > -> > I don't follow here. What does this have to do with looking in the +> > I don't follow here. What does this have to do with looking in the =20 > guest TLB? -> -> I want to hide the fact that we're cheating as much as possible, +>=20 +> I want to hide the fact that we're cheating as much as possible, =20 > that's it. -How are we cheating, and what specifically are you proposing to do to -hide that? 
How is the guest TLB involved at all in the change you're +How are we cheating, and what specifically are you proposing to do to =20 +hide that? How is the guest TLB involved at all in the change you're =20 asking for? --Scott +-Scott= diff --git a/a/content_digest b/N1/content_digest index b49b792..ca04a8d 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -1,100 +1,100 @@ "ref\01C54E9AA-5CE3-4540-A37D-5C2FD535EA89@suse.de\0" "From\0Scott Wood <scottwood@freescale.com>\0" "Subject\0Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation\0" - "Date\0Thu, 11 Jul 2013 00:15:28 +0000\0" + "Date\0Wed, 10 Jul 2013 19:15:28 -0500\0" "To\0Alexander Graf <agraf@suse.de>\0" "Cc\0Mihai Caraman <mihai.caraman@freescale.com>" - kvm-ppc@vger.kernel.org + linuxppc-dev@lists.ozlabs.org kvm@vger.kernel.org - " linuxppc-dev@lists.ozlabs.org\0" + " kvm-ppc@vger.kernel.org\0" "\00:1\0" "b\0" "On 07/10/2013 05:50:01 PM, Alexander Graf wrote:\n" - "> \n" + ">=20\n" "> On 10.07.2013, at 20:42, Scott Wood wrote:\n" - "> \n" + ">=20\n" "> > On 07/10/2013 05:15:09 AM, Alexander Graf wrote:\n" "> >> On 10.07.2013, at 02:06, Scott Wood wrote:\n" "> >> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:\n" "> >> >> On 09.07.2013, at 20:46, Scott Wood wrote:\n" - "> >> >> > I suspect that tlbsx is faster, or at worst similar. And \n" - "> unlike comparing tlbsx to lwepx (not counting a fix for the threading \n" - "> problem), we don't already have code to search the guest TLB, so \n" + "> >> >> > I suspect that tlbsx is faster, or at worst similar. And =20\n" + "> unlike comparing tlbsx to lwepx (not counting a fix for the threading =20\n" + "> problem), we don't already have code to search the guest TLB, so =20\n" "> testing would be more work.\n" - "> >> >> We have code to walk the guest TLB for TLB misses. This really \n" + "> >> >> We have code to walk the guest TLB for TLB misses. 
This really =20\n" "> is just the TLB miss search without host TLB injection.\n" - "> >> >> So let's say we're using the shadow TLB. The guest always has \n" - "> its say 64 TLB entries that it can count on - we never evict anything \n" - "> by accident, because we store all of the 64 entries in our guest TLB \n" - "> cache. When the guest faults at an address, the first thing we do is \n" + "> >> >> So let's say we're using the shadow TLB. The guest always has =20\n" + "> its say 64 TLB entries that it can count on - we never evict anything =20\n" + "> by accident, because we store all of the 64 entries in our guest TLB =20\n" + "> cache. When the guest faults at an address, the first thing we do is =20\n" "> we check the cache whether we have that page already mapped.\n" - "> >> >> However, with this method we now have 2 enumeration methods for \n" - "> guest TLB searches. We have the tlbsx one which searches the host TLB \n" - "> and we have our guest TLB cache. The guest TLB cache might still \n" - "> contain an entry for an address that we already invalidated on the \n" + "> >> >> However, with this method we now have 2 enumeration methods for =20\n" + "> guest TLB searches. We have the tlbsx one which searches the host TLB =20\n" + "> and we have our guest TLB cache. The guest TLB cache might still =20\n" + "> contain an entry for an address that we already invalidated on the =20\n" "> host. Would that impose a problem?\n" - "> >> >> I guess not because we're swizzling the exit code around to \n" - "> instead be an instruction miss which means we restore the TLB entry \n" - "> into our host's TLB so that when we resume, we land here and the \n" + "> >> >> I guess not because we're swizzling the exit code around to =20\n" + "> instead be an instruction miss which means we restore the TLB entry =20\n" + "> into our host's TLB so that when we resume, we land here and the =20\n" "> tlbsx hits. But it feels backwards.\n" "> >> >\n" - "> >> > Any better way? 
Searching the guest TLB won't work for the LRAT \n" - "> case, so we'd need to have this logic around anyway. We shouldn't \n" - "> add a second codepath unless it's a clear performance gain -- and \n" - "> again, I suspect it would be the opposite, especially if the entry is \n" - "> not in TLB0 or in one of the first few entries searched in TLB1. The \n" + "> >> > Any better way? Searching the guest TLB won't work for the LRAT =20\n" + "> case, so we'd need to have this logic around anyway. We shouldn't =20\n" + "> add a second codepath unless it's a clear performance gain -- and =20\n" + "> again, I suspect it would be the opposite, especially if the entry is =20\n" + "> not in TLB0 or in one of the first few entries searched in TLB1. The =20\n" "> tlbsx miss case is not what we should optimize for.\n" "> >> Hrm.\n" - "> >> So let's redesign this thing theoretically. We would have an exit \n" - "> that requires an instruction fetch. We would override \n" - "> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can \n" - "> fail because it can't find the TLB entry in the host TLB. When it \n" - "> fails, we have to abort the emulation and resume the guest at the \n" + "> >> So let's redesign this thing theoretically. We would have an exit =20\n" + "> that requires an instruction fetch. We would override =20\n" + "> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can =20\n" + "> fail because it can't find the TLB entry in the host TLB. When it =20\n" + "> fails, we have to abort the emulation and resume the guest at the =20\n" "> same IP.\n" - "> >> Now the guest gets the TLB miss, we populate, go back into the \n" - "> guest. The guest hits the emulation failure again. We go back to \n" - "> kvmppc_ld_inst() which succeeds this time and we can emulate the \n" + "> >> Now the guest gets the TLB miss, we populate, go back into the =20\n" + "> guest. The guest hits the emulation failure again. 
We go back to =20\n" + "> kvmppc_ld_inst() which succeeds this time and we can emulate the =20\n" "> instruction.\n" "> >\n" - "> > That's pretty much what this patch does, except that it goes \n" - "> immediately to the TLB miss code rather than having the extra \n" - "> round-trip back to the guest. Is there any benefit from adding that \n" - "> extra round-trip? Rewriting the exit type instead doesn't seem that \n" + "> > That's pretty much what this patch does, except that it goes =20\n" + "> immediately to the TLB miss code rather than having the extra =20\n" + "> round-trip back to the guest. Is there any benefit from adding that =20\n" + "> extra round-trip? Rewriting the exit type instead doesn't seem that =20\n" "> bad...\n" - "> \n" - "> It's pretty bad. I want to have code that is easy to follow - and I \n" - "> don't care whether the very rare case of a TLB entry getting evicted \n" - "> by a random other thread right when we execute the exit path is \n" + ">=20\n" + "> It's pretty bad. I want to have code that is easy to follow - and I =20\n" + "> don't care whether the very rare case of a TLB entry getting evicted =20\n" + "> by a random other thread right when we execute the exit path is =20\n" "> slower by a few percent if we get cleaner code for that.\n" "\n" - "I guess I just don't see how this is so much harder to follow than \n" - "returning to guest. I find it harder to follow the flow when there are \n" - "more round trips to the guest involved. \"Treat this as an ITLB miss\" \n" - "is simpler than, \"Let this fail, and make sure we retry the trapping \n" + "I guess I just don't see how this is so much harder to follow than =20\n" + "returning to guest. I find it harder to follow the flow when there are =20\n" + "more round trips to the guest involved. \"Treat this as an ITLB miss\" =20\n" + "is simpler than, \"Let this fail, and make sure we retry the trapping =20\n" "instruction on failure. 
Then, an ITLB miss will happen.\"\n" "\n" - "Also note that making kvmppc_get_last_inst() able to fail means \n" - "updating several existing callsites, both for the change in function \n" + "Also note that making kvmppc_get_last_inst() able to fail means =20\n" + "updating several existing callsites, both for the change in function =20\n" "signature and to actually handle failures.\n" "\n" - "I don't care that deeply either way, it just doesn't seem obviously \n" + "I don't care that deeply either way, it just doesn't seem obviously =20\n" "better.\n" "\n" - "> >> I think this works. Just make sure that the gateway to the \n" - "> instruction fetch is kvmppc_get_last_inst() and make that failable. \n" - "> Then the difference between looking for the TLB entry in the host's \n" + "> >> I think this works. Just make sure that the gateway to the =20\n" + "> instruction fetch is kvmppc_get_last_inst() and make that failable. =20\n" + "> Then the difference between looking for the TLB entry in the host's =20\n" "> TLB or in the guest's TLB cache is hopefully negligible.\n" "> >\n" - "> > I don't follow here. What does this have to do with looking in the \n" + "> > I don't follow here. What does this have to do with looking in the =20\n" "> guest TLB?\n" - "> \n" - "> I want to hide the fact that we're cheating as much as possible, \n" + ">=20\n" + "> I want to hide the fact that we're cheating as much as possible, =20\n" "> that's it.\n" "\n" - "How are we cheating, and what specifically are you proposing to do to \n" - "hide that? How is the guest TLB involved at all in the change you're \n" + "How are we cheating, and what specifically are you proposing to do to =20\n" + "hide that? How is the guest TLB involved at all in the change you're =20\n" "asking for?\n" "\n" - -Scott + -Scott= -315049d8e518f928bcacfbfd95d7c009ba190999487de9ee41b71541916859df +aa6a66625501297ab56acb62934affa938bb4bbc2e9d324a448d7f5cd202393f
diff --git a/a/content_digest b/N2/content_digest index b49b792..da9dbfd 100644 --- a/a/content_digest +++ b/N2/content_digest @@ -1,12 +1,12 @@ "ref\01C54E9AA-5CE3-4540-A37D-5C2FD535EA89@suse.de\0" "From\0Scott Wood <scottwood@freescale.com>\0" "Subject\0Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation\0" - "Date\0Thu, 11 Jul 2013 00:15:28 +0000\0" + "Date\0Wed, 10 Jul 2013 19:15:28 -0500\0" "To\0Alexander Graf <agraf@suse.de>\0" "Cc\0Mihai Caraman <mihai.caraman@freescale.com>" - kvm-ppc@vger.kernel.org - kvm@vger.kernel.org - " linuxppc-dev@lists.ozlabs.org\0" + <kvm-ppc@vger.kernel.org> + <kvm@vger.kernel.org> + " <linuxppc-dev@lists.ozlabs.org>\0" "\00:1\0" "b\0" "On 07/10/2013 05:50:01 PM, Alexander Graf wrote:\n" @@ -97,4 +97,4 @@ "\n" -Scott -315049d8e518f928bcacfbfd95d7c009ba190999487de9ee41b71541916859df +c90efabdc75f6726b6774d1db27c3f7faefe0995da70c6de94767de522dd7591
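The message quoted in the diffs above compares two ways of handling an instruction fetch that fails because the mapping is missing from the host TLB: letting the fetch fail and resuming the guest at the same IP so that it takes the ITLB miss itself, or treating the exit as an ITLB miss right away. The following is a minimal toy model of that difference in C, not kernel code. Every name in it (`toy_vcpu`, `toy_ld_inst`, `toy_handle_itlb_miss`, `toy_enter_guest`, `toy_emulate_retry`, `toy_emulate_rewrite`) is hypothetical and only stands in for the ideas discussed in the thread. It counts how often the guest is re-entered before the instruction can be fetched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are real kernel APIs. */
struct toy_vcpu {
    bool host_tlb_has_pc;  /* is the trapping PC mapped in the host TLB? */
    int guest_entries;     /* number of times the guest was resumed */
};

/* Stand-in for a failable instruction fetch: succeeds only if mapped. */
static bool toy_ld_inst(const struct toy_vcpu *v)
{
    return v->host_tlb_has_pc;
}

/* Stand-in for the ITLB miss path: installs the missing mapping. */
static void toy_handle_itlb_miss(struct toy_vcpu *v)
{
    v->host_tlb_has_pc = true;
}

static void toy_enter_guest(struct toy_vcpu *v)
{
    v->guest_entries++;
}

/* Variant A: let the fetch fail and resume the guest at the same IP.
 * The guest then takes the ITLB miss itself, the host populates the
 * entry, and the guest traps for emulation a second time. */
static int toy_emulate_retry(struct toy_vcpu *v)
{
    if (!toy_ld_inst(v)) {
        toy_enter_guest(v);       /* resume; guest faults with an ITLB miss */
        toy_handle_itlb_miss(v);  /* host handles that exit */
        toy_enter_guest(v);       /* resume; guest traps for emulation again */
    }
    assert(toy_ld_inst(v));       /* the fetch now succeeds */
    return v->guest_entries;
}

/* Variant B: rewrite the exit as an ITLB miss immediately, skipping
 * the extra round trip through the guest. */
static int toy_emulate_rewrite(struct toy_vcpu *v)
{
    if (!toy_ld_inst(v)) {
        toy_handle_itlb_miss(v);  /* treat this exit as an ITLB miss */
        toy_enter_guest(v);       /* resume; guest traps for emulation again */
    }
    assert(toy_ld_inst(v));
    return v->guest_entries;
}
```

Under these assumptions, variant A resumes the guest one more time than variant B when the mapping is missing, and both behave identically when the mapping is already present. The model says nothing about which variant is easier to follow or maintain, which is the actual point of disagreement in the thread.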