* Issue
The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
walk and zap all shadow pages one by one, also it need to zap all guest
page's rmap and all shadow page's parent spte list. Particularly, things
become worse if guest uses more memory or vcpus. It is not good for
scalability.

* Idea
Since all shadow page will be zapped, we can directly zap the mmu-cache
and rmap so that vcpu will fault on the new mmu-cache, after that, we can
directly free the memory used by old mmu-cache.

The root shadow page is little especial since they are currently used by
vcpus, we can not directly free them. So, we zap the root shadow pages and
re-add them into the new mmu-cache.

* TODO
Avoiding Marcelo beat me :), they are some works not attached to make the
patchset more smaller:
(1): batch kvm_reload_remote_mmus for zapping root shadow pages
(2): free shadow pages by using generation-number
(3): remove unneeded kvm_reload_remote_mmus after kvm_mmu_zap_all
(4): drop unnecessary @npages from kvm_arch_create_memslot
(5): rename init_kvm_mmu to init_vcpu_mmu

* Performance
The attached testcase is used to measure the time of delete / add memslot.
At that time, all vcpus are waiting, that means, no mmu-lock contention.
I believe the result be more beautiful if other vcpus and mmu notification
need to catch the mmu-lock.

Guest VCPU:6, Mem:2048M

before: Run 10 times, Avg time:46078825 ns.

after: Run 10 times, Avg time:21558774 ns. (+ 113%)