From: "Alex Bennée" <alex.bennee@linaro.org>
To: Sean Christopherson <seanjc@google.com>
Cc: David Stevens <stevensd@chromium.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	 Yu Zhang <yu.c.zhang@linux.intel.com>,
	 Isaku Yamahata <isaku.yamahata@gmail.com>,
	 Zhi Wang <zhi.wang.linux@gmail.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	 kvm@vger.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages
Date: Wed, 31 Jul 2024 12:41:22 +0100
Message-ID: <87cymtdc0t.fsf@draig.linaro.org>
In-Reply-To: <ZnXHQid_N1w4kLoC@google.com> (Sean Christopherson's message of "Fri, 21 Jun 2024 11:32:34 -0700")

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Feb 29, 2024, David Stevens wrote:
>> From: David Stevens <stevensd@chromium.org>
>> 
>> This patch series adds support for mapping VM_IO and VM_PFNMAP memory
>> that is backed by struct pages that aren't currently being refcounted
>> (e.g. tail pages of non-compound higher order allocations) into the
>> guest.
>> 
>> Our use case is virtio-gpu blob resources [1], which directly map host
>> graphics buffers into the guest as "vram" for the virtio-gpu device.
>> This feature currently does not work on systems using the amdgpu driver,
>> as that driver allocates non-compound higher order pages via
>> ttm_pool_alloc_page().
>> 
>> First, this series replaces the gfn_to_pfn_memslot() API with a more
>> extensible kvm_follow_pfn() API. The updated API rearranges
>> gfn_to_pfn_memslot()'s args into a struct and, where possible, packs the
>> bool arguments into a FOLL_ flags argument. The refactoring does not
>> change any behavior.
>> 
>> From there, this series extends the kvm_follow_pfn() API so that
>> non-refcounted pages can be safely handled. This involves adding an
>> input parameter to indicate whether the caller can safely use
>> non-refcounted pfns and an output parameter to tell the caller whether
>> or not the returned page is refcounted. This includes a breaking
>> change: non-refcounted pfn mappings are disallowed by default, as such
>> mappings are unsafe. To allow such systems to continue to function, an
>> opt-in module parameter is added to allow the unsafe behavior.
>> 
>> This series only adds support for non-refcounted pages to x86. Other
>> MMUs can likely be updated without too much difficulty, but it is not
>> needed at this point. Updating other parts of KVM (e.g. pfncache) is not
>> straightforward [2].
>
> FYI, on the off chance that someone else is eyeballing this, I am working on
> revamping this series.  It's still a ways out, but I'm optimistic that we'll be
> able to address the concerns raised by Christoph and Christian, and maybe even
> get KVM out of the weeds straightaway (PPC looks thorny :-/).

I've applied this series to the latest 6.9.x kernel while attempting to
diagnose some of the virtio-gpu problems it may or may not address.
However, launching KVM guests keeps triggering a stream of "BUG: Bad
page state" reports that eventually leave the guest hung:

  12:16:54 [root@draig:~] # dmesg -c
  [252080.141629] RAX: ffffffffffffffda RBX: 0000560a64915500 RCX: 00007faa23e81c5b
  [252080.141629] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017
  [252080.141630] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
  [252080.141630] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [252080.141631] R13: 0000000000000001 R14: 00000000000000b2 R15: 0000000000000002
  [252080.141632]  </TASK>
  [252080.141632] BUG: Bad page state in process CPU 0/KVM  pfn:fb1665
  [252080.141633] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x7fa8117c3 pfn:0xfb1665
  [252080.141633] flags: 0x17ffffc00a000c(referenced|uptodate|mappedtodisk|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
  [252080.141634] page_type: 0x0()
  [252080.141635] raw: 0017ffffc00a000c dead000000000100 dead000000000122 0000000000000000
  [252080.141635] raw: 00000007fa8117c3 0000000000000000 0000000000000000 0000000000000000
  [252080.141635] page dumped because: nonzero mapcount
  [252080.141636] Modules linked in: vhost_net vhost vhost_iotlb tap tun uas usb_storage veth cfg80211 nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter nft_masq wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc qrtr overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc squashfs snd_hda_codec_hdmi intel_uncore_frequency snd_ctl_led intel_uncore_frequency_common ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel soundwire_generic_allocation btusb snd_sof_intel_hda_mlink sd_mod soundwire_cadence btrtl snd_hda_codec_realtek kvm sg snd_sof_intel_hda btintel snd_sof_pci btbcm snd_hda_codec_generic btmtk
  [252080.141656]  snd_sof_xtensa_dsp crc32_pclmul bluetooth snd_hda_scodec_component ghash_clmulni_intel snd_sof sha256_ssse3 sha1_ssse3 snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress soundwire_bus sha3_generic jitterentropy_rng aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd sha512_ssse3 snd_intel_sdw_acpi cryptd sha512_generic uvcvideo snd_hda_codec snd_usb_audio videobuf2_vmalloc uvc ctr videobuf2_memops snd_hda_core snd_usbmidi_lib videobuf2_v4l2 snd_rawmidi drbg snd_hwdep dell_wmi snd_seq_device nls_ascii ahci ansi_cprng iTCO_wdt processor_thermal_device_pci videodev nls_cp437 snd_pcm intel_pmc_bxt dell_smbios libahci processor_thermal_device rapl rtsx_pci_sdmmc iTCO_vendor_support ecdh_generic mmc_core mei_hdcp watchdog libata intel_rapl_msr videobuf2_common rfkill vfat processor_thermal_wt_hint pl2303 snd_timer dcdbas dell_wmi_ddv dell_wmi_sysman processor_thermal_rfim ucsi_acpi fat intel_cstate usbserial intel_uncore cdc_acm mc battery ecc
  [252080.141670]  firmware_attributes_class dell_wmi_descriptor wmi_bmof dell_smm_hwmon processor_thermal_rapl pcspkr scsi_mod mei_me intel_lpss_pci snd typec_ucsi igc e1000e i2c_i801 rtsx_pci intel_rapl_common intel_lpss roles mei soundcore processor_thermal_wt_req i2c_smbus idma64 scsi_common processor_thermal_power_floor typec processor_thermal_mbox button intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec pmt_telemetry intel_hid int3400_thermal pmt_class sparse_keymap acpi_tad acpi_pad acpi_thermal_rel msr parport_pc ppdev lp parport fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 hid_microsoft joydev ff_memless hid_generic usbhid hid btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq evdev dm_mod i915 i2c_algo_bit drm_buddy ttm drm_display_helper xhci_pci xhci_hcd drm_kms_helper nvme nvme_core drm t10_pi usbcore video crc64_rocksoft crc64 crc_t10dif cec crct10dif_generic crct10dif_pclmul crc32c_intel rc_core usb_common crct10dif_common wmi
  [252080.141686]  pinctrl_alderlake
  [252080.141686] CPU: 8 PID: 1819169 Comm: CPU 0/KVM Tainted: G    B   W          6.9.12-ajb-00008-gfcd4b7efbad0 #17
  [252080.141687] Hardware name: Dell Inc. Precision 3660/0PRR48, BIOS 2.8.1 08/14/2023
  [252080.141688] Call Trace:
  [252080.141688]  <TASK>
  [252080.141688]  dump_stack_lvl+0x60/0x80
  [252080.141689]  bad_page+0x70/0x100
  [252080.141690]  free_unref_page_prepare+0x22a/0x370
  [252080.141692]  free_unref_folios+0xe5/0x340
  [252080.141693]  ? __mem_cgroup_uncharge_folios+0x7a/0xa0
  [252080.141694]  folios_put_refs+0x147/0x1e0
  [252080.141696]  ? __pfx_lru_add_fn+0x10/0x10
  [252080.141697]  folio_batch_move_lru+0xc8/0x140
  [252080.141699]  folio_add_lru+0x51/0xa0
  [252080.141700]  do_wp_page+0x4dd/0xb60
  [252080.141701]  __handle_mm_fault+0xb2a/0xe30
  [252080.141703]  handle_mm_fault+0x18c/0x320
  [252080.141704]  __get_user_pages+0x164/0x6f0
  [252080.141705]  get_user_pages_unlocked+0xe2/0x370
  [252080.141706]  hva_to_pfn+0xa0/0x740 [kvm]
  [252080.141724]  kvm_faultin_pfn+0xf3/0x5f0 [kvm]
  [252080.141750]  kvm_tdp_page_fault+0x100/0x150 [kvm]
  [252080.141774]  kvm_mmu_page_fault+0x27e/0x7f0 [kvm]
  [252080.141798]  ? em_rsm+0xad/0x170 [kvm]
  [252080.141823]  ? writeback_registers+0x44/0x80 [kvm]
  [252080.141848]  ? vmx_set_cr0+0xc7/0x1320 [kvm_intel]
  [252080.141853]  ? x86_emulate_insn+0x484/0xe60 [kvm]
  [252080.141877]  ? vmx_vmexit+0x6e/0xd0 [kvm_intel]
  [252080.141882]  ? vmx_vmexit+0x99/0xd0 [kvm_intel]
  [252080.141887]  vmx_handle_exit+0x129/0x930 [kvm_intel]
  [252080.141892]  kvm_arch_vcpu_ioctl_run+0x682/0x15b0 [kvm]
  [252080.141918]  kvm_vcpu_ioctl+0x23d/0x6f0 [kvm]
  [252080.141936]  ? __seccomp_filter+0x32f/0x500
  [252080.141937]  ? kvm_io_bus_read+0x42/0xd0 [kvm]
  [252080.141956]  __x64_sys_ioctl+0x90/0xd0
  [252080.141957]  do_syscall_64+0x80/0x190
  [252080.141958]  ? kvm_arch_vcpu_put+0x126/0x160 [kvm]
  [252080.141982]  ? vcpu_put+0x1e/0x50 [kvm]
  [252080.141999]  ? kvm_arch_vcpu_ioctl_run+0x757/0x15b0 [kvm]
  [252080.142023]  ? kvm_vcpu_ioctl+0x29e/0x6f0 [kvm]
  [252080.142040]  ? __seccomp_filter+0x32f/0x500
  [252080.142042]  ? kvm_on_user_return+0x60/0x90 [kvm]
  [252080.142065]  ? fire_user_return_notifiers+0x30/0x60
  [252080.142066]  ? syscall_exit_to_user_mode+0x73/0x200
  [252080.142067]  ? do_syscall_64+0x8c/0x190
  [252080.142068]  ? kvm_on_user_return+0x60/0x90 [kvm]
  [252080.142090]  ? fire_user_return_notifiers+0x30/0x60
  [252080.142091]  ? syscall_exit_to_user_mode+0x73/0x200
  [252080.142092]  ? do_syscall_64+0x8c/0x190
  [252080.142093]  ? do_syscall_64+0x8c/0x190
  [252080.142094]  ? do_syscall_64+0x8c/0x190
  [252080.142095]  ? exc_page_fault+0x72/0x170
  [252080.142096]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

This backtrace repeats for a large range of pfns.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Thread overview: 37+ messages
2024-02-29  2:57 [PATCH v11 0/8] KVM: allow mapping non-refcounted pages David Stevens
2024-02-29  2:57 ` [PATCH v11 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty David Stevens
2024-02-29  2:57 ` [PATCH v11 2/8] KVM: Relax BUG_ON argument validation David Stevens
2024-02-29  2:57 ` [PATCH v11 3/8] KVM: mmu: Introduce kvm_follow_pfn() David Stevens
2024-02-29  2:57 ` [PATCH v11 4/8] KVM: mmu: Improve handling of non-refcounted pfns David Stevens
2024-02-29  2:57 ` [PATCH v11 5/8] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() David Stevens
2024-02-29  2:57 ` [PATCH v11 6/8] KVM: x86: Migrate " David Stevens
2024-02-29  2:57 ` [PATCH v11 7/8] KVM: x86/mmu: Track if sptes refer to refcounted pages David Stevens
2024-02-29  2:57 ` [PATCH v11 8/8] KVM: x86/mmu: Handle non-refcounted pages David Stevens
2024-04-04 16:03   ` Dmitry Osipenko
2024-04-15  7:28     ` David Stevens
2024-04-15  9:36       ` Paolo Bonzini
2024-02-29 13:36 ` [PATCH v11 0/8] KVM: allow mapping " Christoph Hellwig
2024-03-13  4:55   ` David Stevens
2024-03-13  9:55     ` Christian König
2024-03-13 13:34       ` Sean Christopherson
2024-03-13 14:37         ` Christian König
2024-03-13 14:48           ` Sean Christopherson
     [not found]             ` <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
2024-03-13 15:09               ` Christian König
2024-03-13 15:47               ` Sean Christopherson
     [not found]                 ` <93df19f9-6dab-41fc-bbcd-b108e52ff50b@amd.com>
2024-03-13 17:26                   ` Sean Christopherson
     [not found]                     ` <c84fcf0a-f944-4908-b7f6-a1b66a66a6bc@amd.com>
2024-03-14  9:20                       ` Christian König
2024-03-14 11:31                         ` David Stevens
2024-03-14 11:51                           ` Christian König
2024-03-14 14:45                             ` Sean Christopherson
2024-03-18  1:26                             ` Christoph Hellwig
2024-03-18 13:10                               ` Paolo Bonzini
2024-03-18 23:20                                 ` Christoph Hellwig
2024-03-14 16:17                           ` Sean Christopherson
2024-03-14 17:19                             ` Sean Christopherson
2024-03-15 17:59                               ` Sean Christopherson
2024-03-20 20:54                                 ` Axel Rasmussen
2024-03-13 13:33     ` Christoph Hellwig
2024-06-21 18:32 ` Sean Christopherson
2024-07-31 11:41   ` Alex Bennée [this message]
2024-07-31 15:01     ` Sean Christopherson
2024-08-05 23:44     ` David Stevens
