From: "Alex Bennée" <alex.bennee@linaro.org>
To: Sean Christopherson <seanjc@google.com>
Cc: David Stevens <stevensd@chromium.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	 Yu Zhang <yu.c.zhang@linux.intel.com>,
	 Isaku Yamahata <isaku.yamahata@gmail.com>,
	 Zhi Wang <zhi.wang.linux@gmail.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	 kvm@vger.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages
Date: Wed, 31 Jul 2024 12:41:22 +0100
Message-ID: <87cymtdc0t.fsf@draig.linaro.org>
In-Reply-To: <ZnXHQid_N1w4kLoC@google.com> (Sean Christopherson's message of "Fri, 21 Jun 2024 11:32:34 -0700")

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Feb 29, 2024, David Stevens wrote:
>> From: David Stevens <stevensd@chromium.org>
>> 
>> This patch series adds support for mapping VM_IO and VM_PFNMAP memory
>> that is backed by struct pages that aren't currently being refcounted
>> (e.g. tail pages of non-compound higher order allocations) into the
>> guest.
>> 
>> Our use case is virtio-gpu blob resources [1], which directly map host
>> graphics buffers into the guest as "vram" for the virtio-gpu device.
>> This feature currently does not work on systems using the amdgpu driver,
>> as that driver allocates non-compound higher order pages via
>> ttm_pool_alloc_page().
>> 
>> First, this series replaces the gfn_to_pfn_memslot() API with a more
>> extensible kvm_follow_pfn() API. The updated API rearranges
>> gfn_to_pfn_memslot()'s args into a struct and where possible packs the
>> bool arguments into a FOLL_ flags argument. The refactoring changes do
>> not change any behavior.
>> 
>> From there, this series extends the kvm_follow_pfn() API so that
>> non-refcounted pages can be safely handled. This involves adding an
>> input parameter to indicate whether the caller can safely use
>> non-refcounted pfns and an output parameter to tell the caller whether
>> or not the returned page is refcounted. This change includes a breaking
>> change, by disallowing non-refcounted pfn mappings by default, as such
>> mappings are unsafe. To allow such systems to continue to function, an
>> opt-in module parameter is added to allow the unsafe behavior.
>> 
>> This series only adds support for non-refcounted pages to x86. Other
>> MMUs can likely be updated without too much difficulty, but it is not
>> needed at this point. Updating other parts of KVM (e.g. pfncache) is not
>> straightforward [2].
>
> FYI, on the off chance that someone else is eyeballing this, I am working on
> revamping this series.  It's still a ways out, but I'm optimistic that we'll be
> able to address the concerns raised by Christoph and Christian, and maybe even
> get KVM out of the weeds straightaway (PPC looks thorny :-/).

I've applied this series to the latest 6.9.x while attempting to
diagnose some of the virtio-gpu problems it may or may not address.
However, launching KVM guests keeps triggering a bunch of BUGs that
eventually leave the guest hung:

  12:16:54 [root@draig:~] # dmesg -c                                                                                                                                           
  [252080.141629] RAX: ffffffffffffffda RBX: 0000560a64915500 RCX: 00007faa23e81c5b                                                                                            
  [252080.141629] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017                                                                                            
  [252080.141630] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000                                                                                            
  [252080.141630] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000                                                                                            
  [252080.141631] R13: 0000000000000001 R14: 00000000000000b2 R15: 0000000000000002                                                                                            
  [252080.141632]  </TASK>                                                                                                                                                     
  [252080.141632] BUG: Bad page state in process CPU 0/KVM  pfn:fb1665                                                                                                         
  [252080.141633] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x7fa8117c3 pfn:0xfb1665                                                                          
  [252080.141633] flags: 0x17ffffc00a000c(referenced|uptodate|mappedtodisk|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)                                                       
  [252080.141634] page_type: 0x0()                                                                                                                                             
  [252080.141635] raw: 0017ffffc00a000c dead000000000100 dead000000000122 0000000000000000                                                                                     
  [252080.141635] raw: 00000007fa8117c3 0000000000000000 0000000000000000 0000000000000000                                                                                     
  [252080.141635] page dumped because: nonzero mapcount                                                                                                                        
  [252080.141636] Modules linked in: vhost_net vhost vhost_iotlb tap tun uas usb_storage veth cfg80211 nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter nft_masq wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc qrtr overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc squashfs snd_hda_codec_hdmi intel_uncore_frequency snd_ctl_led intel_uncore_frequency_common ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel soundwire_generic_allocation btusb snd_sof_intel_hda_mlink sd_mod soundwire_cadence btrtl snd_hda_codec_realtek kvm sg snd_sof_intel_hda btintel snd_sof_pci btbcm snd_hda_codec_generic btmtk
  [252080.141656]  snd_sof_xtensa_dsp crc32_pclmul bluetooth snd_hda_scodec_component ghash_clmulni_intel snd_sof sha256_ssse3 sha1_ssse3 snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress soundwire_bus sha3_generic jitterentropy_rng aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd sha512_ssse3 snd_intel_sdw_acpi cryptd sha512_generic uvcvideo snd_hda_codec snd_usb_audio videobuf2_vmalloc uvc ctr videobuf2_memops snd_hda_core snd_usbmidi_lib videobuf2_v4l2 snd_rawmidi drbg snd_hwdep dell_wmi snd_seq_device nls_ascii ahci ansi_cprng iTCO_wdt processor_thermal_device_pci videodev nls_cp437 snd_pcm intel_pmc_bxt dell_smbios libahci processor_thermal_device rapl rtsx_pci_sdmmc iTCO_vendor_support ecdh_generic mmc_core mei_hdcp watchdog libata intel_rapl_msr videobuf2_common rfkill vfat processor_thermal_wt_hint pl2303 snd_timer dcdbas dell_wmi_ddv dell_wmi_sysman processor_thermal_rfim ucsi_acpi fat intel_cstate usbserial intel_uncore cdc_acm mc battery ecc
  [252080.141670]  firmware_attributes_class dell_wmi_descriptor wmi_bmof dell_smm_hwmon processor_thermal_rapl pcspkr scsi_mod mei_me intel_lpss_pci snd typec_ucsi igc e1000e i2c_i801 rtsx_pci intel_rapl_common intel_lpss roles mei soundcore processor_thermal_wt_req i2c_smbus idma64 scsi_common processor_thermal_power_floor typec processor_thermal_mbox button intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec pmt_telemetry intel_hid int3400_thermal pmt_class sparse_keymap acpi_tad acpi_pad acpi_thermal_rel msr parport_pc ppdev lp parport fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 hid_microsoft joydev ff_memless hid_generic usbhid hid btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq evdev dm_mod i915 i2c_algo_bit drm_buddy ttm drm_display_helper xhci_pci xhci_hcd drm_kms_helper nvme nvme_core drm t10_pi usbcore video crc64_rocksoft crc64 crc_t10dif cec crct10dif_generic crct10dif_pclmul crc32c_intel rc_core usb_common crct10dif_common wmi
  [252080.141686]  pinctrl_alderlake
  [252080.141686] CPU: 8 PID: 1819169 Comm: CPU 0/KVM Tainted: G    B   W          6.9.12-ajb-00008-gfcd4b7efbad0 #17                                                          
  [252080.141687] Hardware name: Dell Inc. Precision 3660/0PRR48, BIOS 2.8.1 08/14/2023                                                                                        
  [252080.141688] Call Trace:                                                                                                                                                  
  [252080.141688]  <TASK>                                                                                                                                                      
  [252080.141688]  dump_stack_lvl+0x60/0x80                                                                                                                                    
  [252080.141689]  bad_page+0x70/0x100                                                                                                                                         
  [252080.141690]  free_unref_page_prepare+0x22a/0x370                                                                                                                         
  [252080.141692]  free_unref_folios+0xe5/0x340                                                                                                                                
  [252080.141693]  ? __mem_cgroup_uncharge_folios+0x7a/0xa0                                                                                                                    
  [252080.141694]  folios_put_refs+0x147/0x1e0                                                                                                                                 
  [252080.141696]  ? __pfx_lru_add_fn+0x10/0x10                                                                                                                                
  [252080.141697]  folio_batch_move_lru+0xc8/0x140                                                                                                                             
  [252080.141699]  folio_add_lru+0x51/0xa0                                                                                                                                     
  [252080.141700]  do_wp_page+0x4dd/0xb60                                                                                                                                      
  [252080.141701]  __handle_mm_fault+0xb2a/0xe30          
  [252080.141703]  handle_mm_fault+0x18c/0x320                                                                                                                                 
  [252080.141704]  __get_user_pages+0x164/0x6f0                                                                                                                                
  [252080.141705]  get_user_pages_unlocked+0xe2/0x370                                                                                                                          
  [252080.141706]  hva_to_pfn+0xa0/0x740 [kvm]                                                                                                                                 
  [252080.141724]  kvm_faultin_pfn+0xf3/0x5f0 [kvm]                                                                                                                            
  [252080.141750]  kvm_tdp_page_fault+0x100/0x150 [kvm]                                                                                                                        
  [252080.141774]  kvm_mmu_page_fault+0x27e/0x7f0 [kvm]                                                                                                                        
  [252080.141798]  ? em_rsm+0xad/0x170 [kvm]                                                                                                                                   
  [252080.141823]  ? writeback_registers+0x44/0x80 [kvm]                                                                                                                       
  [252080.141848]  ? vmx_set_cr0+0xc7/0x1320 [kvm_intel]                                                                                                                       
  [252080.141853]  ? x86_emulate_insn+0x484/0xe60 [kvm]                                                                                                                        
  [252080.141877]  ? vmx_vmexit+0x6e/0xd0 [kvm_intel]                                                                                                                          
  [252080.141882]  ? vmx_vmexit+0x99/0xd0 [kvm_intel]                                                                                                                          
  [252080.141887]  vmx_handle_exit+0x129/0x930 [kvm_intel]                                                                                                                     
  [252080.141892]  kvm_arch_vcpu_ioctl_run+0x682/0x15b0 [kvm]                                                                                                                  
  [252080.141918]  kvm_vcpu_ioctl+0x23d/0x6f0 [kvm]                                                                                                                            
  [252080.141936]  ? __seccomp_filter+0x32f/0x500                                                                                                                              
  [252080.141937]  ? kvm_io_bus_read+0x42/0xd0 [kvm]                                                                                                                           
  [252080.141956]  __x64_sys_ioctl+0x90/0xd0                                                                                                                                   
  [252080.141957]  do_syscall_64+0x80/0x190                                                                                                                                    
  [252080.141958]  ? kvm_arch_vcpu_put+0x126/0x160 [kvm]                                                                                                                       
  [252080.141982]  ? vcpu_put+0x1e/0x50 [kvm]                                                                                                                                  
  [252080.141999]  ? kvm_arch_vcpu_ioctl_run+0x757/0x15b0 [kvm]                                                                                                                
  [252080.142023]  ? kvm_vcpu_ioctl+0x29e/0x6f0 [kvm]                                                                                                                          
  [252080.142040]  ? __seccomp_filter+0x32f/0x500                                                                                                                              
  [252080.142042]  ? kvm_on_user_return+0x60/0x90 [kvm]                                                                                                                        
  [252080.142065]  ? fire_user_return_notifiers+0x30/0x60                                                                                                                      
  [252080.142066]  ? syscall_exit_to_user_mode+0x73/0x200                                                                                                                      
  [252080.142067]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142068]  ? kvm_on_user_return+0x60/0x90 [kvm]                                                                                                                        
  [252080.142090]  ? fire_user_return_notifiers+0x30/0x60                                                                                                                      
  [252080.142091]  ? syscall_exit_to_user_mode+0x73/0x200                                                                                                                      
  [252080.142092]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142093]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142094]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142095]  ? exc_page_fault+0x72/0x170                                                                                                                                 
  [252080.142096]  entry_SYSCALL_64_after_hwframe+0x76/0x7e                                                                                                                    

This backtrace repeats for a large chunk of pfns.
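
For anyone wanting to gauge how many pfns are affected, something along
these lines could summarise the reports from a saved dmesg capture. It is
only a rough sketch: it assumes the "BUG: Bad page state in process
<comm>  pfn:<hex>" line format shown in the log above, and the function
names are made up for illustration rather than taken from any existing
tool.

```python
import re
import sys
from collections import Counter

# Matches report headers such as:
#   [252080.141632] BUG: Bad page state in process CPU 0/KVM  pfn:fb1665
# The "page: ... pfn:0x..." detail line is deliberately not matched.
BAD_PAGE_RE = re.compile(
    r"BUG: Bad page state in process (?P<comm>.+?)\s+pfn:(?P<pfn>[0-9a-fA-F]+)\s*$"
)


def summarize_bad_pages(lines):
    """Return (pfns, per_process) for 'Bad page state' reports in lines."""
    pfns = []
    per_process = Counter()
    for line in lines:
        m = BAD_PAGE_RE.search(line)
        if m:
            pfns.append(int(m.group("pfn"), 16))
            per_process[m.group("comm")] += 1
    return pfns, per_process


def contiguous_ranges(pfns):
    """Collapse pfns into sorted, inclusive (start, end) ranges."""
    ranges = []
    for pfn in sorted(set(pfns)):
        if ranges and pfn == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], pfn)
        else:
            ranges.append((pfn, pfn))
    return ranges


if __name__ == "__main__":
    pfns, per_process = summarize_bad_pages(sys.stdin)
    print(f"{len(pfns)} reports, {len(set(pfns))} distinct pfns")
    for start, end in contiguous_ranges(pfns):
        print(f"  pfn {start:#x}-{end:#x} ({end - start + 1} pages)")
    for comm, count in per_process.most_common():
        print(f"  {count} reports from {comm!r}")
```

Saved under a name of your choosing, it reads the log on stdin (for
example, piped from dmesg or from a file). Grouping into contiguous
ranges is just one way to see whether the bad pages cluster in a single
allocation; it says nothing about the root cause.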

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

