Date: Thu, 20 Feb 2025 13:37:00 +0100
From: Roger Pau Monné
To: Jürgen Groß
Cc: Paweł Srokosz, xen-devel, andrew cooper3, JBeulich@suse.com
Subject: Re: Memory corruption bug with Xen PV Dom0 and BOSS-S1 RAID card
References: <1050214476.1105853.1739823581696.JavaMail.zimbra@cert.pl> <1001969494.1457790.1739990267113.JavaMail.zimbra@cert.pl> <23b12ff3-717f-4321-b3be-9c39367b8d14@suse.com>
In-Reply-To: <23b12ff3-717f-4321-b3be-9c39367b8d14@suse.com>

On Thu, Feb 20, 2025 at 10:31:02AM +0100, Jürgen Groß wrote:
> On 20.02.25 10:16, Roger Pau Monné wrote:
> > On Wed, Feb 19, 2025 at 07:37:47PM +0100, Paweł Srokosz wrote:
> > > Hello,
> > >
> > > > So the issue doesn't happen on debug=y builds? That's unexpected. I would
> > > > expect the opposite, that some code in Linux assumes that pfn + 1 == mfn +
> > > > 1, and hence breaks when the relation is reversed.
> > >
> > > It was also surprising for me but I think the key thing is that debug=y
> > > causes whole mapping to be reversed so each PFN lands on completely different
> > > MFN e.g. MFN=0x1300000 is mapped to PFN=0x20e50c in ndebug, but in debug
> > > it's mapped to PFN=0x5FFFFF. I guess that's why I can't reproduce the
> > > problem.
> > >
> > > > Can you see if you can reproduce with dom0-iommu=strict in the Xen command
> > > > line?
> > >
> > > Unfortunately, it doesn't help. But I have few more observations.
> > >
> > > Firstly, I checked the "xen-mfndump dump-m2p" output and found that misread
> > > blocks are mapped to suspiciously round MFNs.
> > > I have different versions of
> > > Xen and Linux kernel on each machine and I see some coincidence.
> > >
> > > I'm writing few huge files without Xen to ensure that they have been written
> > > correctly (because under Xen both read and writeback is affected). Then I'm
> > > booting to Xen, memory-mapping the files and reading each page. I see that when
> > > block is corrupted, it is mapped on round MFN e.g. pfn=0x5095d9/mfn=0x1600000,
> > > another on pfn=0x4095d9/mfn=0x1500000 etc.
> > >
> > > On another machine with different Linux/Xen version these faults appear on
> > > pfn=0x20e50c/mfn=0x1300000, pfn=0x30e50c/mfn=0x1400000 etc.
> > >
> > > I also noticed that during read of page that is mapped to
> > > pfn=0x20e50c/mfn=0x1300000, I'm getting these faults from DMAR:
> > >
> > > ```
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200000000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200001000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200006000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200008000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200009000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000a000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000c000
> > > (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> > > ```
> >
> > That's interesting, it seems to me that Linux is assuming that pages
> > at certain boundaries are superpages, and thus it
> > can just increase the mfn to get the next physical page.
>
> I'm not sure this is true. See below.
>
> > > and every time I'm dropping the cache and reading this region, I'm getting
> > > DMAR faults on a few random addresses from 1200000000-120000f000 range (I guess
> > > MFNs 0x1200000-0x120000f). MFNs 0x1200000-0x12000ff are not mapped to any PFN in
> > > Dom0 (based on xen-mfndump output).
> >
> > It would be very interesting to figure out where those requests
> > originate, iow: which entity in Linux creates the bios with the
> > faulting address(es).
>
> I _think_ this is related to the kernel trying to get some contiguous areas
> for the buffers used by the I/Os. As those areas are being given back after
> the I/O, they don't appear in the mfndump.
>
> > It's a wild guess, but could you try to boot Linux with swiotlb=force
> > on the command line and attempt to trigger the issue? I wonder
> > whether imposing the usage of the swiotlb will surface the issues as
> > CPU accesses, rather than IOMMU faults, and that could get us a trace
> > inside Linux of how those requests are generated.
> >
> > > On the other hand, I'm not getting these DMAR faults while reading other regions.
> > > Also I can't trigger the bug with reversed Dom0 mapping, even if I fill the page
> > > cache with reads.
> >
> > There's possibly some condition we are missing that causes a component
> > in Linux to assume the next address is mfn + 1, instead of doing the
> > full address translation from the linear or pfn space.
>
> My theory is:
>
> The kernel is seeing the used buffer to be a physically contiguous area,
> so it is _not_ using a scatter-gather list (it does in the debug Xen case,
> resulting in it not showing any errors). Unfortunately the buffer is not
> aligned to its size, so swiotlb-xen will remap the buffer to a suitably
> aligned one. The driver will then use the returned machine address for
> I/Os to both the devices of the RAID configuration.
> When the first I/O is done, the driver probably is calling the DMA unmap
> or device sync function already, causing the intermediate contiguous
> region to be destroyed again (this is the time when the DMAR errors
> should show up for the 2nd I/O still running).
>
> So the main issue IMHO is that a DMA buffer mapped for one device is used
> for 2 devices instead.

But that won't cause IOMMU faults?  Because the memory used by the
bounce buffer would still be owned by dom0 (and thus part of its IOMMU
page-tables), just probably re-written to contain different data.

Or is the swiotlb contiguous region torn down after every operation?
That would seem extremely wasteful to me; I assume the buffer is
allocated during device init, and stays the same until the device is
detached.

Thanks, Roger.
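
For reference, the MFN range guessed in the quoted report can be checked directly against the fault addresses in the DMAR log. The following is only a minimal Python sketch, assuming 4 KiB pages (so MFN = fault address >> 12); `LOG`, `FAULT_RE` and `fault_mfns` are illustrative names and not part of any Xen or Linux tooling:

```python
import re

PAGE_SHIFT = 12  # assumption: 4 KiB pages, so MFN = machine address >> 12

# "fault addr" lines copied from the DMAR log quoted in this thread.
LOG = """\
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200000000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200001000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200006000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200008000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 1200009000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000a000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:65:00.0] fault addr 120000c000
"""

# The address is printed in hex without a 0x prefix.
FAULT_RE = re.compile(r"fault addr ([0-9a-fA-F]+)")


def fault_mfns(log):
    """Return the MFN of every 'fault addr' entry, in log order."""
    return [int(m.group(1), 16) >> PAGE_SHIFT for m in FAULT_RE.finditer(log)]


mfns = fault_mfns(LOG)
print([hex(m) for m in mfns])

# True if every fault falls in the 16 pages starting at MFN 0x1200000,
# i.e. just below the "round" MFN 0x1300000 mentioned in the report.
print(all(0x1200000 <= m <= 0x120000F for m in mfns))
```

The same helper could be pointed at a longer capture of the Xen console to see whether faults ever land outside that 16-page window.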