From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "iommu@lists.linux.dev" <iommu@lists.linux.dev>,
LKML <linux-kernel@vger.kernel.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"Zhang, Tina" <tina.zhang@intel.com>,
"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Date: Thu, 25 Jul 2024 19:41:59 -0700 [thread overview]
Message-ID: <20240725194159.41192480@jacob-builder> (raw)
In-Reply-To: <BN9PR11MB52766AC7AEFA103E5B1D067B8CB42@BN9PR11MB5276.namprd11.prod.outlook.com>
On Fri, 26 Jul 2024 00:18:13 +0000, "Tian, Kevin" <kevin.tian@intel.com>
wrote:
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Friday, July 26, 2024 5:11 AM
> > > > > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu
> > > > *iommu,
> > > > > > struct qi_desc *desc,
> > > > > > raw_spin_lock(&qi->q_lock);
> > > > > > }
> > > > > >
> > > > > > - for (i = 0; i < count; i++)
> > > > > > - qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_DONE;
> > > > > > + /*
> > > > > > + * The reclaim code can free descriptors from multiple
> > > > > > submissions
> > > > > > + * starting from the tail of the queue. When count ==
> > > > > > 0, the
> > > > > > + * status of the standalone wait descriptor at the
> > > > > > tail of the queue
> > > > > > + * must be set to QI_TO_BE_FREED to allow the reclaim
> > > > > > code to proceed.
> > > > > > + */
> > > > > > + for (i = 0; i <= count; i++)
> > > > > > + qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_TO_BE_FREED;
> > > > >
> > > > > We don't really need a new flag. Just set them to QI_FREE and then
> > > > > reclaim QI_FREE slots until hitting qi->head in
> > > > > reclaim_free_desc().
> > > > We do need to have a separate state for descriptors pending to be
> > > > freed. Otherwise, reclaim code will advance pass the intended range.
> > > >
> > >
> > > The commit msg said that QI_DONE is currently used for conflicting
> > > purpose.
> > >
> > > Using QI_FREE keeps it only for signaling that a wait desc is
> > > completed.
> > >
> > > The key is that reclaim() should not change a desc's state before it's
> > > consumed by the owner. Instead we always let the owner to change the
> > > state and reclaim() only does scan and adjust the tracking fields then
> > > such race condition disappears.
> > >
> > > In this example T2's slots are changed to QI_FREE by T2 after it
> > > completes all the checks. Only at this point those slots can be
> > > reclaimed.
> >
> > The problem is that without the TO_BE_FREED state, the reclaim code
> > would have no way of knowing which ones are to be reclaimed and which
> > ones are currently free. Therefore, it cannot track free_cnt.
> >
> > The current reclaim code is not aware of owners nor how many to reclaim.
> >
> > If I make the following changes and run, free_cnt will keep going up and
> > system cannot boot. Perhaps you meant another way?
> >
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> > *iommu)
> > */
> > static inline void reclaim_free_desc(struct q_inval *qi)
> > {
> > - while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
> > - qi->desc_status[qi->free_tail] = QI_FREE;
> > + while (qi->desc_status[qi->free_tail] == QI_FREE) {
> > qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
> > qi->free_cnt++;
>
> Here miss a check to prevent reclaiming unused slots:
>
> if (qi->free_tail == qi->free_head)
> break;
>
> In the example flow reclaim_free_desc() in T1 will only reclaim slots
> used by T1 as slots of T2 are either QI_IN_USE or QI_DONE. T2 slots
> will be reclaimed when T2 calls reclaim_free_desc() after setting them
> to QI_FREE, and reclaim will stop at qi->free_head.
>
> If for some reason T2 completes earlier than T1. reclaim_free_desc()
> in T2 does nothing as the first slot qi->free_tail belongs to T1 still
> IN_USE. T2's slots will then wait until reclaim is triggered by T1 later.
>
This makes sense. Unlike the original code, we now only allow freeing the
submitter's own descriptors.
> > }
> > @@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu *iommu,
> > struct qi_desc *desc,
> > * The reclaim code can free descriptors from multiple
> > submissions
> > * starting from the tail of the queue. When count == 0, the
> > * status of the standalone wait descriptor at the tail of the
> > queue
> > - * must be set to QI_TO_BE_FREED to allow the reclaim code to
> > proceed.
> > + * must be set to QI_FREE to allow the reclaim code to proceed.
> > */
> > for (i = 0; i <= count; i++)
> > - qi->desc_status[(index + i) % QI_LENGTH] =
> > QI_TO_BE_FREED;
> > + qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;
> >
> > reclaim_free_desc(qi);
> > raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index 1ab39f9145f2..eaf015b4353b 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -382,8 +382,7 @@ enum {
> > QI_FREE,
> > QI_IN_USE,
> > QI_DONE,
> > - QI_ABORT,
> > - QI_TO_BE_FREED
> > + QI_ABORT
> > };
> >
> > Thanks,
> >
> > Jacob
>
Thanks,
Jacob
next prev parent reply other threads:[~2024-07-26 2:36 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-19 18:17 [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim Jacob Pan
2024-07-24 7:40 ` Tian, Kevin
2024-07-24 16:25 ` Jacob Pan
2024-07-25 0:44 ` Tian, Kevin
2024-07-25 21:11 ` Jacob Pan
2024-07-26 0:18 ` Tian, Kevin
2024-07-26 2:41 ` Jacob Pan [this message]
2024-09-02 5:44 ` Kumar, Sanjay K
2024-09-02 11:50 ` Baolu Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240725194159.41192480@jacob-builder \
--to=jacob.jun.pan@linux.intel.com \
--cc=baolu.lu@linux.intel.com \
--cc=iommu@lists.linux.dev \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sanjay.k.kumar@intel.com \
--cc=tina.zhang@intel.com \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.