From: Ming Lei <ming.lei@redhat.com>
To: Ewan Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH] scsi: core: move scsi_host_busy() out of host lock for waking up EH handler
Date: Sat, 13 Jan 2024 09:59:10 +0800
Message-ID: <ZaHubv1sH2I14z20@fedora>
In-Reply-To: <CAGtn9r=Qko22+9Zxg8BnaAMtfEH_WYpkE7mDBmKWSdcm98Ui1Q@mail.gmail.com>

On Fri, Jan 12, 2024 at 02:34:52PM -0500, Ewan Milne wrote:
> On Fri, Jan 12, 2024 at 7:43 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, Jan 12, 2024 at 12:12:57PM +0100, Hannes Reinecke wrote:
> > > On 1/12/24 08:00, Ming Lei wrote:
> > > > Inside scsi_eh_wakeup(), scsi_host_busy() is called and checked under the
> > > > host lock every time to decide whether the error handler kthread needs to
> > > > be woken up.
> > > >
> > > > This can be too heavy during recovery, for example:
> > > >
> > > > - N hardware queues
> > > > - queue depth is M for each hardware queue
> > > > - each scsi_host_busy() iterates over (N * M) tag/requests
> > > >
> > > > If recovery is triggered while all requests are in flight, every
> > > > scsi_eh_wakeup() is strictly serialized. By the time scsi_eh_wakeup() is
> > > > called for the last in-flight request, scsi_host_busy() has run
> > > > (N * M - 1) times and requests have been iterated (N * M - 1) * (N * M) times.
> > > >
> > > > If both N and M are big enough, a hard lockup can be triggered while
> > > > acquiring the host lock; this was observed on mpi3mr (128 hw queues,
> > > > queue depth 8169).
> > > >
> > > > Fix the issue by calling scsi_host_busy() outside the host lock. The host
> > > > lock is not needed for getting the busy count because it never covers
> > > > that count.
> > > >
> > > Can you share details for the hard lockup?
> > > I do agree that scsi_host_busy() is an expensive operation, so it
> > > might not be ideal to call it under a spin lock.
> > > But I wonder where the lockup comes in here.
> > > Care to explain?
> >
> > Recovery happens when there are N * M in-flight requests; scsi_dec_host_busy()
> > can then be called for each in-flight request/scmnd from irq context.
> >
> > The host lock serializes every scsi_eh_wakeup().
> >
> > Because each hardware queue has its own irq handler, scsi_dec_host_busy()
> > for one request may spin on the host lock until the lock has been released
> > by scsi_dec_host_busy() for all requests from all other hardware queues.
> >
> > The spin time can be long enough to trigger the hard lockup if N and M
> > are big enough, and the total wait time can be:
> >
> >         (N - 1) * M * time_taken_in_scsi_host_busy().
> >
> > Meanwhile, the same thing happens in scsi_eh_inc_host_failed(), which is
> > called from softirq context, so the host lock spin can be much worse.
> >
> > It is observed on mpi3mr with 128(N) hw queues and 8169(M) queue depth.
> >
> > >
> > > And if it leads to a lockup, aren't other instances calling scsi_host_busy()
> > > under a spinlock affected, as well?
> >
> > It is only possible when scsi_host_busy() is called on a per-command basis.
> >
> >
> > Thanks,
> > Ming
> >
> 
> I can't see why this wouldn't work, or cause a problem with a lost wakeup,
> but the cost of iterating to obtain the host_busy value is still being paid,
> just outside the host_lock.  If this has triggered a hard lockup, should
> we revisit the algorithm, e.g. are we still delaying EH wakeup for a noticeable
> amount of time?

SCSI EH is designed not to start handling until all in-flight commands have
failed, so it first waits until all requests have failed.
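
The wake-up condition described above can be sketched in userspace C. This
is only an illustration, not the kernel code: struct host_counts and
eh_should_wake() are hypothetical names, and the "failed > 0" guard is an
assumption standing in for the checks the callers do before reaching
scsi_eh_wakeup().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two counters compared at wake-up time. */
struct host_counts {
	unsigned int busy;   /* commands currently in flight on the host */
	unsigned int failed; /* in-flight commands already marked as failed */
};

/*
 * EH may start only once every in-flight command has failed, i.e. when
 * the busy count equals the failed count. The failed > 0 guard is an
 * assumption of this sketch, so that an idle host never wakes EH.
 */
static bool eh_should_wake(const struct host_counts *h)
{
	return h->failed > 0 && h->busy == h->failed;
}
```

With busy = 3 and failed = 2 the handler keeps waiting; once the third
command fails, the counts match and the handler can be woken.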

> O(n^2) algorithms in the kernel don't seem like the best idea.

It is actually O(n) because each hardware queue handles its requests
in parallel.

It degrades to O(n^2) or O(n * m) only because of the shared host lock.
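
To put numbers on the counts quoted earlier in the thread, here is a small
arithmetic sketch in C. The function names are hypothetical; the formulas
are the ones from the patch description and the earlier reply, namely
(N * M - 1) * (N * M) tag visits in total and (N - 1) * M scans of waiting
in the worst case.

```c
#include <stdint.h>

/* Tags visited by one scsi_host_busy()-style scan: N queues of depth M. */
static uint64_t tags_per_scan(uint64_t n_queues, uint64_t depth)
{
	return n_queues * depth;
}

/*
 * Tags visited in total before the wake-up check for the last in-flight
 * request, when every earlier check ran one full scan:
 * (N * M - 1) * (N * M).
 */
static uint64_t serialized_tag_visits(uint64_t n_queues, uint64_t depth)
{
	uint64_t nm = tags_per_scan(n_queues, depth);

	return (nm - 1) * nm;
}

/*
 * Scans one CPU may have to wait for while spinning on the host lock:
 * (N - 1) * M, i.e. every request of every other hardware queue.
 */
static uint64_t worst_case_scans_waited(uint64_t n_queues, uint64_t depth)
{
	return (n_queues - 1) * depth;
}
```

For the mpi3mr figures in this thread (N = 128, M = 8169), one scan covers
1,045,632 tags, the serialized total is about 1.09 * 10^12 tag visits, and
a single CPU can wait for up to 1,037,463 scans.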

A single scsi_host_busy() call, or even N of them, won't take too long
without the host lock. What actually matters is the spin time on the
per-host lock, which can accumulate until it is too long.
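
The difference between the two shapes can be modelled in plain C without
real locking. This is a single-threaded sketch under stated assumptions,
not the patch itself: struct model_host and all function names are
hypothetical, and the "lock" is just a flag used to count how many tags
are scanned while it is held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a host; the lock is only a flag for accounting. */
struct model_host {
	const bool *tag_in_flight;  /* one flag per tag, N * M in total */
	size_t nr_tags;
	unsigned int failed;        /* commands already marked as failed */
	bool lock_held;
	size_t tags_scanned_locked; /* tags visited while the lock was held */
};

/* Count in-flight tags by walking all of them, as a busy scan would. */
static unsigned int model_host_busy(struct model_host *h)
{
	unsigned int busy = 0;

	for (size_t i = 0; i < h->nr_tags; i++)
		if (h->tag_in_flight[i])
			busy++;
	if (h->lock_held)
		h->tags_scanned_locked += h->nr_tags;
	return busy;
}

/* Old shape: the whole scan sits inside the critical section. */
static bool wake_check_scan_locked(struct model_host *h)
{
	bool wake;

	h->lock_held = true;
	wake = model_host_busy(h) == h->failed;
	h->lock_held = false;
	return wake;
}

/* New shape: scan first, then take the lock only for the comparison. */
static bool wake_check_scan_unlocked(struct model_host *h)
{
	unsigned int busy = model_host_busy(h);
	bool wake;

	h->lock_held = true;
	wake = busy == h->failed;
	h->lock_held = false;
	return wake;
}
```

Both functions reach the same decision, but only the first one adds to
tags_scanned_locked. In the model, that counter is the work every other
CPU would have to spin behind.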

> 
> In any case...
> Reviewed-by: Ewan D. Milne <emilne@redhat.com>

Thanks for the review!


-- 
Ming



Thread overview: 9+ messages
2024-01-12  7:00 [PATCH] scsi: core: move scsi_host_busy() out of host lock for waking up EH handler Ming Lei
2024-01-12 11:12 ` Hannes Reinecke
2024-01-12 12:42   ` Ming Lei
2024-01-12 19:34     ` Ewan Milne
2024-01-13  1:59       ` Ming Lei [this message]
2024-01-23  7:04         ` Sathya Prakash Veerichetty
2024-01-23 15:23 ` Bart Van Assche
2024-01-24  3:00 ` Martin K. Petersen
2024-02-03  2:31   ` Ming Lei
