All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: dgilbert@interlog.com, Tomas Henzl <thenzl@redhat.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: take module reference during async scan
Date: Mon, 07 Sep 2020 15:56:14 -0700	[thread overview]
Message-ID: <1599519374.4232.71.camel@HansenPartnership.com> (raw)
In-Reply-To: <a31b8640-307f-a14f-7cf8-7673fa8a4ff1@interlog.com>

On Mon, 2020-09-07 at 17:32 -0400, Douglas Gilbert wrote:
> On 2020-09-07 1:46 p.m., James Bottomley wrote:
> > On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> > > During an async scan the driver shost->hostt structures are used,
> > > that may cause issues when the driver is removed at that time.
> > > As protection take the module reference.
> > 
> > Can I just ask what issues?  Today, our module model is that
> > scsi_device_get() bumps the module refcount and therefore makes the
> > module ineligible to be removed.  scsi_host_get() doesn't do this
> > because the way the host model is supposed to be coded, we can call
> > remove at any time but the module won't get freed until the last
> > put of the host.  I can see we have a potential problem with
> > scsi_forget_host() racing with the async scan thread ... is that
> > what you see? What's supposed to happen is that scsi_device_get()
> > starts failing as soon as the module begins it's exit routine, so
> > if a scan is in progress, it can't add any new devices ... in
> > theory this means that the list is stable for scsi_forget_host(),
> > so knowing how that assumption is breaking would be useful.
> 
> James,
> If you think it is bullet-proof try using 

I'm not saying it's got no bugs, just that the above is the way it's
supposed to work.

> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y .

The problem with this option is it basically gives you a thundering
herd of removal reinsertions ... trying to do it for a single driver
(or set of drivers) is likely a better way to get actionable debugging
information.

> John Garry reported that:
> 
>   # insmod scsi_debug.ko
> 
> Gave errors like this:
> 
> [  140.115244] debugfs: Directory 'sde' with parent 'block' already
> present!
> [  140.376426] debugfs: Directory 'sde' with parent 'block' already
> present!
> [  140.420613] sd 3:0:0:0: [sde] tag#40 access beyond end of device
> [  140.426655] blk_update_request: I/O error, dev sde, sector 15984
> op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [  140.437319] sd 3:0:0:0: [sde] tag#41 access beyond end of device
> [  140.443368] blk_update_request: I/O error, dev sde, sector 15984
> op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> ...
> 
> Which wasn't the scsi_debug driver directly as it doesn't use
> debugfs. So I suspect something is rotten in the mid-level.
> 
> When I tried to replicate John's config I couldn't even boot my
> Ubuntu 20.04 based system (with a MKP kernel). Seemed to fail/lockup
> before any kernel prints came out to the serial port (yes, still
> useful), perhaps in initrd. I'm guessing another, non-SCSI module
> caused the lockup. So I gave up and turned off that config setting.

If that can be distilled down to a better test case, I can look into
it.

James


  reply	other threads:[~2020-09-07 22:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 15:47 [PATCH] scsi: take module reference during async scan Tomas Henzl
2020-09-07 16:57 ` Bart Van Assche
2020-09-07 21:12   ` Tomas Henzl
2020-09-07 17:46 ` James Bottomley
2020-09-07 20:09   ` Tomas Henzl
2020-09-07 20:24     ` James Bottomley
2020-09-07 21:02       ` Tomas Henzl
2020-09-07 22:02         ` James Bottomley
2020-09-08  8:22           ` Tomas Henzl
2020-09-07 21:32   ` Douglas Gilbert
2020-09-07 22:56     ` James Bottomley [this message]
2020-09-08 14:04     ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1599519374.4232.71.camel@HansenPartnership.com \
    --to=james.bottomley@hansenpartnership.com \
    --cc=dgilbert@interlog.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=thenzl@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.