From: Boris Brezillon <boris.brezillon@collabora.com>
To: Sean Nyekjaer <sean@geanix.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
Richard Weinberger <richard@nod.at>,
Vignesh Raghavendra <vigneshr@ti.com>,
Boris Brezillon <bbrezillon@kernel.org>,
linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Date: Thu, 7 Oct 2021 14:18:58 +0200 [thread overview]
Message-ID: <20211007141858.314533f2@collabora.com> (raw)
In-Reply-To: <20211007114351.3nafhtpefezxhanc@skn-laptop>
On Thu, 7 Oct 2021 13:43:51 +0200
Sean Nyekjaer <sean@geanix.com> wrote:
> On Tue, Oct 05, 2021 at 10:58:36AM +0200, Boris Brezillon wrote:
> > On Tue, 5 Oct 2021 10:49:38 +0200
> > Sean Nyekjaer <sean@geanix.com> wrote:
> >
> > > On Tue, Oct 05, 2021 at 10:23:00AM +0200, Boris Brezillon wrote:
> > > > On Tue, 5 Oct 2021 09:09:30 +0200
> > > > Sean Nyekjaer <sean@geanix.com> wrote:
> > >
> > > [ ... ]
> > >
> > > > >
> > > > > Have you seen the reproducer script?
> > > >
> > > > How would I know about this script or your previous attempt (mentioned
> > > > at the end of this email) given I was not Cc-ed on the previous
> > > > discussion, and nothing mentions it in this RFC...
> > > >
> > >
> > > That's why I shared it here ;)
> > > Initially I thought this was a bug introduced by exec_op.
> > >
> > > > > ---
> > > > > root@iwg26-v1:/data/root# cat /data/crash.sh
> > > > > #!/bin/sh -x
> > > > >
> > > > > echo enabled > /sys/devices/platform/soc/2100000.bus/21f4000.serial/tty/ttymxc4/power/wakeup
> > > > >
> > > > > rm /data/test50M
> > > > > dd if=/dev/urandom of=/tmp/test50M bs=1M count=50
> > > > > cp /tmp/test50M /data/ &
> > > > > sleep 1
> > > > > echo mem > /sys/power/state
> > > > > ---
> > > > >
> > > > > As seen in the log above disk is synced before suspend.
> > > > > cp is continuing to copy data to ubifs.
> > > > > And then user space processes are frozen.
> > > > > At this point the kernel thread would have unwritten data.
> > > > >
> > > > > We tried to solve this with:
> > > > > https://lkml.org/lkml/2021/9/1/280
> > > >
> > > > I see. It's still unclear to me when the write happens. Is it in the
> > > > suspend path (before the system is actually suspended), or in the
> > > > resume path (when the system is being resumed).
> > > >
> > > > Anyway, let's admit writing to a storage device while it's suspended is
> > > > a valid use case and requires the storage layer to put this request on
> > > > old. This wait should not, IMHO, be handled at the NAND level, but at
> > > > the MTD level (using a waitqueue, and an atomic to make
> > > > suspended/resumed transitions safe). And abusing a mutex to implement
> > > > that is certainly not a good idea.
> > >
> > > I did't say this was the right solution ;) I actually asked in the RFC:
> > > "Should we introduce a new mutex? Or maybe a spin_lock?"
> > >
> > > What are you proposing, a waitqueue in mtd_info? That gets checked in
> > > mtd_write()/mtd_read()?
> >
> > Yes, and replacing the suspended state by an atomic, and providing a
> > helper to wait on the device readiness. Helper you will call in every
> > path involving a communication with the HW, not just mtd_read/write()
> > (you're missing erase at least, and I fear there are other hooks that
> > might lead to commands being issued to the device). But before we get
> > there, I think it's important to understand what the kernel expects.
> > IOW, if and when threads can do a request on a suspended device, and
> > when it's acceptable to wait (vs returning -EBUSY), otherwise I fear
> > we'll end up with deadlocks in the suspend/resume path.
>
> I have a proposal [0] and yes I have ended up in many deadlocks during
> testing. The hardest part is the locking when going into suspend.
> I'm not sure the wait_queue is initialized the right place :)
> And I'm kinda abusing the nand_get_device() for this...
>
> Who do you think we should add to the discussion?
>
> /Sean
>
> [0]:
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 3d6c6e880520..735dfff18143 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
As I said previously, I think this should be handled MTD level
(drivers/mtd/mtdcore.c) not in the raw NAND framework.
> @@ -337,11 +337,10 @@ static int nand_isbad_bbm(struct nand_chip *chip, loff_t ofs)
> */
> static int nand_get_device(struct nand_chip *chip)
> {
> + struct mtd_info *mtd = nand_to_mtd(chip);
> +
> + wait_event(mtd->wait_queue, atomic_read(&chip->suspended) == 0);
> mutex_lock(&chip->lock);
> - if (chip->suspended) {
> - mutex_unlock(&chip->lock);
> - return -EBUSY;
> - }
There's a race here: the device might enter suspend again before you're
able to acquire the lock.
> mutex_lock(&chip->controller->lock);
>
> return 0;
> @@ -4562,11 +4561,15 @@ static int nand_suspend(struct mtd_info *mtd)
> struct nand_chip *chip = mtd_to_nand(mtd);
> int ret = 0;
>
> + atomic_inc(&chip->suspended);
> mutex_lock(&chip->lock);
And it's racy here as well: you mark the device as suspended before you
even acquired the lock.
> if (chip->ops.suspend)
> ret = chip->ops.suspend(chip);
> - if (!ret)
> - chip->suspended = 1;
> + if (ret) {
> + /* Wake things up again if suspend fails */
> + atomic_dec(&chip->suspended);
> + wake_up(&mtd->wait_queue);
> + }
> mutex_unlock(&chip->lock);
>
> return ret;
> @@ -4581,10 +4584,12 @@ static void nand_resume(struct mtd_info *mtd)
> struct nand_chip *chip = mtd_to_nand(mtd);
>
> mutex_lock(&chip->lock);
> - if (chip->suspended) {
> + if (atomic_read(&chip->suspended)) {
> if (chip->ops.resume)
> chip->ops.resume(chip);
> - chip->suspended = 0;
> +
> + atomic_dec(&chip->suspended);
> + wake_up(&mtd->wait_queue);
> } else {
> pr_err("%s called for a chip which is not in suspended state\n",
> __func__);
> @@ -5099,6 +5104,9 @@ static int nand_detect(struct nand_chip *chip, struct nand_flash_dev *type)
> pr_info("%d MiB, %s, erase size: %d KiB, page size: %d, OOB size: %d\n",
> (int)(targetsize >> 20), nand_is_slc(chip) ? "SLC" : "MLC",
> mtd->erasesize >> 10, mtd->writesize, mtd->oobsize);
> +
> + init_waitqueue_head(&mtd->wait_queue);
> +
It's an MTD field. It should be initialized somewhere in mtdcore.c.
> return 0;
>
> free_detect_allocation:
> @@ -6264,6 +6272,8 @@ static int nand_scan_tail(struct nand_chip *chip)
> if (chip->options & NAND_SKIP_BBTSCAN)
> return 0;
>
> + atomic_set(&chip->suspended, 0);
> +
> /* Build bad block table */
> ret = nand_create_bbt(chip);
> if (ret)
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 88227044fc86..f7dcbc336170 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -360,6 +360,8 @@ struct mtd_info {
> int (*_get_device) (struct mtd_info *mtd);
> void (*_put_device) (struct mtd_info *mtd);
>
> + wait_queue_head_t wait_queue;
> +
wait_queue doesn't really describe what this waitqueue is used for
(maybe resume_wq), and the suspended state should be here as well
(actually, there's one already).
Actually, what we need is a way to prevent the device from being
suspended while accesses are still in progress, and new accesses from
being queued if a suspend is pending. So, I think you need a readwrite
lock here:
* take the lock in read mode for all IO accesses, check the
mtd->suspended value
- if true, release the lock, and wait (retry on wakeup)
- if false, just do the IO
* take the lock in write mode when you want to suspend/resume the
device and update the suspended field. Call wake_up_all() in the
resume path
next prev parent reply other threads:[~2021-10-07 12:19 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-04 6:56 [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend Sean Nyekjaer
2021-10-04 8:41 ` Boris Brezillon
2021-10-04 8:55 ` Sean Nyekjaer
2021-10-04 9:58 ` Boris Brezillon
2021-10-04 10:12 ` Sean Nyekjaer
2021-10-04 11:47 ` Boris Brezillon
2021-10-05 7:09 ` Sean Nyekjaer
2021-10-05 8:23 ` Boris Brezillon
2021-10-05 8:49 ` Sean Nyekjaer
2021-10-05 8:58 ` Boris Brezillon
2021-10-07 11:43 ` Sean Nyekjaer
2021-10-07 12:18 ` Boris Brezillon [this message]
2021-10-07 12:39 ` Sean Nyekjaer
2021-10-07 13:14 ` Boris Brezillon
2021-10-08 10:04 ` Sean Nyekjaer
2021-10-08 11:20 ` Boris Brezillon
2021-10-08 11:54 ` Sean Nyekjaer
2021-10-08 12:15 ` Boris Brezillon
2021-10-08 14:38 ` [RFC PATCH 1/2] mtd: core: protect access to mtd devices " Sean Nyekjaer
2021-10-08 15:30 ` Boris Brezillon
2021-10-08 17:31 ` Sean Nyekjaer
2021-10-08 15:35 ` Miquel Raynal
2021-10-08 16:08 ` Boris Brezillon
2021-10-08 17:50 ` Sean Nyekjaer
2021-10-08 14:38 ` [RFC PATCH 2/2] mtd: rawnand: remove suspended check Sean Nyekjaer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211007141858.314533f2@collabora.com \
--to=boris.brezillon@collabora.com \
--cc=bbrezillon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mtd@lists.infradead.org \
--cc=miquel.raynal@bootlin.com \
--cc=richard@nod.at \
--cc=sean@geanix.com \
--cc=vigneshr@ti.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox