From: NeilBrown <neilb@suse.com>
To: Mikael Abrahamsson <swmike@swm.pp.se>
Cc: linux-raid@vger.kernel.org
Subject: Re: Linux Plumbers MD BOF discussion notes
Date: Wed, 04 Oct 2017 11:49:00 +1100
Message-ID: <87lgkr3fgj.fsf@notabene.neil.brown.name>
In-Reply-To: <alpine.DEB.2.20.1710010730210.31961@uplift.swm.pp.se>
On Sun, Oct 01 2017, Mikael Abrahamsson wrote:
> On Mon, 18 Sep 2017, NeilBrown wrote:
>
>> Anyway, thanks for the example of a real problem related to this. It
>> does make it easier to think about.
>
> Btw, if someone does --zero-superblock or dd /dev/zero to a component
> device that is active, what happens when mdadm --stop /dev/mdX is run?
> Does it write out the complete superblock again?

--zero-superblock won't work on a device that is currently part of an
array. dd /dev/zero will.

When the array is stopped the metadata will be written if the array is
not read-only and is not clean.
So for 'linear' and 'raid0' it is never written. For others it probably
is but may not be.
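
As a rough illustration of the condition just described (not md's actual code), the stop-time decision can be modelled as a small predicate. The struct and its field names are hypothetical stand-ins for whatever state the kernel really consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the array state looked at when stopping;
 * these are not the real md field names. */
struct array_state {
	bool read_only;	/* array is read-only */
	bool clean;	/* array is marked clean */
};

/* Metadata is written on stop only if the array is neither
 * read-only nor clean, as stated in the text above. */
static bool metadata_written_on_stop(const struct array_state *a)
{
	assert(a != NULL);
	return !a->read_only && !a->clean;
}
```

In this model an array that is clean when it is stopped gets no metadata write, so a zeroed superblock would stay zeroed.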

I'm not sure that forcing a write makes sense. A dd could corrupt lots
of stuff, and just saving the metadata is not a big win.

I've been playing with some code, and this patch makes it impossible to
write to a device which is in-use by md.
Well... not exactly. If a partition is in-use by md, the whole device
can still be written to. But the partition itself cannot.
Also if metadata is managed by user-space, writes are still allowed.
To fix that, we would need to capture each write request and validate
the sector range. Not impossible, but ugly.
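
A minimal user-space sketch of what "validate the sector range" could look like, assuming the held partitions are known as (start, length) pairs in sectors relative to the whole device. The struct and function names are hypothetical; nothing here is taken from the patch or from the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a partition claimed by an exclusive holder,
 * in sectors relative to the start of the whole device. */
struct held_range {
	uint64_t start;	/* first held sector */
	uint64_t nr;	/* number of held sectors */
};

/* True when half-open ranges [a, a+an) and [b, b+bn) overlap.
 * Written with subtractions so that start + length cannot overflow. */
static bool ranges_overlap(uint64_t a, uint64_t an, uint64_t b, uint64_t bn)
{
	if (an == 0 || bn == 0)
		return false;
	return a >= b ? a - b < bn : b - a < an;
}

/* A write to the whole device is allowed only if it touches
 * none of the held ranges. */
static bool write_allowed(const struct held_range *held, size_t n_held,
			  uint64_t sector, uint64_t nr)
{
	assert(held != NULL || n_held == 0);
	for (size_t i = 0; i < n_held; i++)
		if (ranges_overlap(sector, nr, held[i].start, held[i].nr))
			return false;
	return true;
}
```

With a partition held at sectors 2048..3047, a write to sector 0 of the whole device would pass this check, while a write that straddles sector 2048 would be refused. Doing this per request is the "ugly" part: every write to the whole device has to be compared against every held partition.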

Also, by itself, this patch breaks the use of raid6check on an active
array. We could fix that by enabling writes whenever a region is
suspended.

Still... maybe it is a starting point for thinking about the problem.

NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0ff1bbf6c90e..7c469cd9febc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2264,6 +2264,7 @@ static int lock_rdev(struct md_rdev *rdev, dev_t dev, int shared)
 		pr_warn("md: could not open %s.\n", __bdevname(dev, b));
 		return PTR_ERR(bdev);
 	}
+	bdev->bd_holder_only_writes = !shared;
 	rdev->bdev = bdev;
 	return err;
 }
@@ -2272,6 +2273,7 @@ static void unlock_rdev(struct md_rdev *rdev)
 {
 	struct block_device *bdev = rdev->bdev;
 	rdev->bdev = NULL;
+	bdev->bd_holder_only_writes = 0;
 	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..673b71bac731 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1816,10 +1816,14 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
 		WARN_ON_ONCE(--bdev->bd_contains->bd_holders < 0);
 		/* bd_contains might point to self, check in a separate step */
-		if ((bdev_free = !bdev->bd_holders))
+		if ((bdev_free = !bdev->bd_holders)) {
+			bdev->bd_holder_only_writes = 0;
 			bdev->bd_holder = NULL;
-		if (!bdev->bd_contains->bd_holders)
+		}
+		if (!bdev->bd_contains->bd_holders) {
+			bdev->bd_contains->bd_holder_only_writes = 0;
 			bdev->bd_contains->bd_holder = NULL;
+		}
 		spin_unlock(&bdev_lock);
@@ -1884,8 +1888,13 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	loff_t size = i_size_read(bd_inode);
 	struct blk_plug plug;
 	ssize_t ret;
+	struct block_device *bdev = I_BDEV(bd_inode);
-	if (bdev_read_only(I_BDEV(bd_inode)))
+	if (bdev_read_only(bdev))
+		return -EPERM;
+	if (bdev->bd_holder != NULL &&
+	    bdev->bd_holder_only_writes &&
+	    bdev->bd_holder != file)
 		return -EPERM;
 	if (!iov_iter_count(from))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 339e73742e73..79e3a2822867 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -424,6 +424,7 @@ struct block_device {
 	void *			bd_holder;
 	int			bd_holders;
 	bool			bd_write_holder;
+	bool			bd_holder_only_writes;
 #ifdef CONFIG_SYSFS
 	struct list_head	bd_holder_disks;
 #endif
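
For readers who want to see the new check in blkdev_write_iter() in isolation, here is a small user-space model of it. It is only a sketch: struct bdev_model and write_gate() are hypothetical names, the fields merely mirror bd_holder and the proposed bd_holder_only_writes flag, and the writer pointer stands in for the struct file doing the write.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct block_device
 * that the write path in the patch looks at. */
struct bdev_model {
	bool read_only;			/* models bdev_read_only() */
	const void *holder;		/* models bd_holder */
	bool holder_only_writes;	/* models bd_holder_only_writes */
};

/* Return 0 if the write may proceed, -EPERM otherwise.
 * Same order of tests as the patched blkdev_write_iter(). */
static int write_gate(const struct bdev_model *bdev, const void *writer)
{
	assert(bdev != NULL);
	if (bdev->read_only)
		return -EPERM;
	if (bdev->holder != NULL &&
	    bdev->holder_only_writes &&
	    bdev->holder != writer)
		return -EPERM;
	return 0;
}
```

In this model a device with no holder, or with a holder that did not set the flag (the shared case in lock_rdev()), accepts writes from anyone. Once the flag is set, only a writer whose pointer equals the holder gets through, so a dd from an unrelated open file would see -EPERM.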