* Why aren't partitions limited to fit within the device?
@ 2006-10-12 23:50 Neil Brown
2006-10-13 1:02 ` Andries Brouwer
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Neil Brown @ 2006-10-12 23:50 UTC (permalink / raw)
To: linux-kernel; +Cc: aeb, Jens Axboe
Hi,
I was looking into an issue that someone was having with raid5.
They made an md/raid5 out of 5 whole devices and by luck the data
that was written to the first block of the 5th device looked
slightly like a partition table. fdisk output below for the curious.
However some partitions were beyond the end of the device.
This resulted in annoying error messages popping up when various
programs such as mount (mounting by label) looked at various
partitions.
The messages were like:
Buffer I/O error on device sde3, logical block 1794
This logical block is way beyond the end of sde.
What is happening is that the partition is found to be beyond the end
of the device, a message is printed about this
sde: p3 exceeds device capacity
but nothing is done about it.
So IO requests in the partition can go beyond the end of the device -
and the code that is supposed to protect devices from this doesn't.
In ll_rw_blk.c, in generic_make_request, the max bi_sector is checked
against maxsector and an error is reported if bi_sector is too large.
However this only happens before blk_partition_remap is called to map
the partition-sector to the device-sector. The result of
blk_partition_remap is clearly expected to still be in range
*
* NOTE: we don't repeat the blk_size check for each new device.
* Stacking drivers are expected to know what they are doing.
Partitioning isn't exactly a 'stacking driver', but it doesn't seem
to know what it is doing :-)
So: Is there any good reason to not clip the partitions to fit
within the device - and discard those that are completely beyond
the end of the device??
The patch at the end of the mail does that. Is it OK to submit this
to mainline?
NeilBrown
# fdisk /dev/sde
Command (m for help): p
Disk /dev/sde: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sde1 1 1 995+ c7 Syrinx
Partition 1 does not end on cylinder boundary.
/dev/sde2 1 1 0 0 Empty
Partition 2 does not end on cylinder boundary.
/dev/sde3 133675 133675 995+ c7 Syrinx
Partition 3 does not end on cylinder boundary.
/dev/sde4 1 1 0 0 Empty
Partition 4 does not end on cylinder boundary.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./block/ioctl.c | 6 ++++++
./fs/partitions/check.c | 12 ++++++++++--
2 files changed, 16 insertions(+), 2 deletions(-)
diff .prev/block/ioctl.c ./block/ioctl.c
--- .prev/block/ioctl.c 2006-10-09 14:25:11.000000000 +1000
+++ ./block/ioctl.c 2006-10-13 09:45:52.000000000 +1000
@@ -42,6 +42,12 @@ static int blkpg_ioctl(struct block_devi
|| pstart < 0 || plength < 0)
return -EINVAL;
}
+ /* Does partition fit within device */
+ if (start + length > get_capacity(disk)) {
+ if (start >= get_capacity(disk))
+ return -EINVAL;
+ length = get_capacity(disk) - start;
+ }
/* partition number in use? */
mutex_lock(&bdev->bd_mutex);
if (disk->part[part - 1]) {
diff .prev/fs/partitions/check.c ./fs/partitions/check.c
--- .prev/fs/partitions/check.c 2006-10-13 09:28:21.000000000 +1000
+++ ./fs/partitions/check.c 2006-10-13 09:46:12.000000000 +1000
@@ -466,8 +466,16 @@ int rescan_partitions(struct gendisk *di
if (!size)
continue;
if (from + size > get_capacity(disk)) {
- printk(" %s: p%d exceeds device capacity\n",
- disk->disk_name, p);
+ if (from >= get_capacity(disk)) {
+ printk(KERN_INFO
+ " %s: p%d starts beyond end of device\n",
+ disk->disk_name, p);
+ continue;
+ }
+ printk(KERN_INFO
+ " %s: p%d exceeds device capacity - clipping\n",
+ disk->disk_name, p);
+ size = get_capacity(disk) - from;
}
add_partition(disk, p, from, size);
#ifdef CONFIG_BLK_DEV_MD
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Why aren't partitions limited to fit within the device? 2006-10-12 23:50 Why aren't partitions limited to fit within the device? Neil Brown @ 2006-10-13 1:02 ` Andries Brouwer 2006-10-13 1:31 ` Neil Brown 2006-10-13 15:07 ` Alan Cox 2006-10-15 8:29 ` Ville Herva 2 siblings, 1 reply; 20+ messages in thread From: Andries Brouwer @ 2006-10-13 1:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel, Andries.Brouwer, Jens Axboe On Fri, Oct 13, 2006 at 09:50:49AM +1000, Neil Brown wrote: > So: Is there any good reason to not clip the partitions to fit > within the device - and discard those that are completely beyond > the end of the device?? Almost precisely this issue came up recently. If I recall correctly at that time the idea was to discard partitions that do not fit on the known disk. A bad idea. Your idea is better, namely to clip partitions. Still, there are a few reasons why one should be careful. One is the existence of clipped disks. There are various ways of making a disk appear smaller than it really is - there may be a HPA or DCO or so, or just a capacity-limiting jumper. This may mean that the kernel does not really know the size of the disk. The jumper may cause IDENTIFY to return a small size while actual I/O succeeds beyond that. Or, a SETMAX command is needed to make all of a partition available. Etc. One is the numbering of partitions. People are very unhappy when something causes their partitions to be renumbered. That is an argument against the discarding. In the forensics situation you want to take a copy of a disk. But often that is impractical - copying this 500GB disk takes too long, or the scratch disk is not large enough, and the copy only holds the initial part of a disk. You do not want to discard such partial partitions - maybe clipping is OK, although I would prefer to see precisely the same data on the copy as on the original, except of course that actually accessing nonexistent data returns an I/O error, but discarding would again cause renumbering. Bad. [As an aside: for the past twelve years or so I have muttered once a year that it is bad that Linux does automatic probing for partitions. It will be mistaken every now and then. With some partition types there is a fairly large probability that random data is seen as a partition table. A correct system does not guess (unless asked to guess by the user). A correct system is set up in such a way that the boot parameters tell it 1. the root disk, 2. the partition type of the root disk, 3. the root partition, 4. the filesystem type of the root filesystem. Now the root disk can contain configuration data that causes the system to look at specified disks in specified ways, or to do default things. With a system that was set up correctly, your nonsense partitions would never have been found by the kernel, and I suppose mount by label would not have encountered any problems.] Andries ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-13 1:02 ` Andries Brouwer @ 2006-10-13 1:31 ` Neil Brown 0 siblings, 0 replies; 20+ messages in thread From: Neil Brown @ 2006-10-13 1:31 UTC (permalink / raw) To: Andries Brouwer; +Cc: linux-kernel, Jens Axboe On Friday October 13, Andries.Brouwer@cwi.nl wrote: > > One is the existence of clipped disks. There are various ways of > making a disk appear smaller than it really is - there may be > a HPA or DCO or so, or just a capacity-limiting jumper. > This may mean that the kernel does not really know the size > of the disk. The jumper may cause IDENTIFY to return a small size > while actual I/O succeeds beyond that. Or, a SETMAX command is > needed to make all of a partition available. Etc. Yes..... but access to the whole device is clipped based on the kernel's understanding of the size of the device, and allowing code too by-pass that clipping by using a partition seems just *wrong*. Presumably a very-privileged BLKSETSIZE64 would be the appropriate thing for that sort of situation. If a device claims to be smaller will the low-level driver actually allow a request to a larger address at all??? > > One is the numbering of partitions. People are very unhappy > when something causes their partitions to be renumbered. > That is an argument against the discarding. Valid comment. My patch doesn't affect numbering. Partitions that are beyond the end of the device are not enabled at all, but that doesn't affect the numbering of later partitions. > > In the forensics situation you want to take a copy of a disk. > But often that is impractical - copying this 500GB disk takes too long, > or the scratch disk is not large enough, and the copy only holds the > initial part of a disk. > You do not want to discard such partial partitions - maybe clipping is OK, > although I would prefer to see precisely the same data on the copy as on > the original, except of course that actually accessing nonexistent data > returns an I/O error, but discarding would again cause renumbering. Bad. So your main points seem to be: - removing access that people currently have is bad - causing partitions to be renumbered is bad. I certainly agree. Unless people actually do use partitions to bypass the device-size check in generic_make_request, this patch doesn't do either these bad things. The only visible effect should be: - partitions that have no accessible blocks will no longer be able to be opened. - partitions that start in-range and end beyond range will appear smaller than they would have done. - reading the last block of a partition will no longer cause error messages that look like there are device errors. All of these seem to be acceptable effects. Do you agree? > > [As an aside: for the past twelve years or so I have muttered once a year > that it is bad that Linux does automatic probing for partitions. > It will be mistaken every now and then. > With some partition types there is a fairly large probability > that random data is seen as a partition table. > A correct system does not guess (unless asked to guess by the user). > A correct system is set up in such a way that the boot parameters tell it > 1. the root disk, 2. the partition type of the root disk, > 3. the root partition, 4. the filesystem type of the root filesystem. > Now the root disk can contain configuration data that causes the system > to look at specified disks in specified ways, or to do default things. > > With a system that was set up correctly, your nonsense partitions > would never have been found by the kernel, and I suppose mount by label > would not have encountered any problems.] I have substantial sympathy with your aside. But people don't want to have to configure things. They just want it to "work". I get this with md/raid all the time. "Just find all the arrays and assemble them". Like with partitions, that isn't always a well defined operation. But most of the time it is, and that is what people think they want. Ho hum. Thanks for your input. NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-12 23:50 Why aren't partitions limited to fit within the device? Neil Brown 2006-10-13 1:02 ` Andries Brouwer @ 2006-10-13 15:07 ` Alan Cox 2006-10-13 18:16 ` jdow 2006-10-16 0:08 ` Neil Brown 2006-10-15 8:29 ` Ville Herva 2 siblings, 2 replies; 20+ messages in thread From: Alan Cox @ 2006-10-13 15:07 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel, aeb, Jens Axboe Ar Gwe, 2006-10-13 am 09:50 +1000, ysgrifennodd Neil Brown: > So: Is there any good reason to not clip the partitions to fit > within the device - and discard those that are completely beyond > the end of the device?? Its close but not quite the right approach > The patch at the end of the mail does that. Is it OK to submit this > to mainline? No I think not. Any partition which is partly outside the disk should be ignored entirely, that ensures it doesn't accidentally get mounted and trashed by an HPA or similar mixup. I agree it should be fixed, I just don't think your fix is actually a real fix in this case. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-13 15:07 ` Alan Cox @ 2006-10-13 18:16 ` jdow 2006-10-16 0:08 ` Neil Brown 1 sibling, 0 replies; 20+ messages in thread From: jdow @ 2006-10-13 18:16 UTC (permalink / raw) To: Alan Cox, Neil Brown; +Cc: linux-kernel, aeb, Jens Axboe From: "Alan Cox" <alan@lxorguk.ukuu.org.uk> > Ar Gwe, 2006-10-13 am 09:50 +1000, ysgrifennodd Neil Brown: >> So: Is there any good reason to not clip the partitions to fit >> within the device - and discard those that are completely beyond >> the end of the device?? > > Its close but not quite the right approach > >> The patch at the end of the mail does that. Is it OK to submit this >> to mainline? > > No I think not. Any partition which is partly outside the disk should be > ignored entirely, that ensures it doesn't accidentally get mounted and > trashed by an HPA or similar mixup. This is also a risk for users who have a partition that was poorly setup with an earlier version that allowed it to almost work. If there is data on that partition refusing to mount it can lead to massive data loss akin to a broken disk drive. <<jdow>> I'd propose allowing it to mount read-only and forcing read-only mode when attempts to mount read-write are made. That way the user never perceives a data loss situation and can take the appropriate steps to repair the partitioning error. {^_^} Joanne Dow ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-13 15:07 ` Alan Cox 2006-10-13 18:16 ` jdow @ 2006-10-16 0:08 ` Neil Brown 2006-10-16 1:22 ` Wakko Warner ` (4 more replies) 1 sibling, 5 replies; 20+ messages in thread From: Neil Brown @ 2006-10-16 0:08 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel, aeb, Jens Axboe On Friday October 13, alan@lxorguk.ukuu.org.uk wrote: > Ar Gwe, 2006-10-13 am 09:50 +1000, ysgrifennodd Neil Brown: > > So: Is there any good reason to not clip the partitions to fit > > within the device - and discard those that are completely beyond > > the end of the device?? > > Its close but not quite the right approach > > > The patch at the end of the mail does that. Is it OK to submit this > > to mainline? > > No I think not. Any partition which is partly outside the disk should be > ignored entirely, that ensures it doesn't accidentally get mounted and > trashed by an HPA or similar mixup. Hmmm.. So Alan things a partially-outside-this-disk partition shouldn't show up at all, and Andries thinks it should. And both give reasonably believable justifications. Maybe we need a kernel parameter? How about this? NeilBrown ----------------------------- Don't allow partitions to start or end beyond the end of the device. Corrupt partitions tables can cause wierd partitions that confuse programs. This is confusion that can be avoided. If a partition appears to start at or beyond the end of a device, we don't enable it. If it starts within the device but ends after the end, we clip it to fit within the device. Not enabling partitions does not affect partition numbering of subsequent partitions. This change applies to partitions found by fs/partitions/check.c and to partitions explicitly created via an ioctl. There is no uniform agreement on whether partitions that extend beyond the end of the device should be clipped or discarded. Discarding is safer as it makes corruption less likely. Clipping is more flexable and gives continued access to the partition. So provide a kernel-parameter which a 'safe' default. partitions=strict is the default partitions=relaxed means that partitions are clipped rather than rejected. This kernel parameters only applies to auto-detected partitions, not those set by ioctl. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./Documentation/kernel-parameters.txt | 8 ++++++++ ./block/ioctl.c | 6 ++++++ ./fs/partitions/check.c | 28 ++++++++++++++++++++++++++-- 3 files changed, 40 insertions(+), 2 deletions(-) diff .prev/Documentation/kernel-parameters.txt ./Documentation/kernel-parameters.txt --- .prev/Documentation/kernel-parameters.txt 2006-10-16 10:03:40.000000000 +1000 +++ ./Documentation/kernel-parameters.txt 2006-10-16 10:06:42.000000000 +1000 @@ -1148,6 +1148,14 @@ and is between 256 and 4096 characters. Currently this function knows 686a and 8231 chips. Format: [spp|ps2|epp|ecp|ecpepp] + partitions= How to interpret partition information that + could be corrupt. + 'strict' is the default. Partitions that + don't fit in the device are rejected. + 'relaxed' is an option. Partitions that start + within the device be end beyond the end are + clipped. + pas2= [HW,OSS] Format: <io>,<irq>,<dma>,<dma16>,<sb_io>,<sb_irq>,<sb_dma>,<sb_dma16> diff .prev/block/ioctl.c ./block/ioctl.c --- .prev/block/ioctl.c 2006-10-13 14:30:56.000000000 +1000 +++ ./block/ioctl.c 2006-10-16 09:54:40.000000000 +1000 @@ -42,6 +42,12 @@ static int blkpg_ioctl(struct block_devi || pstart < 0 || plength < 0) return -EINVAL; } + /* Does partition fit within device */ + if (start + length > get_capacity(disk)) { + if (start >= get_capacity(disk)) + return -EINVAL; + length = get_capacity(disk) - start; + } /* partition number in use? */ mutex_lock(&bdev->bd_mutex); if (disk->part[part - 1]) { diff .prev/fs/partitions/check.c ./fs/partitions/check.c --- .prev/fs/partitions/check.c 2006-10-13 14:30:56.000000000 +1000 +++ ./fs/partitions/check.c 2006-10-16 10:07:53.000000000 +1000 @@ -443,6 +443,8 @@ exit: } } +static int strict_partitions = 1; + int rescan_partitions(struct gendisk *disk, struct block_device *bdev) { struct parsed_partitions *state; @@ -466,8 +468,22 @@ int rescan_partitions(struct gendisk *di if (!size) continue; if (from + size > get_capacity(disk)) { - printk(" %s: p%d exceeds device capacity\n", - disk->disk_name, p); + if (strict_partitions) { + printk(KERN_INFO + " %s: p%d ends beyond end of device" + " - ignoring\n", disk->disk_name, p); + continue; + } + if (from >= get_capacity(disk)) { + printk(KERN_INFO + " %s: p%d starts beyond end of device" + " - ignoring\n", disk->disk_name, p); + continue; + } + printk(KERN_INFO + " %s: p%d exceeds device capacity - clipping\n", + disk->disk_name, p); + size = get_capacity(disk) - from; } add_partition(disk, p, from, size); #ifdef CONFIG_BLK_DEV_MD @@ -479,6 +495,14 @@ int rescan_partitions(struct gendisk *di return 0; } +static int __init part_setup(char *str) +{ + strict_partitions = strcmp(str, "relaxed"); + return 1; +} + +__setup("partitions=", part_setup); + unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p) { struct address_space *mapping = bdev->bd_inode->i_mapping; ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 0:08 ` Neil Brown @ 2006-10-16 1:22 ` Wakko Warner 2006-10-16 1:31 ` Randy Dunlap ` (3 subsequent siblings) 4 siblings, 0 replies; 20+ messages in thread From: Wakko Warner @ 2006-10-16 1:22 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, linux-kernel, aeb, Jens Axboe Neil Brown wrote: > On Friday October 13, alan@lxorguk.ukuu.org.uk wrote: > > No I think not. Any partition which is partly outside the disk should be > > ignored entirely, that ensures it doesn't accidentally get mounted and > > trashed by an HPA or similar mixup. > > Hmmm.. So Alan things a partially-outside-this-disk partition > shouldn't show up at all, and Andries thinks it should. > And both give reasonably believable justifications. > > Maybe we need a kernel parameter? How about this? How about something in /proc or /sys to force the partition to show or not? Maybe something like echo 1 > /sys/block/hda/hda4/visible ? Just an idea, my view would be not to introduce a new kernel parameter since you have to reboot just to see it. It's quite possible this would effect USB/Firewire storage devices. I have never been bitten by this problem however it would be nice not to have to reboot to solve one like this say on a slightly corrupted usb memory stick. A possibility of using words instead of numbers, in the above example could be one of: yes r/o no indicating full r/w, r/o, or not visible. I'm not so sure if the size of the partition that is visible should be the size according to the partition table or the size according to the start of the partition to the end of the device. I would probably choose to have the size to be the smaller number. I was bitten by a 2.4 to 2.6 conversion where the size of the disk was 1 sector larger than what I saw in 2.4, but it was a minor problem which was solved by masking off bit 0 of the size. -- Lab tests show that use of micro$oft causes cancer in lab animals Got Gas??? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 0:08 ` Neil Brown 2006-10-16 1:22 ` Wakko Warner @ 2006-10-16 1:31 ` Randy Dunlap 2006-10-16 4:09 ` Kyle Moffett ` (2 subsequent siblings) 4 siblings, 0 replies; 20+ messages in thread From: Randy Dunlap @ 2006-10-16 1:31 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, linux-kernel, aeb, Jens Axboe On Mon, 16 Oct 2006 10:08:50 +1000 Neil Brown wrote: > On Friday October 13, alan@lxorguk.ukuu.org.uk wrote: > > Ar Gwe, 2006-10-13 am 09:50 +1000, ysgrifennodd Neil Brown: > > > So: Is there any good reason to not clip the partitions to fit > > > within the device - and discard those that are completely beyond > > > the end of the device?? > > > > Its close but not quite the right approach > > > > > The patch at the end of the mail does that. Is it OK to submit this > > > to mainline? > > > > No I think not. Any partition which is partly outside the disk should be > > ignored entirely, that ensures it doesn't accidentally get mounted and > > trashed by an HPA or similar mixup. > > Hmmm.. So Alan things a partially-outside-this-disk partition > shouldn't show up at all, and Andries thinks it should. > And both give reasonably believable justifications. > > Maybe we need a kernel parameter? How about this? > > NeilBrown > > > ----------------------------- > Don't allow partitions to start or end beyond the end of the device. > > Corrupt partitions tables can cause wierd partitions that confuse > programs. This is confusion that can be avoided. > > If a partition appears to start at or beyond the end of a device, we > don't enable it. > If it starts within the device but ends after the end, we clip it to > fit within the device. > > Not enabling partitions does not affect partition numbering of > subsequent partitions. > > This change applies to partitions found by fs/partitions/check.c > and to partitions explicitly created via an ioctl. > > There is no uniform agreement on whether partitions that extend > beyond the end of the device should be clipped or discarded. > Discarding is safer as it makes corruption less likely. > Clipping is more flexable and gives continued access to the partition. > So provide a kernel-parameter which a 'safe' default. > > partitions=strict > is the default > partitions=relaxed > means that partitions are clipped rather than rejected. > This kernel parameters only applies to auto-detected partitions, > not those set by ioctl. > > > Signed-off-by: Neil Brown <neilb@suse.de> > > ### Diffstat output > ./Documentation/kernel-parameters.txt | 8 ++++++++ > ./block/ioctl.c | 6 ++++++ > ./fs/partitions/check.c | 28 ++++++++++++++++++++++++++-- > 3 files changed, 40 insertions(+), 2 deletions(-) > > diff .prev/Documentation/kernel-parameters.txt ./Documentation/kernel-parameters.txt > --- .prev/Documentation/kernel-parameters.txt 2006-10-16 10:03:40.000000000 +1000 > +++ ./Documentation/kernel-parameters.txt 2006-10-16 10:06:42.000000000 +1000 > @@ -1148,6 +1148,14 @@ and is between 256 and 4096 characters. > Currently this function knows 686a and 8231 chips. > Format: [spp|ps2|epp|ecp|ecpepp] > > + partitions= How to interpret partition information that > + could be corrupt. > + 'strict' is the default. Partitions that > + don't fit in the device are rejected. > + 'relaxed' is an option. Partitions that start > + within the device be end beyond the end are s/be/but/ ?? > + clipped. > + --- ~Randy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 0:08 ` Neil Brown 2006-10-16 1:22 ` Wakko Warner 2006-10-16 1:31 ` Randy Dunlap @ 2006-10-16 4:09 ` Kyle Moffett 2006-10-16 6:40 ` Andries Brouwer 2006-10-16 7:28 ` Xavier Bestel 4 siblings, 0 replies; 20+ messages in thread From: Kyle Moffett @ 2006-10-16 4:09 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, linux-kernel, aeb, Jens Axboe On Oct 15, 2006, at 20:08:50, Neil Brown wrote: > Hmmm.. So Alan things a partially-outside-this-disk partition > shouldn't show up at all, and Andries thinks it should. > And both give reasonably believable justifications. > > Maybe we need a kernel parameter? How about this? > > [...snip...] > So provide a kernel-parameter which a 'safe' default. > > partitions=strict > is the default > partitions=relaxed > means that partitions are clipped rather than rejected. > This kernel parameters only applies to auto-detected partitions, > not those set by ioctl. Perhaps it should also support partitions=none; so that those of us who want to use a small userspace program and device-mapper to do the partition discovery and access may do so without worrying about how the kernel perceives the partitions (if at all). It also makes it fairly trivial to add support for entirely new parition types without having to modify the kernel at all. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 0:08 ` Neil Brown ` (2 preceding siblings ...) 2006-10-16 4:09 ` Kyle Moffett @ 2006-10-16 6:40 ` Andries Brouwer 2006-10-16 6:43 ` dean gaudet 2006-10-16 7:28 ` Xavier Bestel 4 siblings, 1 reply; 20+ messages in thread From: Andries Brouwer @ 2006-10-16 6:40 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, linux-kernel, Andries.Brouwer, Jens Axboe On Mon, Oct 16, 2006 at 10:08:50AM +1000, Neil Brown wrote: > Hmmm.. So Alan thinks a partially-outside-this-disk partition > shouldn't show up at all, and Andries thinks it should. > And both give reasonably believable justifications. > > Maybe we need a kernel parameter? If you introduce a kernel parameter, let it be one that tells the kernel to stay away from partitions, so that partitions can be added later by the partx ioctls. Now setting up an initrd or so is kind of tricky, not something ordinary users would want to do, so if it moreover is possible to specify the rootpttype that would allow an ordinary boot and leave all other block devices untouched. > Not enabling partitions does not affect partition numbering of > subsequent partitions. A funny effect might be that hda5 exists, hda6 does not, and hda7 exists again. Maybe unexpected for some software. Andries ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 6:40 ` Andries Brouwer @ 2006-10-16 6:43 ` dean gaudet 0 siblings, 0 replies; 20+ messages in thread From: dean gaudet @ 2006-10-16 6:43 UTC (permalink / raw) To: Andries Brouwer; +Cc: Neil Brown, Alan Cox, linux-kernel, Jens Axboe On Mon, 16 Oct 2006, Andries Brouwer wrote: > A funny effect might be that hda5 exists, hda6 does not, and hda7 exists again. > Maybe unexpected for some software. why would that be unexpected? that seems entirely normal these days with udev... # ls /dev/hda? /dev/hda1 /dev/hda2 /dev/hda5 -dean ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 0:08 ` Neil Brown ` (3 preceding siblings ...) 2006-10-16 6:40 ` Andries Brouwer @ 2006-10-16 7:28 ` Xavier Bestel 2006-10-16 10:54 ` Alan Cox 4 siblings, 1 reply; 20+ messages in thread From: Xavier Bestel @ 2006-10-16 7:28 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, linux-kernel, aeb, Jens Axboe On Mon, 2006-10-16 at 10:08 +1000, Neil Brown wrote: > On Friday October 13, alan@lxorguk.ukuu.org.uk wrote: > > Ar Gwe, 2006-10-13 am 09:50 +1000, ysgrifennodd Neil Brown: > > > So: Is there any good reason to not clip the partitions to fit > > > within the device - and discard those that are completely beyond > > > the end of the device?? > > > > Its close but not quite the right approach > > > > > The patch at the end of the mail does that. Is it OK to submit this > > > to mainline? > > > > No I think not. Any partition which is partly outside the disk should be > > ignored entirely, that ensures it doesn't accidentally get mounted and > > trashed by an HPA or similar mixup. > > Hmmm.. So Alan things a partially-outside-this-disk partition > shouldn't show up at all, and Andries thinks it should. > And both give reasonably believable justifications. Maybe the whole part table should be marked as "weird" to let userspace run a diagnostics/repair tool on the disk. Xav ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 7:28 ` Xavier Bestel @ 2006-10-16 10:54 ` Alan Cox 2006-10-17 6:05 ` Chris Wedgwood 0 siblings, 1 reply; 20+ messages in thread From: Alan Cox @ 2006-10-16 10:54 UTC (permalink / raw) To: Xavier Bestel; +Cc: Neil Brown, linux-kernel, aeb, Jens Axboe Ar Llu, 2006-10-16 am 09:28 +0200, ysgrifennodd Xavier Bestel: > > Hmmm.. So Alan things a partially-outside-this-disk partition > > shouldn't show up at all, and Andries thinks it should. > > And both give reasonably believable justifications. > > Maybe the whole part table should be marked as "weird" to let userspace > run a diagnostics/repair tool on the disk. I actually like the "read only" suggestion that was made. Allow data access but protect from damage. Both options will allow 'repair' of a broken partition table as the partition will show up in fdisk which accesses stuff directly, and when that causes a revalidate it will re-appear to the kernel. Incidentally the question of exactly what libata should do about HPA handling also needs sorting out. Alan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 10:54 ` Alan Cox @ 2006-10-17 6:05 ` Chris Wedgwood 0 siblings, 0 replies; 20+ messages in thread From: Chris Wedgwood @ 2006-10-17 6:05 UTC (permalink / raw) To: Alan Cox; +Cc: Xavier Bestel, Neil Brown, linux-kernel, aeb, Jens Axboe On Mon, Oct 16, 2006 at 11:54:32AM +0100, Alan Cox wrote: > Incidentally the question of exactly what libata should do about HPA > handling also needs sorting out. HPA isn't even uncommon, lots of laptops use it for recovery purposes it seems. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-12 23:50 Why aren't partitions limited to fit within the device? Neil Brown 2006-10-13 1:02 ` Andries Brouwer 2006-10-13 15:07 ` Alan Cox @ 2006-10-15 8:29 ` Ville Herva 2006-10-15 23:50 ` Neil Brown 2 siblings, 1 reply; 20+ messages in thread From: Ville Herva @ 2006-10-15 8:29 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel, aeb, Jens Axboe On Fri, Oct 13, 2006 at 09:50:49AM +1000, you [Neil Brown] wrote: > > Hi, > I was looking into an issue that someone was having with raid5. > They made an md/raid5 out of 5 whole devices and by luck the data > that was written to the first block of the 5th device looked > slightly like a partition table. fdisk output below for the curious. > However some partitions were beyond the end of the device. That reminds me of an old long-standing mystery I had with a machine that had a RAID-5 of three whole devices. I wonder if there's ever a change the kernel partition detection code could _write_ on the disk, even when there's really no partition table? Below is a description of the problem. I'm afraid I only replicated in on 2.2 and 2.4, but it just might still be present in 2.6. Unfortunately, the actual raid device is now on different disks that have partition table on them. I'm just asking if this rings any bells, since I really spent a long time debugging it, and never found a real clue. ------------------------------------------------------------ The raid device is: md4 : active raid5 hdc[2] hdb[1] hdg[0] 156367872 blocks level 5, 16k chunk, algorithm 0 [3/3] [UUU] The kernel is 2.2.x + RAID-0.90 patch. Fs is ext2. After unmounting the filesystem, I can mount it again without problems. I can also raidstop the raid device in between and all is still fine: -> umount /dev/md4; mount /dev/md4 - no corruption -> umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4 - no corruption But after a reboot, the filesystem is corrupted - few bytes differ in the beginning of /dev/md4 between 1k and and 5k. cmp -l md4 afterboot/md4 1083 1 3 4641 35 0 4642 205 0 4643 10 0 bytepos after before boot boot I found out that the difference (corruption) is usually on three bytes on /dev/hdg, but sometimes on /dev/hdc, too. (/dev/md4 = hdb+hdc+hdg; hdb&hdc are on i810, hdg is on hpt370). First, I did umount /dev/md4 raidstop /dev/md4 head -c 50k /dev/hdg > /save/hdg reboot To rule out kernel raid autodetect and raid code in general, I booted 2.2.25 with "single init=/bin/bash raid=noautodetect". head -c50k /dev/hdg | cmp -l /save/hdg Three bytes differed: 4641 0 35 4642 0 205 4643 0 10 bytepos after before boot boot wrote the original stuff back: dd if=/save/hdg /dev/hdg sync hdparm -W0 /dev/hdg sync reboot Booted 2.2.25 with "single init=/bin/bash raid=noautodetect" again. Did head -c50k /dev/hdg | cmp -l /save/hdg Three same three bytes differed again. Wrote the stuff back, synced, did hdparm, and powered off. Still, the the bytes differed on next boot. Then I booted 2.4.21 with "single init=/bin/bash raid=noautodetect" (I happened to have 2.4.21 compiled with suitable drivers at hand). Wrote the same stuff back with dd, synced, turned ide cache off. Booted 2.4.21 with "single init=/bin/bash raid=noautodetect" again. Did the diff; the three bytes differed again. Note that sometimes few bytes on hdc differed, too. Usually it was just the three hdg bytes. So this is not a 2.2 kernel issue. I suspect it might not be a kernel issue at all. Unless it is a bug in kernel partition detection that is still present in 2.4.x, perhaps in 2.6. I tried to turn off the ide write cache with hdparm -W0, so it shouldn't be a write caching issue. If it's a bios issue, it's really a strange one, since it affects both disks on i810 ide and on hpt370. The disks have no partition table, though, which _could_ confuse the bios. Any ideas? Who could write to those three bytes, and why? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-15 8:29 ` Ville Herva @ 2006-10-15 23:50 ` Neil Brown 2006-10-16 5:27 ` Ville Herva 2006-10-16 6:02 ` Andries Brouwer 0 siblings, 2 replies; 20+ messages in thread From: Neil Brown @ 2006-10-15 23:50 UTC (permalink / raw) To: vherva; +Cc: linux-kernel, aeb, Jens Axboe On Sunday October 15, vherva@vianova.fi wrote: > On Fri, Oct 13, 2006 at 09:50:49AM +1000, you [Neil Brown] wrote: > > > > Hi, > > I was looking into an issue that someone was having with raid5. > > They made an md/raid5 out of 5 whole devices and by luck the data > > that was written to the first block of the 5th device looked > > slightly like a partition table. fdisk output below for the curious. > > However some partitions were beyond the end of the device. > > That reminds me of an old long-standing mystery I had with a machine that > had a RAID-5 of three whole devices. > > I wonder if there's ever a change the kernel partition detection code could > _write_ on the disk, even when there's really no partition table? No, kernel partition detection never writes. I can't imagine what could have been causing your very-interesting corruption. However as it was only demonstrated on old code and cannot be explored further, I suspect we just have to forget it and move one :-( NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-15 23:50 ` Neil Brown @ 2006-10-16 5:27 ` Ville Herva 2006-10-16 6:02 ` Andries Brouwer 1 sibling, 0 replies; 20+ messages in thread From: Ville Herva @ 2006-10-16 5:27 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel, aeb, Jens Axboe On Mon, Oct 16, 2006 at 09:50:15AM +1000, you [Neil Brown] wrote: > On Sunday October 15, vherva@vianova.fi wrote: > > On Fri, Oct 13, 2006 at 09:50:49AM +1000, you [Neil Brown] wrote: > > > > > > Hi, > > > I was looking into an issue that someone was having with raid5. > > > They made an md/raid5 out of 5 whole devices and by luck the data > > > that was written to the first block of the 5th device looked > > > slightly like a partition table. fdisk output below for the curious. > > > However some partitions were beyond the end of the device. > > > > That reminds me of an old long-standing mystery I had with a machine that > > had a RAID-5 of three whole devices. > > > > I wonder if there's ever a change the kernel partition detection code could > > _write_ on the disk, even when there's really no partition table? > > No, kernel partition detection never writes. Well yeah, that sounded very improbable to begin with. But then again, there were very few things that could've done it apart from kernel early boot/late shutdown code, bios and cosmic radiation. > I can't imagine what could have been causing your very-interesting > corruption. However as it was only demonstrated on old code and > cannot be explored further, I suspect we just have to forget it and > move one :-( Yes, that's certainly the most sensible thing to do. Although it did happen with both 2.2 and 2.4 (never tried 2.6 unfortunately), but only on a single box, and I never heard of anybody else seeing anything similar, so it can't be a wide spread problem even it still were present. Thanks for the reply. -- v -- v@iki.fi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-15 23:50 ` Neil Brown 2006-10-16 5:27 ` Ville Herva @ 2006-10-16 6:02 ` Andries Brouwer 2006-10-16 6:20 ` Ville Herva 2006-10-16 6:27 ` dean gaudet 1 sibling, 2 replies; 20+ messages in thread From: Andries Brouwer @ 2006-10-16 6:02 UTC (permalink / raw) To: Neil Brown; +Cc: vherva, linux-kernel, Andries.Brouwer, Jens Axboe On Mon, Oct 16, 2006 at 09:50:15AM +1000, Neil Brown wrote: > On Sunday October 15, vherva@vianova.fi wrote: > > I wonder if there's ever a change the kernel partition detection code could > > _write_ on the disk, even when there's really no partition table? > > No, kernel partition detection never writes. There is something else that writes, however, that I have gotten complaints about. (But I have not investigated.) People doing forensics take a copy of a disk and want to preserve that copy as-is, never changing a single bit, only looking at it. But it is reported that also when a partition is mounted read-only, the journaling code of ext3 will write to the journal. Andries ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 6:02 ` Andries Brouwer @ 2006-10-16 6:20 ` Ville Herva 2006-10-16 6:27 ` dean gaudet 1 sibling, 0 replies; 20+ messages in thread From: Ville Herva @ 2006-10-16 6:20 UTC (permalink / raw) To: Andries Brouwer; +Cc: Neil Brown, linux-kernel On Mon, Oct 16, 2006 at 08:02:27AM +0200, you [Andries Brouwer] wrote: > On Mon, Oct 16, 2006 at 09:50:15AM +1000, Neil Brown wrote: > > On Sunday October 15, vherva@vianova.fi wrote: > > > > I wonder if there's ever a change the kernel partition detection code could > > > _write_ on the disk, even when there's really no partition table? > > > > No, kernel partition detection never writes. > > There is something else that writes, however, that I have gotten complaints about. > (But I have not investigated.) > People doing forensics take a copy of a disk and want to preserve > that copy as-is, never changing a single bit, only looking at it. > But it is reported that also when a partition is mounted read-only, > the journaling code of ext3 will write to the journal. Yes, that's (sort of) known. As in "mentioned on lkml" - perhaps even in some documentation. In this case, the fs was ext2, though, and the problem occurred even when the fs was never mounted and the raid device was never started. -- v -- v@iki.fi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Why aren't partitions limited to fit within the device? 2006-10-16 6:02 ` Andries Brouwer 2006-10-16 6:20 ` Ville Herva @ 2006-10-16 6:27 ` dean gaudet 1 sibling, 0 replies; 20+ messages in thread From: dean gaudet @ 2006-10-16 6:27 UTC (permalink / raw) To: Andries Brouwer; +Cc: Neil Brown, vherva, linux-kernel, Jens Axboe On Mon, 16 Oct 2006, Andries Brouwer wrote: > On Mon, Oct 16, 2006 at 09:50:15AM +1000, Neil Brown wrote: > > On Sunday October 15, vherva@vianova.fi wrote: > > > > I wonder if there's ever a change the kernel partition detection code could > > > _write_ on the disk, even when there's really no partition table? > > > > No, kernel partition detection never writes. > > There is something else that writes, however, that I have gotten complaints about. > (But I have not investigated.) > People doing forensics take a copy of a disk and want to preserve > that copy as-is, never changing a single bit, only looking at it. > But it is reported that also when a partition is mounted read-only, > the journaling code of ext3 will write to the journal. xfs has a similar problem -- on umount it writes even if the filesystem was mounted read-only. for example here's a test with a small loopback device, note the two WRITEs to loop0 near the bottom of the block_dump. # echo 1 >/proc/sys/vm/block_dump; mount -o ro /dev/loop/0 /mnt; sleep 1; umount /mnt; sleep 1; echo 0 >/proc/sys/vm/block_dump <7>mount(14602): READ block 0 on loop0 <7>mount(14602): READ block 8 on loop0 <7>mount(14602): READ block 16 on loop0 <7>mount(14602): READ block 24 on loop0 <7>mount(14602): READ block 32 on loop0 <7>mount(14602): READ block 40 on loop0 <7>mount(14602): READ block 48 on loop0 <7>mount(14602): READ block 56 on loop0 <7>mount(14602): READ block 64 on loop0 <7>mount(14602): READ block 72 on loop0 <7>mount(14602): READ block 80 on loop0 <7>mount(14602): READ block 88 on loop0 <7>mount(14602): READ block 96 on loop0 <7>mount(14602): READ block 104 on loop0 <7>mount(14602): READ block 112 on loop0 <7>mount(14602): READ block 120 on loop0 <7>mount(14602): READ block 128 on loop0 <7>mount(14602): READ block 136 on loop0 <7>mount(14602): READ block 144 on loop0 <7>mount(14602): READ block 152 on loop0 <7>mount(14602): READ block 160 on loop0 <7>mount(14602): READ block 168 on loop0 <7>mount(14602): READ block 176 on loop0 <7>mount(14602): READ block 184 on loop0 <7>mount(14602): READ block 192 on loop0 <7>mount(14602): READ block 200 on loop0 <7>mount(14602): READ block 208 on loop0 <7>mount(14602): READ block 216 on loop0 <7>mount(14602): READ block 224 on loop0 <7>mount(14602): READ block 232 on loop0 <7>mount(14602): READ block 240 on loop0 <7>mount(14602): READ block 248 on loop0 <7>mount(14602): READ block 0 on loop0 <7>mount(14602): READ block 524280 on loop0 <5>XFS mounting filesystem loop0 <7>mount(14602): READ block 262176 on loop0 <7>mount(14602): READ block 271775 on loop0 <7>mount(14602): READ block 266975 on loop0 <7>mount(14602): READ block 264575 on loop0 <7>mount(14602): READ block 263375 on loop0 <7>mount(14602): READ block 262775 on loop0 <7>mount(14602): READ block 262475 on loop0 <7>mount(14602): READ block 262325 on loop0 <7>mount(14602): READ block 262250 on loop0 <7>mount(14602): READ block 262213 on loop0 <7>mount(14602): READ block 262194 on loop0 <7>mount(14602): READ block 262185 on loop0 <7>mount(14602): READ block 262180 on loop0 <7>mount(14602): READ block 262178 on loop0 <7>mount(14602): READ block 262179 on loop0 <7>mount(14602): READ block 262176 on loop0 <7>mount(14602): READ block 262176 on loop0 <7>mount(14602): READ block 262179 on loop0 <7>mount(14602): READ block 262178 on loop0 <7>mount(14602): READ block 262179 on loop0 <7>mount(14602): READ block 64 on loop0 <7>Ending clean XFS mount for filesystem: loop0 <7>mount(14602): dirtied inode 8880357 (blkid.tab-rOdAVQ) on md3 <7>mount(14602): dirtied inode 8880459 (?) on md3 <7>md3_raid1(821): WRITE block 147539744 on sdb4 <7>md3_raid1(821): WRITE block 147539744 on sda4 <7>md3_raid1(821): WRITE block 147539760 on sdb4 <7>md3_raid1(821): WRITE block 147539760 on sda4 <7>md3_raid1(821): WRITE block 147539768 on sdb4 <7>md3_raid1(821): WRITE block 147539768 on sda4 <7>md3_raid1(821): WRITE block 147539720 on sdb4 <7>md3_raid1(821): WRITE block 147539720 on sda4 <7>md3_raid1(821): WRITE block 147539784 on sdb4 <7>md3_raid1(821): WRITE block 147539784 on sda4 <7>umount(14607): WRITE block 0 on loop0 <7>loop0(14289): dirtied inode 19 (test) on md3 <7>umount(14607): WRITE block 0 on loop0 <7>umount(14607): dirtied inode 8880877 (mtab.tmp) on md3 <7>zsh(4125): dirtied inode 4026531942 (block_dump) on proc -dean ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2006-10-17 6:05 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-10-12 23:50 Why aren't partitions limited to fit within the device? Neil Brown 2006-10-13 1:02 ` Andries Brouwer 2006-10-13 1:31 ` Neil Brown 2006-10-13 15:07 ` Alan Cox 2006-10-13 18:16 ` jdow 2006-10-16 0:08 ` Neil Brown 2006-10-16 1:22 ` Wakko Warner 2006-10-16 1:31 ` Randy Dunlap 2006-10-16 4:09 ` Kyle Moffett 2006-10-16 6:40 ` Andries Brouwer 2006-10-16 6:43 ` dean gaudet 2006-10-16 7:28 ` Xavier Bestel 2006-10-16 10:54 ` Alan Cox 2006-10-17 6:05 ` Chris Wedgwood 2006-10-15 8:29 ` Ville Herva 2006-10-15 23:50 ` Neil Brown 2006-10-16 5:27 ` Ville Herva 2006-10-16 6:02 ` Andries Brouwer 2006-10-16 6:20 ` Ville Herva 2006-10-16 6:27 ` dean gaudet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox