From: Steven Dake <sdake@mvista.com>
To: Bryan Henderson <hbryan@us.ibm.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
Douglas Gilbert <dougg@torque.net>,
Christoph Hellwig <hch@infradead.org>,
Joel Becker <Joel.Becker@oracle.com>,
Kurt Garloff <kurt@garloff.de>,
linux-kernel@kernel.vger.org,
Linux SCSI list <linux-scsi@vger.kernel.org>
Subject: Re: New model for managing dev_t's for partitionable block devices
Date: Wed, 29 Jan 2003 09:45:38 -0700
Message-ID: <3E380532.2010900@mvista.com>
In-Reply-To: <OFD9CDB238.908E6C3C-ON87256CBD.00070A30-88256CBD.00095D4B@us.ibm.com>
Bryan Henderson wrote:
>
>
>
>
>>the device
>>mapper code could be used to provide partition devices in another
>>major/group of majors.
>>
>>
>
>If I understand what you're saying, this has been discussed before. I
>don't know what the device mapper code is, but it's actually quite elegant
>if it is a regular device driver that derives multiple logical disk drives
>from a single physical one in the same way that the md device driver
>derives a single logical disk drive from multiple physical ones. The
>layering is cleaner that way.
>
>
This is exactly how LVM works.
>The last time I remember this discussed, it was as a solution to the
>problem of a device driver presuming to access at initialization time a
>partition map that didn't really exist. I don't remember the details, but
>this particular device wasn't ready to handle data reads until some time
>after initialization. You ought to be able to initialize a device that you
>plan to use only as a raw device without Linux attempting to make
>partitions on it.
>
>As I recall, there weren't any fundamental objections to this.
>
>
>
>>partitions could be dynamically allocated out of the minor list
>>
>>
>
>Doesn't this exacerbate the Linux SCSI drive name binding problem? It's
>bad enough that when you remove your /dev/sda and reboot, your /dev/sdc
>becomes /dev/sdb. With this, it sounds like when you delete a partition on
>/dev/sda, your partitions on /dev/sdb change names.
>
>
This is a problem with hotswap, of course, and it shouldn't be solved by
the kernel always putting the same device at the same major/minor. A
userspace application should query the OS and build the device nodes
based on SCSI serial number, FC port WWN, or access path
(host/channel/id/lun). The current "MAKEDEV" works fine for people with
an IDE disk and a CD-ROM, but for real systems with lots of disks and
hotswap capability, static naming just doesn't work (as you have
said). :) Devfs solves the naming problem by using the access path
automatically within the OS. The downside of that approach is that
access permissions are not persistent across reboots (which is one
significant limitation of devfs). There is a utility called scsidev
that builds device nodes based on serial number instead of a dumb
name like /dev/sda.
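
As a rough sketch of the idea of naming nodes by a persistent identifier
rather than by probe order: this is not how scsidev is implemented, and
the function name, the "disk-..." naming scheme, and the order of
preference (serial number, then WWN, then access path) are all made up
for illustration.

```python
from typing import Optional, Tuple


def stable_node_name(serial: Optional[str] = None,
                     wwn: Optional[str] = None,
                     hcil: Optional[Tuple[int, int, int, int]] = None) -> str:
    """Pick a device node name from the most persistent identifier given.

    Hypothetical helper: the preference order and name format are
    assumptions, not taken from any existing tool.
    """
    if serial:
        # A serial number follows the disk itself, wherever it is attached.
        return "disk-serial-" + serial
    if wwn:
        # An FC port WWN identifies the port the disk is reached through.
        return "disk-wwn-" + wwn
    if hcil is not None:
        # Access path: host/channel/id/lun, stable only while cabling is.
        host, channel, target, lun = hcil
        return f"disk-path-h{host}c{channel}t{target}l{lun}"
    raise ValueError("no persistent identifier available")


print(stable_node_name(serial="ABC123"))   # disk-serial-ABC123
print(stable_node_name(hcil=(0, 0, 1, 0)))  # disk-path-h0c0t1l0
```

The point is only that the name comes from something the disk or its
path reports, so removing one disk does not rename the others.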
>
>
>>As an example, Lets assume we want 4096 total disks with 16384 total
>>partitions (4 partitions per disk, where it is likely to be less):
>>
>>
>
>We should keep in mind that as a practical matter, someone with 4096
>physical disks is unlikely to be partitioning at all. Partitions are for
>the poor person who has only a handful of physical disks and wants to
>divide his data into more pieces than that. Also note that if you have
>4096 "physical" devices, they probably aren't very physical at all --
>there's some subsystem on the other end of the SCSI link that carves
>variable-size devices out of a pool of storage. Hence, even less reason
>for Linux to partition them.
>
>
>
>
I agree that using partitions is unlikely with large numbers of disks.
Anyone with that many disks should be using LVM to manage them.
Unfortunately, even though no partitions are needed, 4096 disks
still require 16 dev_t minors for each device. This is a significant
waste of space. The user could hack their kernel to remove the
partitions entirely, and someone has already written a patch to do so,
but that isn't general purpose enough to be usable by the typical Linux
user. What is needed is a compromise, as described above, that limits the
number of partitions to some sane amount but allows significantly more
disks for the power user.
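
To put rough numbers on the waste, here is a small sketch comparing the
fixed 16-minors-per-disk layout with allocating minors only as needed.
The function names are invented for illustration, and the dynamic case
assumes one minor per whole disk plus one per partition actually in use,
which is my reading of the proposal rather than a settled design.

```python
MINORS_PER_DISK_FIXED = 16  # fixed layout: 16 dev_t minors reserved per disk


def fixed_minors(disks: int) -> int:
    """Minors consumed when every disk reserves a full block of 16."""
    return disks * MINORS_PER_DISK_FIXED


def dynamic_minors(disks: int, partitions_in_use: int) -> int:
    """Minors consumed when only existing devices get one.

    Assumption: one minor per whole disk plus one per partition in use.
    """
    return disks + partitions_in_use


disks = 4096
print(fixed_minors(disks))               # 65536
print(dynamic_minors(disks, 0))          # 4096 (no partitions at all)
print(dynamic_minors(disks, 4 * disks))  # 20480 (4 partitions per disk)
```

Even in the 4-partitions-per-disk case from the example, the dynamic
count is well under a third of the fixed one.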
Thanks for your comments.
-steve