* On re-working the major/minor system
  2001-12-07 12:08 ` Alan Cox
@ 2001-12-07 20:51   ` Erik Andersen
  2001-12-07 21:21     ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Erik Andersen @ 2001-12-07 20:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: dalecki, Linus Torvalds, linux-kernel

On Fri Dec 07, 2001 at 12:08:35PM +0000, Alan Cox wrote:
> > > major/minors for old stuff still end up leaking into user space and
> > > mattering there. I'm not sure the best option for that
> > 
> > That's no problem. But they should be used as hash values on the
> > syscall implementation level and nowhere else.
> 
> We have apps that "know" about specific major/minors that need changing and
> will take time - also some of them are closed source so unfixable.

Right.  Tons of apps have illicit insider knowledge of kernel
major/minor representation and NEED IT to do their job.  Try
running 'ls -l' on a device node.  Wow, it prints out major and
minor number.  You can pack up a tarball containing all of /dev
so tar has to have insider major/minor knowledge too -- as does
the structure of every existent tarball!  Check out, for example,
Section 10.1.1 (page 210) of the IEEE Std. 1003.1b-1993 (POSIX)
and you will see every tarball in existence stores 8 chars for
the major, and 8 chars for the minor....
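As a rough illustration of those 8-char fields, here is a sketch of writing and reading one ustar devmajor/devminor field. It assumes the value is stored as zero-padded ASCII octal with a NUL terminator in the last byte (POSIX allows ustar numeric fields to end in NUL or space); the helper names `ustar_format_dev` and `ustar_parse_dev` are hypothetical. With one byte spent on the terminator, each field holds at most seven octal digits, i.e. 07777777 = 2097151, or 21 bits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Width of each ustar devmajor/devminor header field. */
#define USTAR_DEV_FIELD_LEN 8

/* Seven octal digits plus a NUL terminator: 07777777 == 2097151 (21 bits). */
#define USTAR_DEV_MAX 07777777UL

/* Write value as zero-padded octal into an 8-byte field.
 * Returns 0 on success, -1 if the value does not fit. */
static int ustar_format_dev(char field[USTAR_DEV_FIELD_LEN], unsigned long value)
{
    if (value > USTAR_DEV_MAX)
        return -1;
    /* "%07lo" yields exactly 7 digits; snprintf puts the NUL in byte 8. */
    snprintf(field, USTAR_DEV_FIELD_LEN, "%07lo", value);
    return 0;
}

/* Read an 8-byte octal field back, tolerating a missing terminator. */
static unsigned long ustar_parse_dev(const char field[USTAR_DEV_FIELD_LEN])
{
    char buf[USTAR_DEV_FIELD_LEN + 1];

    memcpy(buf, field, USTAR_DEV_FIELD_LEN);
    buf[USTAR_DEV_FIELD_LEN] = '\0';
    return strtoul(buf, NULL, 8);
}
```

For example, major 8 is stored as the characters `0000010`.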

So we have POSIX, ls, tar, du, mknod, and mount and tons of other
apps all with illicit insider knowledge of what a dev_t looks
like.   A couple of months ago I patched up mkfs.jffs2 so it
could create device nodes on the target filesystem that don't
really exist in the source directory (avoids the need to be root
when building filesystems).

Right now, you will find that a zillion user space apps have
little snippets of code looking like:

    /* FIXME:  MKDEV uses illicit insider knowledge of kernel 
     * major/minor representation...  */
    #define MINORBITS       8
    #define MKDEV(ma,mi)    (((ma) << MINORBITS) | (mi))

Currently, to do pretty much anything nifty related to devices
in userspace, userspace has to peek under the kernel's skirt to
know how to change a major and minor number into a dev_t and/or
to sanely populate a struct stat.
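For reference, the layout those snippets hardcode is the old 16-bit Linux dev_t: major in the high byte, minor in the low byte. A sketch of the full round trip follows; `MAJOR`/`MINOR` mirror the style of the `MKDEV` snippet above, and `legacy_dev_fits` is a hypothetical helper showing why code built this way cannot represent a major or minor above 255.

```c
/* Legacy 8:8 layout, as copied into userspace by the snippet above. */
#define MINORBITS      8
#define MINORMASK      ((1U << MINORBITS) - 1)
#define MKDEV(ma, mi)  ((((unsigned int) (ma)) << MINORBITS) | (unsigned int) (mi))
#define MAJOR(dev)     ((unsigned int) (dev) >> MINORBITS)
#define MINOR(dev)     ((unsigned int) (dev) & MINORMASK)

/* Hypothetical check: can (ma, mi) be represented in a 16-bit 8:8 dev_t? */
static int legacy_dev_fits(unsigned int ma, unsigned int mi)
{
    return ma <= 0xFFU && mi <= MINORMASK;
}
```

Note that `MKDEV(300, 1)` silently yields 0x12C01, which no longer fits in 16 bits; nothing in the macro itself flags the problem.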

To change things, we 1) need some sort of sane interface by which
userspace can refer sensibly to devices without resorting to evil
illicit macros and 2) we certainly need some sort of a static
mapping such that existing devices end up mapping to the same
thing they always did or 3) we will need a flag day where we say
that all pre-2.5.x created tarballs and user space apps are
declared broken...

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


* Re: On re-working the major/minor system
  2001-12-07 20:51   ` On re-working the major/minor system Erik Andersen
@ 2001-12-07 21:21     ` H. Peter Anvin
  2001-12-07 21:55       ` Erik Andersen
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2001-12-07 21:21 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20011207135100.A17683@codepoet.org>
By author:    Erik Andersen <andersen@codepoet.org>
In newsgroup: linux.dev.kernel
> 
> Right.  Tons of apps have illicit insider knowledge of kernel
> major/minor representation and NEED IT to do their job.  Try
> running 'ls -l' on a device node.  Wow, it prints out major and
> minor number.  You can pack up a tarball containing all of /dev
> so tar has to have insider major/minor knowledge too -- as does
> the structure of every existent tarball!  Check out, for example,
> Section 10.1.1 (page 210) of the IEEE Std. 1003.1b-1993 (POSIX)
> and you will see every tarball in existence stores 8 chars for
> the major, and 8 chars for the minor....
> 

Actually, it's not "tons of apps", it's in the C library itself.

These things are defined in <sys/sysmacros.h> and anyone who uses
anything else should be taken out and shot.
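In code, this means going through the C library's macros instead of shifting bits by hand. A minimal sketch, assuming glibc (or another libc that provides `makedev()`, `major()` and `minor()` in `<sys/sysmacros.h>`); `device_numbers` is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() on glibc */

/* Hypothetical helper: report the major/minor numbers of a device node
 * without assuming anything about how dev_t is laid out.
 * Returns 0 on success, -1 if path cannot be stat'ed or is not a device. */
static int device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

On Linux, `/dev/null` is character device 1, 3, so the helper reports those numbers. The program still depends on the C library's definition of `dev_t` as it was at compile time.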

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>


* Re: On re-working the major/minor system
  2001-12-07 21:21     ` H. Peter Anvin
@ 2001-12-07 21:55       ` Erik Andersen
  2001-12-07 22:04         ` H. Peter Anvin
  2001-12-09 12:06         ` Kai Henningsen
  0 siblings, 2 replies; 13+ messages in thread
From: Erik Andersen @ 2001-12-07 21:55 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Fri Dec 07, 2001 at 01:21:58PM -0800, H. Peter Anvin wrote:
> Followup to:  <20011207135100.A17683@codepoet.org>
> By author:    Erik Andersen <andersen@codepoet.org>
> In newsgroup: linux.dev.kernel
> > 
> > Right.  Tons of apps have illicit insider knowledge of kernel
> > major/minor representation and NEED IT to do their job.  Try
> > running 'ls -l' on a device node.  Wow, it prints out major and
> > minor number.  You can pack up a tarball containing all of /dev
> > so tar has to have insider major/minor knowledge too -- as does
> > the structure of every existent tarball!  Check out, for example,
> > Section 10.1.1 (page 210) of the IEEE Std. 1003.1b-1993 (POSIX)
> > and you will see every tarball in existence stores 8 chars for
> > the major, and 8 chars for the minor....
> > 
> 
> Actually, it's not "tons of apps", it's in the C library itself.

The C library, and the POSIX standard, etc, etc.

> These things are defined in <sys/sysmacros.h> and anyone who uses
> anything else should be taken out and shot.

Ok, so we go through, change sys/sysmacros.h, tar.h, cpio.h, and
any other offending header file.  And guess what?  Not only has
nothing changed (since those are macros, not functions), but you
just broke every older .deb and .rpm in existence on your updated
system.

In sys/sysmacros.h it defines major() and minor() as macros, so
just dropping in an updated C library binary isn't going to do
squat until all of userspace gets recompiled.  And tar.h and
cpio.h define long standing (well over 10 years now) binary
structures.  We can't just go changing this stuff, since now when
a dev_t is some magic cookie, if I go to install something from
my old Debian 1.2 CD or my old RedHat 4.0 CD, my system will puke
trying to install using cookies that in fact are old 8/8 split
device nodes and not cookies at all.

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


* Re: On re-working the major/minor system
  2001-12-07 21:55       ` Erik Andersen
@ 2001-12-07 22:04         ` H. Peter Anvin
  2001-12-07 23:07           ` Erik Andersen
  2001-12-09 12:06         ` Kai Henningsen
  1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2001-12-07 22:04 UTC (permalink / raw)
  To: andersen; +Cc: linux-kernel

Erik Andersen wrote:

> 
> Ok, so we go through, change sys/sysmacros.h, tar.h, cpio.h, and
> any other offending header file.  And guess what?  Not only has
> nothing changed (since those are macros, not functions), but you
> just broke every older .deb and .rpm in existence on your updated
> system.
> 
> In sys/sysmacros.h it defines major() and minor() as macros, so
> just dropping in an updated C library binary isn't going to do
> squat until all of userspace gets recompiled.  And tar.h and
> cpio.h define long standing (well over 10 years now) binary
> structures.  We can't just go changing this stuff, since now when
> a dev_t is some magic cookie, if I go to install something from
> my old Debian 1.2 CD or my old RedHat 4.0 CD, my system will puke
> trying to install using cookies that in fact are old 8/8 split
> device nodes and not cookies at all.
> 


It's clear a painful change is needed.  **We don't have a choice.**
However, the fewer places we have to make source code changes the better.

What we agreed upon when this was discussed last year was the following:

dev_t is extended to a 12:20 (32-bit size.)  I personally would rather
have seen a 64-bit size (32:32) but was outvoted :(

New major 0 is reserved, except that dev_t == 0 remains the code for "no
device".  The unnamed device major becomes major 256.

If (dev_t & ~0xFFFF) == 0, the dev_t is interpreted as an old-format
dev_t, and is interpreted according to the following algorithm:

	if ( dev && (dev & ~0xFFFF) == 0 ) {
		major = (dev >> 8) ? (dev >> 8) : 256;
		minor = dev & 0xFF;
	} else {
		major = dev >> 20;
		minor = dev & 0xFFFFF;
	}
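The decoding rule above can be wrapped into a self-contained function for testing. This is a sketch of the proposal as stated, assuming a 32-bit dev_t held in a `uint32_t`; the names `struct devnum` and `decode_dev_12_20` are hypothetical.

```c
#include <stdint.h>

struct devnum {
    unsigned int major;
    unsigned int minor;
};

/* Decode a 32-bit dev_t under the proposed 12:20 scheme.
 * Nonzero values with nothing above bit 15 are old 8:8 numbers,
 * and old major 0 (unnamed devices) becomes major 256. */
static struct devnum decode_dev_12_20(uint32_t dev)
{
    struct devnum d;

    if (dev != 0 && (dev & ~UINT32_C(0xFFFF)) == 0) {
        d.major = (dev >> 8) ? (dev >> 8) : 256;
        d.minor = dev & 0xFF;
    } else {
        d.major = dev >> 20;        /* top 12 bits */
        d.minor = dev & 0xFFFFF;    /* low 20 bits */
    }
    return d;
}
```

Under this rule both the old 0x0801 and the new-format 0x00800001 decode to major 8, minor 1, so existing 16-bit numbers keep their meaning, while dev_t 0 still decodes to 0:0 ("no device").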

	-hpa






* Re: On re-working the major/minor system
  2001-12-07 22:04         ` H. Peter Anvin
@ 2001-12-07 23:07           ` Erik Andersen
  2001-12-07 23:12             ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Erik Andersen @ 2001-12-07 23:07 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Fri Dec 07, 2001 at 02:04:42PM -0800, H. Peter Anvin wrote:
> 
> It's clear a painful change is needed.  **We don't have a choice.**
> However, the fewer places we have to make source code changes the better.

Sure.  I'm not arguing against the change.  Just making sure
everyone 100% understands that we have just thrown away any prayer of
binary compatibility with anything less than 2.5.x....

But let's look on the bright side though.  Since we are going to
be having a flag day _anyways_ we may as well make the most of
it.  I can think of 20 things off the top of my head that are
being retained in the name of binary compatibility that can easily
move to the trash bucket.  :)

For example, I would _love_ for Linux to standardize syscall
numbers across all architectures, guarantee that userspace gets
the exact same stack setup for all arches, we might as well fix up
proc, etc, etc, etc.


> What we agreed upon when this was discussed last year was the following:
> 
> dev_t is extended to a 12:20 (32-bit size.)  I personally would rather
> have seen a 64-bit size (32:32) but was outvoted :(
> 
> New major 0 is reserved, except that dev_t == 0 remains the code for "no
> device".  The unnamed device major becomes major 256.
> 
> If (dev_t & ~0xFFFF) == 0, the dev_t is interpreted as an old-format
> dev_t, and is interpreted according to the following algorithm:
> 
> 	if ( dev && (dev & ~0xFFFF) == 0 ) {
> 		major = (dev >> 8) ? (dev >> 8) : 256;
> 		minor = dev & 0xFF;
> 	} else {
> 		major = dev >> 20;
> 		minor = dev & 0xFFFFF;
> 	}

That works, and should prevent most major problems.  Hmm.  At
least for cpio there are 6 chars worth of device info in there,
so we could easily go to 48 bits without RPM problems.  Or redhat
could fix rpm to use tarballs like debs do, and then we could go
to 64 bit devices no problem.

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


* Re: On re-working the major/minor system
  2001-12-07 23:07           ` Erik Andersen
@ 2001-12-07 23:12             ` H. Peter Anvin
  2001-12-08 11:42               ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2001-12-07 23:12 UTC (permalink / raw)
  To: andersen; +Cc: linux-kernel

Erik Andersen wrote:

> On Fri Dec 07, 2001 at 02:04:42PM -0800, H. Peter Anvin wrote:
> 
>>It's clear a painful change is needed.  **We don't have a choice.**
>>However, the fewer places we have to make source code changes the better.
>>
> 
> Sure.  I'm not arguing against the change.  Just making sure
> everyone 100% understands that we have just thrown away any prayer of
> binary compatibility with anything less than 2.5.x....
> 
> But let's look on the bright side though.  Since we are going to
> be having a flag day _anyways_ we may as well make the most of
> it.  I can think of 20 things off the top of my head that are
> being retained in the name of binary compatibility that can easily
> move to the trash bucket.  :)
> 
> For example, I would _love_ for Linux to standardize syscall
> numbers across all architectures, guarantee that userspace gets
> the exact same stack setup for all arches, we might as well fix up
> proc, etc, etc, etc.
> 


Not going to happen.  Linux deliberately chose against that, because in
Linux, syscall numbers are generally (except x86) compatible with the
dominant vendor Unix on the platform.

> 
> That works, and should prevent most major problems.  Hmm.  At
> least for cpio there are 6 chars worth of device info in there,
> so we could easily go to 48 bits without RPM problems.  Or redhat
> could fix rpm to use tarballs like debs do, and then we could go
> to 64 bit devices no problem.
> 


The big stumbling block seems to be NFSv2.

	-hpa




* Re: On re-working the major/minor system
  2001-12-07 23:12             ` H. Peter Anvin
@ 2001-12-08 11:42               ` Alan Cox
  2001-12-08 20:37                 ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2001-12-08 11:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: andersen, linux-kernel

> > That works, and should prevent most major problems.  Hmm.  At
> > least for cpio there are 6 chars worth of device info in there,
> > so we coule easily go to 48 bits without RPM problems.  Or redhat
> > could fix rpm to use tarballs like debs do, and then we could go

RPM can't easily use tarballs. Too much of a tarball isn't rigidly defined
for you to cryptographically sign it.

> > to 64 bit devices no problem.
> 
> The big stumbling block seems to be NFSv2.

Well 2.5 isn't going to be able to support NFS without a magic daemon-maintained
translation table - so that when the kernel randomly changes the
major/minor number of an exported file system (e.g. a USB reconnect or even plain
boring shutdown/reboot) it can keep consistent file handles.

If you have a file handle table surely you can remap every NFS file handle
through that down to 32 bits. For device files the problem doesn't matter
because at the kernel meeting Linus said those were going to change in a way
that meant devices over NFS are a lost cause and clients would have to use
devfs.
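The remapping idea can be sketched as a table keyed by some stable filesystem identity rather than by the device number. Everything here is hypothetical: the key (a UUID-style string), the fixed table size and the function name `handle_id_for` are illustrative assumptions, not part of any existing nfsd interface. The returned index is what would go into a 32-bit field of the file handle.

```c
#include <stdint.h>
#include <string.h>

#define MAX_EXPORTS 64              /* hypothetical fixed table size */

struct export_entry {
    char     fs_id[37];             /* hypothetical stable key, e.g. a UUID string */
    uint64_t current_dev;           /* device number assigned at the latest mount */
};

static struct export_entry table[MAX_EXPORTS];
static uint32_t n_entries;

/* Map a stable filesystem identity to a compact 32-bit id, recording the
 * device number it currently has. The id stays the same even when the
 * device number changes. Returns UINT32_MAX if the key is too long or
 * the table is full. */
static uint32_t handle_id_for(const char *fs_id, uint64_t dev)
{
    if (strlen(fs_id) >= sizeof table[0].fs_id)
        return UINT32_MAX;

    for (uint32_t i = 0; i < n_entries; i++) {
        if (strcmp(table[i].fs_id, fs_id) == 0) {
            table[i].current_dev = dev;   /* e.g. renumbered after a USB reconnect */
            return i;
        }
    }
    if (n_entries == MAX_EXPORTS)
        return UINT32_MAX;

    strcpy(table[n_entries].fs_id, fs_id);
    table[n_entries].current_dev = dev;
    return n_entries++;
}
```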

Alan




* Re: On re-working the major/minor system
@ 2001-12-08 17:55 Andries.Brouwer
  0 siblings, 0 replies; 13+ messages in thread
From: Andries.Brouwer @ 2001-12-08 17:55 UTC (permalink / raw)
  To: andersen; +Cc: alan, dalecki, linux-kernel, torvalds

    From: Erik Andersen <andersen@codepoet.org>

    So we have POSIX, ls, tar, du, mknod, and mount and tons of other
    apps all with illicit insider knowledge of what a dev_t looks
    like.

    Currently, to do pretty much anything nifty related to devices
    in userspace, userspace has to peek under the kernel's skirt to
    know how to change a major and minor number into a dev_t and/or
    to sanely populate a struct stat.

    To change things, we 1) need some sort of sane interface by which
    userspace can refer sensibly to devices without resorting to evil
    illicit macros and 2) we certainly need some sort of a static
    mapping such that existing devices end up mapping to the same
    thing they always did or 3) we will need a flag day where we say
    that all pre-2.5.x created tarballs and user space apps are
    declared broken...

No flag day required. These things have been discussed
many times already, and there are easy and good solutions.

Code like

	dev_t dev;
	u64 d = dev;
	unsigned int major, minor;

	if (d & ~0xffffffffULL) {	/* 32:32; ULL keeps ~ from yielding 0 */
		major = (d >> 32);
		minor = (d & 0xffffffff);
	} else if (d & ~0xffff) {	/* 16:16 */
		major = (d >> 16);
		minor = (d & 0xffff);
	} else {			/* old 8:8 */
		major = (d >> 8);
		minor = (d & 0xff);
	}

will handle dev_t fine, regardless of whether it is 16, 32 or 64 bits.
You see that change of the size of dev_t does not change the values
of major and minor found in your tarballs.
We already use such code for isofs.
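A self-contained version of this decoder makes it easy to check the claim that the same device decodes identically whatever the width of dev_t. It assumes the value has already been widened to `uint64_t` (standing in for the kernel's `u64`); `struct devparts` and `decode_dev_any_width` are hypothetical names.

```c
#include <stdint.h>

struct devparts {
    uint32_t major;
    uint32_t minor;
};

/* Pick the split from the highest bits in use:
 * anything above bit 31 -> 32:32, above bit 15 -> 16:16, else old 8:8. */
static struct devparts decode_dev_any_width(uint64_t d)
{
    struct devparts p;

    if (d >> 32) {
        p.major = (uint32_t) (d >> 32);
        p.minor = (uint32_t) (d & 0xffffffffU);
    } else if (d >> 16) {
        p.major = (uint32_t) (d >> 16);
        p.minor = (uint32_t) (d & 0xffffU);
    } else {
        p.major = (uint32_t) (d >> 8);
        p.minor = (uint32_t) (d & 0xffU);
    }
    return p;
}
```

The 16-, 32- and 64-bit encodings of major 8, minor 1 (0x0801, 0x00080001 and 0x0000000800000001) all decode to the same pair.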

Andries


* Re: On re-working the major/minor system
  2001-12-08 11:42               ` Alan Cox
@ 2001-12-08 20:37                 ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2001-12-08 20:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: andersen, linux-kernel

Alan Cox wrote:

>>>That works, and should prevent most major problems.  Hmm.  At
>>>least for cpio there are 6 chars worth of device info in there,
>>>so we coule easily go to 48 bits without RPM problems.  Or redhat
>>>could fix rpm to use tarballs like debs do, and then we could go
>>>
> 
> RPM can't easily use tarballs. Too much of a tarball isn't rigidly defined
> for you to cryptographically sign it.
> 


Why does that matter?  You're signing a *specific instance* of tar, not 
the generic format.


> 
>>>to 64 bit devices no problem.
>>>
>>The big stumbling block seems to be NFSv2.
> 
> Well 2.5 isn't going to be able to support NFS without a magic daemon-maintained
> translation table - so that when the kernel randomly changes the
> major/minor number of an exported file system (e.g. a USB reconnect or even plain
> boring shutdown/reboot) it can keep consistent file handles.
> 
> If you have a file handle table surely you can remap every NFS file handle
> through that down to 32 bits. For device files the problem doesn't matter
> because at the kernel meeting Linus said those were going to change in a way
> that meant devices over NFS are a lost cause and clients would have to use
> devfs.
> 


Yeah, I know what Linus said at the kernel summit.  As far as I could 
tell he rejected anything that seemed like a sensible approach from here 
to there, but that's just my $0.02...

	-hpa




* Re: On re-working the major/minor system
  2001-12-07 21:55       ` Erik Andersen
  2001-12-07 22:04         ` H. Peter Anvin
@ 2001-12-09 12:06         ` Kai Henningsen
  2001-12-09 21:57           ` H. Peter Anvin
  1 sibling, 1 reply; 13+ messages in thread
From: Kai Henningsen @ 2001-12-09 12:06 UTC (permalink / raw)
  To: linux-kernel

andersen@codepoet.org (Erik Andersen)  wrote on 07.12.01 in <20011207145535.A18152@codepoet.org>:

> The C library, and the POSIX standard, etc, etc.

I think you'll find that there is *NOTHING* in either the C standard,  
POSIX, or the Austin future-{POSIX,UNIX} standard that knows about major  
or minor numbers.

MfG Kai


* Re: On re-working the major/minor system
@ 2001-12-09 21:37 Andries.Brouwer
  0 siblings, 0 replies; 13+ messages in thread
From: Andries.Brouwer @ 2001-12-09 21:37 UTC (permalink / raw)
  To: kaih, linux-kernel

    From: kaih@khms.westfalen.de (Kai Henningsen)

    > The C library, and the POSIX standard, etc, etc.

    I think you'll find that there is *NOTHING* in either the C standard,  
    POSIX, or the Austin future-{POSIX,UNIX} standard that knows about major  
    or minor numbers.

The Austin draft turned into POSIX 1003.1-2001 yesterday or so.

There is not much, but a few traces can be found.
For example, the pax archive format uses 8-byte devmajor and devminor fields.

(But to reassure others: no, this standard does not specify
major and minor in ls output, but just says
"If the file is a character special or block special file, the size of
 the file may be replaced with implementation-defined information
 associated with the device in question.")

Andries




* Re: On re-working the major/minor system
  2001-12-09 12:06         ` Kai Henningsen
@ 2001-12-09 21:57           ` H. Peter Anvin
  2001-12-11 20:45             ` Kai Henningsen
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2001-12-09 21:57 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <8EWhHLVmw-B@khms.westfalen.de>
By author:    kaih@khms.westfalen.de (Kai Henningsen)
In newsgroup: linux.dev.kernel
> 
> > The C library, and the POSIX standard, etc, etc.
> 
> I think you'll find that there is *NOTHING* in either the C standard,  
> POSIX, or the Austin future-{POSIX,UNIX} standard that knows about major  
> or minor numbers.
> 

It's not "future" anymore... Austin is now IEEE 1003.1-2001 and thus
the new POSIX standard.

Anyway, look for things like tar, cpio, ISO 9660 and that class of
standards.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>


* Re: On re-working the major/minor system
  2001-12-09 21:57           ` H. Peter Anvin
@ 2001-12-11 20:45             ` Kai Henningsen
  0 siblings, 0 replies; 13+ messages in thread
From: Kai Henningsen @ 2001-12-11 20:45 UTC (permalink / raw)
  To: linux-kernel

hpa@zytor.com (H. Peter Anvin)  wrote on 09.12.01 in <9v0mo1$ms$1@cesium.transmeta.com>:

> By author:    kaih@khms.westfalen.de (Kai Henningsen)

> > > The C library, and the POSIX standard, etc, etc.
> >
> > I think you'll find that there is *NOTHING* in either the C standard,
> > POSIX, or the Austin future-{POSIX,UNIX} standard that knows about major
> > or minor numbers.
> >
>
> It's not "future" anymore... Austin is now IEEE 1003.1-2001 and thus
> the new POSIX standard.

As of this Friday, yes.

> Anyway, look for things like tar, cpio, ISO 9660 and that class of
> standards.

Well, at least in Austin there is neither tar, cpio, nor 9660.

You are, however, right insofar as there's pax, which for ustar format has  
devmajor and devminor fields of 8 octets each, which contain unspecified  
information. (cpio format just has the rdev field.)


MfG Kai



Thread overview: 13+ messages
2001-12-08 17:55 On re-working the major/minor system Andries.Brouwer
  -- strict thread matches above, loose matches on Subject: below --
2001-12-09 21:37 Andries.Brouwer
2001-12-07 10:56 Linux/Pro -- clusters Martin Dalecki
2001-12-07 12:08 ` Alan Cox
2001-12-07 20:51   ` On re-working the major/minor system Erik Andersen
2001-12-07 21:21     ` H. Peter Anvin
2001-12-07 21:55       ` Erik Andersen
2001-12-07 22:04         ` H. Peter Anvin
2001-12-07 23:07           ` Erik Andersen
2001-12-07 23:12             ` H. Peter Anvin
2001-12-08 11:42               ` Alan Cox
2001-12-08 20:37                 ` H. Peter Anvin
2001-12-09 12:06         ` Kai Henningsen
2001-12-09 21:57           ` H. Peter Anvin
2001-12-11 20:45             ` Kai Henningsen
