public inbox for linux-kernel@vger.kernel.org
* [RFD] Explicitly documenting patch submission
@ 2004-05-23  6:46 Linus Torvalds
  2004-05-23  7:41 ` Neil Brown
                   ` (10 more replies)
  0 siblings, 11 replies; 90+ messages in thread
From: Linus Torvalds @ 2004-05-23  6:46 UTC
  To: Kernel Mailing List


Hola!

This is a request for discussion..

Some of you may have heard of this crazy company called SCO (aka "Smoking
Crack Organization") who seem to have a hard time believing that open
source works better than their five engineers do. They've apparently made
a couple of outlandish claims about where our source code comes from,
including claiming to own code that was clearly written by me over a
decade ago.

People have been pretty good (understatement of the year) at debunking
those claims, but the fact is that part of that debunking involved
searching kernel mailing list archives from 1992 etc. Not much fun.

For example, in the case of "ctype.h", what made it so clear that it was
original work was the horrible bugs it contained originally, and since we
obviously don't do bugs any more (right?), we should probably plan on
having other ways to document the origin of the code.

So, to avoid these kinds of issues ten years from now, I'm suggesting that 
we put in more of a process to explicitly document not only where a patch 
comes from (which we do actually already document pretty well in the 
changelogs), but the path it came through. 

Why the full path, and not just originator?

These days, most of the patches in the kernel don't actually get sent
directly to me. Not only would that not scale, but the fact is, there's a
lot of subsystems I have no clue about, and thus no way of judging how
good the patch is. So I end up seeing mostly the maintainers of the
subsystem, and when a bug happens, what I want to see is the maintainer
name, not a random developer who I don't even know if he is active any
more. So at least for me, the _chain_ is actually mostly more important
than the actual originator.

There is also another issue, namely the fact that when I (or anybody else,
for that matter) get an emailed patch, the only thing I can see directly
is the sender information, and that's the part I trust. When Andrew sends
me a patch, I trust it because it comes from him - even if the original
author may be somebody I don't know. So the _path_ the patch came in
through actually documents that chain of trust - we all tend to know the
"next hop", but we do _not_ necessarily have direct knowledge of the full
chain.

So what I'm suggesting is that we start "signing off" on patches, to show 
the path it has come through, and to document that chain of trust.  It 
also allows middle parties to edit the patch without somehow "losing" 
their names - quite often the patch that reaches the final kernel is not 
exactly the same as the original one, as it has gone through a few layers 
of people.

The plan is to make this very light-weight, and to fit in with how we 
already pass patches around - just add the sign-off to the end of the 
explanation part of the patch. That sign-off would be just a single line 
at the end (possibly after _other_ people's sign-offs), saying:

	Signed-off-by: Random J Developer <random@developer.org>
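
A minimal sketch of how sign-off lines in the format above could be read
back out of a patch description to recover the path a patch took. The
function name, the regular expression, and the second signer are
illustrative assumptions, not part of the proposal or of any existing
tool.

```python
import re

# Matches one sign-off line in the "Signed-off-by: Name <email>" format.
# The exact pattern is an assumption made for this sketch.
SIGNOFF_RE = re.compile(
    r"^\s*Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$"
)


def signoff_chain(description):
    """Return (name, email) pairs in the order they appear in the text."""
    chain = []
    for line in description.splitlines():
        match = SIGNOFF_RE.match(line)
        if match:
            chain.append((match.group("name"), match.group("email")))
    return chain


# Hypothetical patch description; the second signer is invented.
example = """\
Fix off-by-one in example driver

    Signed-off-by: Random J Developer <random@developer.org>
    Signed-off-by: Hypothetical Maintainer <maintainer@example.org>
"""

chain = signoff_chain(example)
for name, email in chain:
    print(name, email)
```

Because each new sign-off is appended at the end, the list order is the
order of the hops: the first entry is the earliest signer and the last
entry is the "next hop" seen by whoever applies the patch.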

To keep the rules as simple as possible, and yet making it clear what it
means to sign off on the patch, I've been discussing a "Developer's
Certificate of Origin" with a random collection of other kernel
developers (mainly subsystem maintainers).  This would basically be what
a developer (or a maintainer that passes through a patch) signs up for
when he signs off, so that the downstream (upstream?) developers know
that it's all ok:

	Developer's Certificate of Origin 1.0

	By making a contribution to this project, I certify that:

	(a) The contribution was created in whole or in part by me and I
	    have the right to submit it under the open source license
	    indicated in the file; or

	(b) The contribution is based upon previous work that, to the best
	    of my knowledge, is covered under an appropriate open source
	    license and I have the right under that license to submit that
	    work with modifications, whether created in whole or in part
	    by me, under the same open source license (unless I am
	    permitted to submit under a different license), as indicated
	    in the file; or

	(c) The contribution was provided directly to me by some other
	    person who certified (a), (b) or (c) and I have not modified
	    it.

This basically allows people to sign off on other people's patches, as long
as they see that the previous entry in the chain has been signed off on.  
And at the same time it makes the "personal trust" explicit to people who
don't necessarily understand how these things work. 
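
A rough sketch of the pass-through case in clause (c), assuming the only
check made is that an earlier sign-off line is present. The helper names
and the error message are hypothetical; clauses (a) and (b) are not
modelled here.

```python
def has_prior_signoff(description):
    """True if the description already carries at least one sign-off line."""
    return any(
        line.strip().startswith("Signed-off-by:")
        for line in description.splitlines()
    )


def pass_through(description, name, email):
    """Append a sign-off for a patch that is passed on unmodified."""
    if not has_prior_signoff(description):
        # Clause (c) relies on the previous person having certified.
        raise ValueError("no earlier sign-off to rely on under clause (c)")
    return description.rstrip("\n") + f"\nSigned-off-by: {name} <{email}>\n"


# Hypothetical usage: a maintainer forwards an already signed-off patch.
received = (
    "Fix typo\n\n"
    "Signed-off-by: Random J Developer <random@developer.org>\n"
)
forwarded = pass_through(
    received, "Hypothetical Maintainer", "maintainer@example.org"
)
print(forwarded)
```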

The above also allows for companies that have "release criteria" to have
the company "release person" sign off on a patch, so that a company can
easily incorporate their own internal release procedures and see that all
the patches have gone through the right channel. At the same time it is
meant to _not_ cause anybody to have to change how they work (ie there is
no "extra paperwork" at any point).

Comments, improvements, ideas? And yes, I know about digital signatures
etc, and that is _not_ what this is about. This is not about proving
authorship - it's about documenting the process. This does not replace or
preclude things like PGP-signed emails, this is _documenting_ how we work,
so that we can show people who don't understand the open source process.

			Linus


* Re: [RFD] Explicitly documenting patch submission
@ 2004-05-23 23:19 Shane Shrybman
  0 siblings, 0 replies; 90+ messages in thread
From: Shane Shrybman @ 2004-05-23 23:19 UTC
  To: torvalds; +Cc: linux-kernel

Hi Linus,

Your intention is to produce a clearly documented path showing where
each patch came from, so that if the "Crack Smokers" come at you for
"stealing" code, you have something to back up the community's claims
of authorship. I am wondering whether your proposal would be adequate
legal protection.

I am definitely not a lawyer, but it would be a tragedy if your proposal
was adopted and in 10-15 years it was challenged and found not "to hold
water" in the courts. I can just imagine some lawyer making an argument
that this documentation trail is digital and therefore could be altered
without leaving a trace or some other argument that lessens the
integrity and legal value of the patch path information. 

Have you consulted with some liars about the legal fortitude of your
proposal?

What sort of legal protection will this provide in the event that it is
needed?

Just a thought.

Regards,

Shane


* Re: [RFD] Explicitly documenting patch submission
@ 2004-05-24 23:05 Albert Cahalan
  2004-05-25  3:50 ` Linus Torvalds
  0 siblings, 1 reply; 90+ messages in thread
From: Albert Cahalan @ 2004-05-24 23:05 UTC
  To: linux-kernel mailing list; +Cc: Linus Torvalds

[this didn't have the right subject before, sorry]

Linus Torvalds writes:

> (Seriously, while nobody has actually complained about
> the suggested rules, I don't think anybody should feel
> compelled to do the sign-off before we've had more
> time to let people argue over it. People who feel 
> comfortable with the suggestion are obviously
> encouraged to start asap, though).

I had been hoping someone had just forged your email
address. :-/  You're not known for bureaucracy.

The wordy mixed-case aspect is kind of annoying, and for
all that we don't get to differentiate actions.
I count:

1. came up with the design ideas
2. wrote the original patch
3. reviewed and passed on
4. modified
5. blindly passed on

Maybe "blindly passed on" needs nothing. So I'm
thinking, if we must bother with all this...

designed:
authored:
reviewed:
modified:

Add "pirated:" if you like, so that searching for
pirated code is easier than checking the evil bit.
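
A small sketch of how the four role tags listed above could be collected
from a patch description. The "tag: Name <email>" layout and all names
in the example are assumptions for illustration; the message above only
names the tags.

```python
# Role tags from the list above; the line layout is an assumption.
ROLE_TAGS = ("designed", "authored", "reviewed", "modified")


def parse_roles(description):
    """Map each role tag to the list of people credited with it."""
    roles = {tag: [] for tag in ROLE_TAGS}
    for line in description.splitlines():
        tag, sep, who = line.strip().partition(":")
        if sep and tag in ROLE_TAGS and who.strip():
            roles[tag].append(who.strip())
    return roles


# Hypothetical patch description using the proposed tags.
example = """\
Fix off-by-one in example driver

designed: Hypothetical Designer <designer@example.org>
authored: Random J Developer <random@developer.org>
reviewed: Hypothetical Maintainer <maintainer@example.org>
"""

roles = parse_roles(example)
for tag in ROLE_TAGS:
    print(tag, roles[tag])
```

A patch that was only "blindly passed on" adds no entry at all, which
matches the suggestion that this case needs nothing.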



* Re: [RFD] Explicitly documenting patch submission
@ 2004-05-27  6:20 Larry McVoy
  2004-05-27  8:04 ` Andrew Morton
  0 siblings, 1 reply; 90+ messages in thread
From: Larry McVoy @ 2004-05-27  6:20 UTC
  To: linux-kernel

I just read the whole thread and I can't help but wonder if you aren't
trying to solve the 5% problem while avoiding the 95% problem.  Right now,
because of how patches are fanned in through maintainers, lots and lots
of patches are going into the SCM system (BK and/or CVS since that is
derived from BK) as authored by a handful of people.  Just go look at
the stats: http://linux.bkbits.net:8080/linux-2.5/stats?nav=index.html
As productive as Andrew is I find it difficult to believe he has
personally authored more than 5000 patches.  He hasn't, he doesn't
pretend to have done so but we are not getting the authorship right.

Solve that problem and you are light-years closer to having an audit trail.
You currently aren't recording the original author and you are trying
to record all the people who touched the patch along the way.  If you
can't get the easy part right what makes you think you are going to get
the hard part right?

Before the obligatory BK flames start up, note this is a problem that
you would have with any SCM system.  The problem has nothing to do with
which SCM system you use, it has to do with recording authorship.

I think it's great that you are looking for a better audit trail but
I think it is strange that you are trying to get a perfect audit trail
when you don't even have the basics in place.  What was it that you said,
"Perfect is the enemy of good", right?  In my opinion the 99% part of
the problem space is who wrote the patch, not who passed it on.  If, and
that's a big if, you get to the point where you have proper authorship
recorded and then you still want to record the path it took, that's a
different matter.  The way you are going about it I think you may end
up with nothing by trying to be so perfect.  If I'm wrong, what's wrong
with fixing things so that you get the authorship right and then extend
to get the full path right?

This leaves aside the issue that patches can get applied multiple times
(and do all the time, I think we've counted thousands or tens of thousands
of these in the kernel history).

For what it is worth, we've actually thought through what you are trying
to do long ago and calculated the amount of metadata you'd end up carrying
around and found it to be way way way way too large for an SCM system to
justify.  It's unlikely we'd ever want full audit trails in BK because
patches tend to flow through multiple trees and get merged with other 
patches, etc.  The thing we found useful was who wrote the patch and in
what context.

We did change BK a few revs back to record both the importer and the
patch author when people use your import scripts (bk import -temail)
so we have a 2 deep audit trail already.  More than that seems like
overkill.

The more I think about it the more I wonder what problem it is you are 
trying to solve with the A->B->C->D->Linus audit trail.  Legally, the
issue is going to be with A more than anyone else.  What am I missing?
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com


end of thread, other threads:[~2004-06-10 19:52 UTC | newest]

Thread overview: 90+ messages
2004-05-23  6:46 [RFD] Explicitly documenting patch submission Linus Torvalds
2004-05-23  7:41 ` Neil Brown
2004-05-23  8:02 ` Arjan van de Ven
2004-05-23 15:25   ` Greg KH
2004-05-23 15:35     ` Arjan van de Ven
2004-05-23 15:42       ` Greg KH
2004-05-23 18:03       ` Matt Mackall
2004-05-23 15:38     ` Ian Stirling
2004-05-23 15:44       ` Greg KH
2004-05-23 16:01       ` Linus Torvalds
2004-05-23 15:53   ` Linus Torvalds
2004-05-23 16:33 ` Horst von Brand
2004-05-23 17:06   ` Linus Torvalds
2004-05-23 17:32     ` Roman Zippel
2004-05-23 17:55 ` Joe Perches
2004-05-23 19:00   ` Jeff Garzik
2004-05-23 19:12     ` Joe Perches
2004-05-23 21:41   ` Francois Romieu
2004-05-23 19:01 ` Davide Libenzi
2004-05-23 19:20   ` Linus Torvalds
2004-05-25 15:20     ` La Monte H.P. Yarroll
2004-05-25 21:16       ` H. Peter Anvin
2004-05-25  6:32 ` Daniel Phillips
2004-05-25 18:11   ` Paul Jackson
2004-05-25  7:06 ` Arjan van de Ven
2004-05-25 15:32   ` Steven Cole
2004-05-25 16:02     ` Bradley Hook
2004-05-25 18:51       ` La Monte H.P. Yarroll
2004-05-25 19:44         ` Bradley Hook
2004-05-26  4:16         ` Daniel Phillips
2004-05-25 13:11 ` Ben Collins
2004-05-25 17:15   ` Linus Torvalds
2004-05-25 17:18     ` Ben Collins
2004-05-25 18:02       ` Dave Jones
2004-05-25 18:06         ` Ben Collins
2004-05-25 18:51           ` Linus Torvalds
2004-05-25 15:00 ` raven
2004-05-25 15:44 ` La Monte H.P. Yarroll
2004-05-25 16:25   ` Linus Torvalds
2004-05-25 16:43     ` La Monte H.P. Yarroll
2004-05-25 17:40   ` Valdis.Kletnieks
2004-05-25 17:52     ` Linus Torvalds
2004-05-25 16:42 ` J. Bruce Fields
2004-05-25 17:05   ` Linus Torvalds
2004-05-25 18:08     ` Andy Isaacson
2004-05-25 20:10       ` Matt Mackall
2004-06-10 12:58         ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2004-05-23 23:19 Shane Shrybman
     [not found] <1YUY7-6fF-11@gated-at.bofh.it>
2004-05-24 19:57 ` Andi Kleen
2004-05-24 20:07   ` Davide Libenzi
2004-05-24 20:19     ` Joe Perches
2004-05-24 20:45     ` Linus Torvalds
2004-05-24 21:16       ` Davide Libenzi
2004-05-24 21:38         ` Linus Torvalds
2004-05-25  0:41       ` Francis J. A. Pinteric
2004-05-25  1:56         ` viro
2004-05-24 20:31   ` Linus Torvalds
2004-05-24 22:01     ` Andi Kleen
2004-05-24 22:14       ` Linus Torvalds
2004-05-24 20:50   ` Thomas Gleixner
2004-05-24 21:05     ` Linus Torvalds
2004-05-24 21:20       ` Thomas Gleixner
2004-06-10  8:00         ` Pavel Machek
2004-05-25  3:49       ` Matt Mackall
2004-05-25  4:02         ` Linus Torvalds
2004-05-25 11:11           ` Giuseppe Bilotta
2004-05-25 13:48             ` Steven Cole
2004-05-25 14:12             ` La Monte H.P. Yarroll
2004-05-24 21:19   ` Horst von Brand
2004-05-24 23:05 Albert Cahalan
2004-05-25  3:50 ` Linus Torvalds
2004-05-25 19:28   ` Horst von Brand
     [not found] <1ZBgK-68x-3@gated-at.bofh.it>
2004-05-25  6:43 ` Kai Henningsen
     [not found] <20040525110000.27463.19462.Mailman@lists.us.dell.com>
2004-05-25 15:03 ` Justin Michael
2004-05-27  6:20 Larry McVoy
2004-05-27  8:04 ` Andrew Morton
2004-05-27 14:51   ` Larry McVoy
2004-05-27 15:18     ` Jörn Engel
2004-05-27 16:13     ` Jon Smirl
2004-05-27 21:09   ` La Monte H.P. Yarroll
2004-05-27 21:46     ` Theodore Ts'o
2004-05-28 13:24       ` Larry McVoy
2004-05-28 15:07         ` Theodore Ts'o
2004-05-28 15:19           ` Dave Jones
2004-05-28 15:27             ` Larry McVoy
2004-05-28 15:35               ` Dave Jones
2004-05-28 17:11             ` Theodore Ts'o
2004-05-28 17:16               ` Larry McVoy
2004-05-28 15:24           ` Larry McVoy
     [not found] <A6974D8E5F98D511BB910002A50A6647615FD265@hdsmsx403.hd.intel.com>
2004-06-03  6:38 ` Len Brown
