Linux userland API discussions


* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
From: Eric W. Biederman @ 2014-12-09 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <87a92xn2io.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:

> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>>
>> This text was actually my suggested comment text.
>
> Now I see.
>
>> If you put smp_rmb() in this function with a comment like that, then I
>> think it will all make sense and be obviously correct (even with most
>> of the other barriers removed).
>
> Right.
>
> Given that we have to be careful when using these things anyway what
> I was hoping to achieve with the barriers appears impossible, and
> confusing so I will see about just adding barriers where we need them
> for real.  Sigh.

Doh.  The code has been entirely too clever.

There is no need for atomics or other cleverness; I just need to
generalize id_map_mutex.  I knew there had to be a trivially correct
way of handling this mess.
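
A minimal userspace sketch of that idea, assuming POSIX threads stand in for the kernel mutex: one lock serializes both the gid map write and the setgroups knob, so neither path needs atomics or explicit barriers. All names here (`userns_state`, `userns_deny_setgroups`, `userns_write_gid_map`) are hypothetical and are not taken from the patch, and the ordering rule shown (the knob can no longer be changed once the map is written) is an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-user-namespace state; not the kernel struct. */
struct userns_state {
	pthread_mutex_t lock;      /* plays the role of a generalized id_map_mutex */
	bool gid_map_written;
	bool setgroups_allowed;
};

static void userns_init(struct userns_state *ns)
{
	pthread_mutex_init(&ns->lock, NULL);
	ns->gid_map_written = false;
	ns->setgroups_allowed = true;
}

/* Disable setgroups; refused (-1) once the gid map has been written. */
static int userns_deny_setgroups(struct userns_state *ns)
{
	int ret = -1;

	pthread_mutex_lock(&ns->lock);
	if (!ns->gid_map_written) {
		ns->setgroups_allowed = false;
		ret = 0;
	}
	pthread_mutex_unlock(&ns->lock);
	return ret;
}

/* Write the gid map once, under the same lock as the knob. */
static int userns_write_gid_map(struct userns_state *ns)
{
	int ret = -1;

	pthread_mutex_lock(&ns->lock);
	if (!ns->gid_map_written) {
		ns->gid_map_written = true;
		ret = 0;
	}
	pthread_mutex_unlock(&ns->lock);
	return ret;
}

/* Usage: deny first, then write the map; a later deny is refused. */
static void userns_demo(void)
{
	struct userns_state ns;

	userns_init(&ns);
	assert(userns_deny_setgroups(&ns) == 0);
	assert(userns_write_gid_map(&ns) == 0);
	assert(userns_deny_setgroups(&ns) == -1);
}
```

Because every reader and writer of the two fields takes the same lock, the check and the update cannot interleave, which is the "trivially correct" property the message is after.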

Eric

^ permalink raw reply

* selftests: question about git repos containing selftest.
From: Young, David @ 2014-12-09 19:22 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Greetings,

I'm trying out the new kernel selftests and was looking for the "bleeding edge" repository to clone.  According to the wiki https://kselftest.wiki.kernel.org/ this is the appropriate repo:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git

However, I am finding newer and more frequent changes are in linux-next.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Which should I clone to get the most recent changes?

-David Young

^ permalink raw reply

* Re: [RFC] lsm: namespace hooks
From: Eric W. Biederman @ 2014-12-09 16:13 UTC (permalink / raw)
  To: Lukasz Pawelczyk
  Cc: Vladimir Davydov, Miklos Szeredi, Lukasz Pawelczyk, Oleg Nesterov,
	David Howells, Mark Rustad, Juri Lelli, Richard Weinberger,
	Daeseok Youn, Ingo Molnar, Jeff Kirsher, David Rientjes,
	Alex Thorlton, Matthew Dempsky, Kees Cook, Nikolay Aleksandrov,
	Dario Faggioli, Al Viro, James Morris, open list:ABI/API,
	Linux Containers, LKML
In-Reply-To: <1417524193.1899.2.camel@samsung.com>

Lukasz Pawelczyk <l.pawelczyk@samsung.com> writes:

> On czw, 2014-11-27 at 18:38 +0100, Lukasz Pawelczyk wrote:
>> Right now the major issue I see is that LSM by itself is not defined how
>> it's going to behave. It's up to a specific LSM module.
>> 
>> E.g. within the Smack namespace filling the map is a privileged
>> operation. So by tying them up you cripple the ability to create a fully
>> working user namespace as an unprivileged process.
>
> Entertaining the idea that LSM namespace would be tied to user namespace
> (as you suggested) how do you see the limitation I described above?

If they are tied it means you wind up in a situation where there are no
labels you can set.

In general, setting the uid and gid maps is also a privileged operation.

I really don't know what makes sense to do with lsms and namespaces
generically, but I do know that your lsm namespace patches were awkward
and weird and seemed to be taking things in the wrong direction.

Eric

^ permalink raw reply

* Re: [tpmdd-devel] [PATCH v9 8/8] tpm: TPM 2.0 FIFO Interface
From: Jarkko Sakkinen @ 2014-12-09 15:47 UTC (permalink / raw)
  To: Peter Huewe
  Cc: christophe.ricard-Re5JQEeQqe8AvxtiuMwx3w,
	josh.triplett-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Ashley Lai,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Arthur,
	tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	jason.gunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	trousers-tech-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <trinity-0fea8cea-4011-4d04-af4a-e7583a7184b4-1417791661803@3capp-gmx-bs52>

On Fri, Dec 05, 2014 at 04:01:01PM +0100, Peter Huewe wrote:
> > 
> > > Am Donnerstag, 4. Dezember 2014, 06:55:18 schrieb Jarkko Sakkinen:
> > > > From: Will Arthur <will.c.arthur-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > >
> > > > Detect TPM 2.0 by using the extended STS (STS3) register. For TPM 2.0,
> > > > instead of calling tpm_get_timeouts(), assign duration and timeout
> > > > values defined in the TPM 2.0 PTP specification.
> > > >
> > > > Signed-off-by: Will Arthur <will.c.arthur-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > > >
> > > > + sts3 = ioread8(chip->vendor.iobase + TPM_STS3(1));
> > > > + if ((sts3 & TPM_STS3_TPM2_FAM) == TPM_STS3_TPM2_FAM)
> > > > + chip->flags = TPM_CHIP_FLAG_TPM2;
> > > > +
> > > >
> > 
> > >
> > > When loading tpm_tis force=1 with my tpm1.2 chip on a machine without bios
> > > integration, it gets detected as a TPM2.0 chip :/
> > >
> > > sudo rmmod tpm_tis
> > > # modprobe tpm_tis force=1
> > > modprobe: ERROR: could not insert 'tpm_tis': No such device
> > > # dmesg
> > > [ 263.903828] tpm_tis tpm_tis: 2.0 TPM (device-id 0xB, rev-id 16)
> > > [ 263.948049] tpm_tis tpm_tis: A TPM error (10) occurred continue selftest
> > > [ 263.948120] tpm_tis tpm_tis: TPM self test failed
> > >
> > >
> > > sts3 is reported as 0xff from my TPM1.2
> > >
> > 
> > 
> > Hmm,
> > my TPM2.0 chip also reports sts3 as 0xff (when loading with force=1 on a
> > machine without bios integration)
> > 
> > [ 307.095344] sts3 ff
> > [ 307.095366] tpm_tis tpm_tis: 2.0 TPM (device-id 0x1A, rev-id 16)
> > [ 307.140047] tpm_tis tpm_tis: A TPM error (256) occurred continue selftest
> > [ 307.140056] tpm_tis tpm_tis: TPM self test failed
> 
> 
> You are reading "sts3" - before requesting the locality and thus
> it returns 0xff for a TPM20 chip as well.
> --> You have to have an active locality first.
> 
> 
> For a TPM2.0 0xFF is not a valid value (if active locality is
> set), since reading commandCancel and resetEstablishment bit
> always return 0 on reads (according to spec).
> 
> --> 0xFF should be treated as a TPM1.2 (older tpms with TIS 1.2)
> --> 0x04 should be treated as TPM 2.0
> --> 0x08 should be treated as TPM1.2 (newer tpms with TIS1.3 enhanced)

Correct. I discussed this with some people and verified the cause:
if the firmware does nothing, the locality is never opened. I have
access to a similar setup today, so I can fix this regression and
verify my fix.
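
A minimal sketch of the classification in the table quoted above, assuming the caller has already requested a locality before reading the register. The macro and function names are hypothetical, and the two-bit family mask is an assumption about the register layout rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Values follow the table quoted above; the names are hypothetical. */
#define STS3_FAMILY_MASK  0x0Cu  /* assumed: family field occupies bits 3:2 */
#define STS3_FAMILY_TPM2  0x04u  /* quoted above as the TPM 2.0 value */
#define STS3_NO_LOCALITY  0xFFu  /* read back when no locality is active */

enum tpm_family { TPM_FAMILY_1_2, TPM_FAMILY_2_0 };

/*
 * Classify a raw STS3 byte. 0xFF is never treated as TPM 2.0, and the
 * family field is compared as a whole instead of testing a single bit.
 */
static enum tpm_family classify_sts3(uint8_t sts3)
{
	if (sts3 == STS3_NO_LOCALITY)
		return TPM_FAMILY_1_2;
	if ((sts3 & STS3_FAMILY_MASK) == STS3_FAMILY_TPM2)
		return TPM_FAMILY_2_0;
	return TPM_FAMILY_1_2;
}

/* Usage: the three values listed in the quoted table. */
static void classify_sts3_demo(void)
{
	assert(classify_sts3(0xFF) == TPM_FAMILY_1_2);
	assert(classify_sts3(0x04) == TPM_FAMILY_2_0);
	assert(classify_sts3(0x08) == TPM_FAMILY_1_2);
}
```

A single-bit test such as `(sts3 & 0x04) == 0x04` accepts 0xFF, which matches the misdetection reported earlier in the thread.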

Thanks for pointing this out!

> Thanks,
> Peter

/Jarkko

^ permalink raw reply

* Fwd: Fwd: [PATCH] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures
From: Chen Gang @ 2014-12-09 14:51 UTC (permalink / raw)
  To: dietlibc-a6ha8lKT7ZQ
  Cc: Darrick J. Wong, linux-api-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
In-Reply-To: <54870B36.3000901-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hello dietlibc members:

When I sent a Linux kernel patch for uapi, some members suggested that
I Cc the dietlibc members, so I am forwarding the related mail to the
dietlibc mailing list.

Please check; any ideas, suggestions, or completions are welcome.

Thanks.

-------- Forwarded Message --------
Subject: [PATCH] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures
Date: Fri, 05 Dec 2014 06:58:29 +0800
From: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, rth-hL46jP5Bxq7R7s880joybQ@public.gmane.org <rth-hL46jP5Bxq7R7s880joybQ@public.gmane.org>, ink-biIs/Y0ymYJMZLIVYojuPNP0rXTJTi09@public.gmane.org <ink-biIs/Y0ymYJMZLIVYojuPNP0rXTJTi09@public.gmane.org>, mattst88-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <mattst88-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Ralf Baechle <ralf-6z/3iImG2C8G8FEW9MqTrA@public.gmane.org>, jejb-6jwH94ZQLHl74goWV3ctuw@public.gmane.org <jejb-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>, deller-Mmb7MZpHnFY@public.gmane.org <deller-Mmb7MZpHnFY@public.gmane.org>, chris-YvXeqwSYzG2sTnJN9+BGXg@public.gmane.org <chris-YvXeqwSYzG2sTnJN9+BGXg@public.gmane.org>, jcmvbkbc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <jcmvbkbc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>, minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
CC: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

For uapi, we should try to give each macro the same value on all
architectures. MADV_FREE was added to the main branch recently, so
redefine it accordingly.

At present, the value '8' is available on all architectures, so
redefine MADV_FREE to '8'.

Signed-off-by: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 arch/alpha/include/uapi/asm/mman.h     | 2 +-
 arch/mips/include/uapi/asm/mman.h      | 2 +-
 arch/parisc/include/uapi/asm/mman.h    | 2 +-
 arch/xtensa/include/uapi/asm/mman.h    | 2 +-
 include/uapi/asm-generic/mman-common.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 836fbd4..0b8a5de 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -44,9 +44,9 @@
 #define MADV_WILLNEED	3		/* will need these pages */
 #define	MADV_SPACEAVAIL	5		/* ensure resources are available */
 #define MADV_DONTNEED	6		/* don't need these pages */
-#define MADV_FREE	7		/* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 106e741..d247f54 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -67,9 +67,9 @@
 #define MADV_SEQUENTIAL 2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 6cb8db7..700d83f 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -40,9 +40,9 @@
 #define MADV_SPACEAVAIL 5               /* insure that resources are reserved */
 #define MADV_VPS_PURGE  6               /* Purge pages from VM page cache */
 #define MADV_VPS_INHERIT 7              /* Inherit parents page size */
-#define MADV_FREE	8		/* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 1b19f25..77eaca4 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -80,9 +80,9 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 7a94102..8695959 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -34,9 +34,9 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
-- 
1.9.3
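
Since the hunks above show MADV_FREE carrying a different number on different architectures before the change, userspace should only ever use the symbolic name. A minimal sketch of a caller, assuming a Linux system; `release_pages` is a hypothetical helper, and the fallback to MADV_DONTNEED is an illustrative choice (it discards the pages immediately instead of lazily).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hint that [addr, addr + len) may be reclaimed. Uses MADV_FREE by name
 * when the headers provide it, and falls back to MADV_DONTNEED when the
 * macro is missing or the running kernel rejects the request.
 */
static int release_pages(void *addr, size_t len)
{
#ifdef MADV_FREE
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
#endif
	return madvise(addr, len, MADV_DONTNEED);
}

/* Usage: map one anonymous region, touch it, then release it. */
static void release_pages_demo(void)
{
	size_t len = 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	assert(p != MAP_FAILED);
	p[0] = 1;
	assert(release_pages(p, len) == 0);
	assert(munmap(p, len) == 0);
}
```

Code built this way keeps working whichever numeric value the headers assign, as long as it is recompiled against them.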

^ permalink raw reply related

* Fwd: [PATCH] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures
From: Chen Gang @ 2014-12-09 14:46 UTC (permalink / raw)
  To: dietlibc-a6ha8lKT7ZQ
  Cc: Darrick J. Wong, linux-api-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
In-Reply-To: <5480E715.3020900-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hello dietlibc members:

When I sent a Linux kernel patch for uapi, some members suggested that
I Cc the dietlibc members, so I am forwarding the related mail to the
dietlibc mailing list.

Please check; any ideas, suggestions, or completions are welcome.

Thanks.

-------- Forwarded Message --------
Subject: [PATCH] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures
Date: Fri, 05 Dec 2014 06:58:29 +0800
From: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, rth-hL46jP5Bxq7R7s880joybQ@public.gmane.org <rth-hL46jP5Bxq7R7s880joybQ@public.gmane.org>, ink-biIs/Y0ymYJMZLIVYojuPNP0rXTJTi09@public.gmane.org <ink-biIs/Y0ymYJMZLIVYojuPNP0rXTJTi09@public.gmane.org>, mattst88-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <mattst88-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Ralf Baechle <ralf-6z/3iImG2C8G8FEW9MqTrA@public.gmane.org>, jejb-6jwH94ZQLHl74goWV3ctuw@public.gmane.org <jejb-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>, deller-Mmb7MZpHnFY@public.gmane.org <deller-Mmb7MZpHnFY@public.gmane.org>, chris-YvXeqwSYzG2sTnJN9+BGXg@public.gmane.org <chris-YvXeqwSYzG2sTnJN9+BGXg@public.gmane.org>, jcmvbkbc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <jcmvbkbc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>, minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
CC: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

For uapi, we should try to give each macro the same value on all
architectures. MADV_FREE was added to the main branch recently, so
redefine it accordingly.

At present, the value '8' is available on all architectures, so
redefine MADV_FREE to '8'.

Signed-off-by: Chen Gang <gang.chen.5i5j-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 arch/alpha/include/uapi/asm/mman.h     | 2 +-
 arch/mips/include/uapi/asm/mman.h      | 2 +-
 arch/parisc/include/uapi/asm/mman.h    | 2 +-
 arch/xtensa/include/uapi/asm/mman.h    | 2 +-
 include/uapi/asm-generic/mman-common.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 836fbd4..0b8a5de 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -44,9 +44,9 @@
 #define MADV_WILLNEED	3		/* will need these pages */
 #define	MADV_SPACEAVAIL	5		/* ensure resources are available */
 #define MADV_DONTNEED	6		/* don't need these pages */
-#define MADV_FREE	7		/* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 106e741..d247f54 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -67,9 +67,9 @@
 #define MADV_SEQUENTIAL 2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 6cb8db7..700d83f 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -40,9 +40,9 @@
 #define MADV_SPACEAVAIL 5               /* insure that resources are reserved */
 #define MADV_VPS_PURGE  6               /* Purge pages from VM page cache */
 #define MADV_VPS_INHERIT 7              /* Inherit parents page size */
-#define MADV_FREE	8		/* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 1b19f25..77eaca4 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -80,9 +80,9 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 7a94102..8695959 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -34,9 +34,9 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
-#define MADV_FREE	5		/* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE	8		/* free pages only if memory pressure */
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
-- 
1.9.3

^ permalink raw reply related

* [PATCH v10 04/20] vfio: amba: VFIO support for AMBA devices
From: Antonios Motakis @ 2014-12-09 14:18 UTC (permalink / raw)
  To: kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA
  Cc: will.deacon-5wv7dgnIgG8, tech-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	eric.auger-QSEj5FYQhm4dnm+yROfE0A,
	kim.phillips-KZfg59tc24xl57MIdRCFDg, marc.zyngier-5wv7dgnIgG8,
	Antonios Motakis, open list, open list:VFIO DRIVER,
	open list:ABI/API
In-Reply-To: <1417109580-10505-5-git-send-email-a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org>

Add support for discovering AMBA devices with VFIO and handle them
similarly to Linux platform devices.

Signed-off-by: Antonios Motakis <a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org>
---
 drivers/vfio/platform/vfio_amba.c | 115 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h         |   1 +
 2 files changed, 116 insertions(+)
 create mode 100644 drivers/vfio/platform/vfio_amba.c

diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
new file mode 100644
index 0000000..ff0331f
--- /dev/null
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -0,0 +1,115 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis <a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vfio.h>
+#include <linux/amba/bus.h>
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.10"
+#define DRIVER_AUTHOR   "Antonios Motakis <a.motakis-lrHrjnjw1UfHK3s98zE1ajGjJy/sRE9J@public.gmane.org>"
+#define DRIVER_DESC     "VFIO for AMBA devices - User Level meta-driver"
+
+/* probing devices from the AMBA bus */
+
+static struct resource *get_amba_resource(struct vfio_platform_device *vdev,
+					  int i)
+{
+	struct amba_device *adev = (struct amba_device *) vdev->opaque;
+
+	if (i == 0)
+		return &adev->res;
+
+	return NULL;
+}
+
+static int get_amba_irq(struct vfio_platform_device *vdev, int i)
+{
+	struct amba_device *adev = (struct amba_device *) vdev->opaque;
+	int ret = 0;
+
+	if (i < AMBA_NR_IRQS)
+		ret = adev->irq[i];
+
+	/* zero is an unset IRQ for AMBA devices */
+	return ret ? ret : -ENXIO;
+}
+
+static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
+{
+	struct vfio_platform_device *vdev;
+	int ret;
+
+	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+	if (!vdev)
+		return -ENOMEM;
+
+	vdev->name = kasprintf(GFP_KERNEL, "vfio-amba-%08x", adev->periphid);
+	if (!vdev->name) {
+		kfree(vdev);
+		return -ENOMEM;
+	}
+
+	vdev->opaque = (void *) adev;
+	vdev->flags = VFIO_DEVICE_FLAGS_AMBA;
+	vdev->get_resource = get_amba_resource;
+	vdev->get_irq = get_amba_irq;
+
+	ret = vfio_platform_probe_common(vdev, &adev->dev);
+	if (ret) {
+		kfree(vdev->name);
+		kfree(vdev);
+	}
+
+	return ret;
+}
+
+static int vfio_amba_remove(struct amba_device *adev)
+{
+	struct vfio_platform_device *vdev;
+
+	vdev = vfio_platform_remove_common(&adev->dev);
+	if (vdev) {
+		kfree(vdev->name);
+		kfree(vdev);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static struct amba_id pl330_ids[] = {
+	{ 0, 0 },
+};
+
+MODULE_DEVICE_TABLE(amba, pl330_ids);
+
+static struct amba_driver vfio_amba_driver = {
+	.probe = vfio_amba_probe,
+	.remove = vfio_amba_remove,
+	.id_table = pl330_ids,
+	.drv = {
+		.name = "vfio-amba",
+		.owner = THIS_MODULE,
+	},
+};
+
+module_amba_driver(vfio_amba_driver);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4e93a97..544d3d8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -160,6 +160,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_RESET	(1 << 0)	/* Device supports reset */
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
+#define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
-- 
2.1.3
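
A minimal sketch of how userspace might tell the bus types apart using the flag bits from the hunk above. The `DEMO_` constants are local copies made so the example builds without the patched header, and `vfio_bus_name` is a hypothetical helper; in real code the flags word would come from the `VFIO_DEVICE_GET_INFO` ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the flag bits shown in the hunk above. */
#define DEMO_VFIO_DEVICE_FLAGS_RESET    (1u << 0)
#define DEMO_VFIO_DEVICE_FLAGS_PCI      (1u << 1)
#define DEMO_VFIO_DEVICE_FLAGS_PLATFORM (1u << 2)
#define DEMO_VFIO_DEVICE_FLAGS_AMBA     (1u << 3)

/* Map the bus-type bits of a vfio_device_info flags word to a short name. */
static const char *vfio_bus_name(uint32_t flags)
{
	if (flags & DEMO_VFIO_DEVICE_FLAGS_PCI)
		return "pci";
	if (flags & DEMO_VFIO_DEVICE_FLAGS_PLATFORM)
		return "platform";
	if (flags & DEMO_VFIO_DEVICE_FLAGS_AMBA)
		return "amba";
	return "unknown";
}

/* Usage: the reset bit is independent of the bus-type bits. */
static void vfio_bus_name_demo(void)
{
	uint32_t flags = DEMO_VFIO_DEVICE_FLAGS_AMBA |
			 DEMO_VFIO_DEVICE_FLAGS_RESET;

	assert(strcmp(vfio_bus_name(flags), "amba") == 0);
}
```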

^ permalink raw reply related

* Re: [PATCH v5] media: platform: add VPFE capture driver support for AM437X
From: Hans Verkuil @ 2014-12-09 11:13 UTC (permalink / raw)
  To: Lad, Prabhakar, LMML, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-api
  Cc: LKML, Hans Verkuil
In-Reply-To: <1418077306-2493-1-git-send-email-prabhakar.csengg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 12/08/14 23:21, Lad, Prabhakar wrote:
> From: Benoit Parrot <bparrot-l0cyMroinI0@public.gmane.org>
> 
> This patch adds Video Processing Front End (VPFE) driver for
> AM437X family of devices
> Driver supports the following:
> - V4L2 API using MMAP buffer access based on videobuf2 api
> - Asynchronous sensor/decoder sub device registration
> - DT support
> 
> Signed-off-by: Benoit Parrot <bparrot-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Darren Etheridge <detheridge-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Lad, Prabhakar <prabhakar.csengg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  Changes for v5:
>  1: Fixed review comments pointed out by Hans, fixing race condition.
> 
>  v4l2-compliance output:

Thanks!

<snip>

>  ----------------------
> 
> diff --git a/drivers/media/platform/am437x/am437x-vpfe.c b/drivers/media/platform/am437x/am437x-vpfe.c
> new file mode 100644
> index 0000000..c2b29a2
> --- /dev/null
> +++ b/drivers/media/platform/am437x/am437x-vpfe.c
> +/*
> + * vpfe_release : This function is based on the vb2_fop_release
> + * helper function.
> + * It has been augmented to handle module power management,
> + * by disabling/enabling h/w module fcntl clock when necessary.
> + */
> +static int vpfe_release(struct file *file)
> +{
> +	struct vpfe_device *vpfe = video_drvdata(file);
> +	bool close = v4l2_fh_is_singular_file(file);
> +	int ret;
> +
> +	vpfe_dbg(2, vpfe, "vpfe_release\n");
> +
> +	mutex_lock(&vpfe->lock);

The v4l2_fh_is_singular_file() call should be inside the lock as well.
So:

	close = v4l2_fh_is_singular_file(file);

> +
> +	ret = _vb2_fop_release(file, NULL);
> +	if (close)
> +		vpfe_ccdc_close(&vpfe->ccdc, vpfe->pdev);
> +
> +	mutex_unlock(&vpfe->lock);
> +
> +	return ret;
> +}
> +
> +/*
> + * vpfe_open : This function is based on the v4l2_fh_open helper function.
> + * It has been augmented to handle module power management,
> + * by disabling/enabling h/w module fcntl clock when necessary.
> + */
> +static int vpfe_open(struct file *file)
> +{
> +	struct vpfe_device *vpfe = video_drvdata(file);
> +	int ret;
> +
> +	ret = v4l2_fh_open(file);

Same here, v4l2_fh_open should be inside the lock.

> +	if (ret) {
> +		vpfe_err(vpfe, "v4l2_fh_open failed\n");
> +		return ret;
> +	}
> +
> +	mutex_lock(&vpfe->lock);

So:

	ret = v4l2_fh_open(file);
	if (ret) {
		vpfe_err(vpfe, "v4l2_fh_open failed\n");
		goto unlock;
	}

	
> +
> +	if (!v4l2_fh_is_singular_file(file))
> +		goto unlock;
> +
> +	if (vpfe_initialize_device(vpfe)) {
> +		v4l2_fh_release(file);
> +		ret = -ENODEV;
> +	}
> +
> +unlock:
> +	mutex_unlock(&vpfe->lock);
> +	return ret;
> +}
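
The locking point made in the review comments can be sketched in userspace as follows, assuming POSIX threads in place of the kernel mutex. Every name here (`demo_dev`, `demo_open`, `demo_release`) is hypothetical and none of this is driver code; an open counter stands in for the "singular file" check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names come from the driver. */
struct demo_dev {
	pthread_mutex_t lock;
	int open_count;     /* number of open file handles */
	bool hw_enabled;    /* stands in for the module clock state */
};

static void demo_init(struct demo_dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	d->open_count = 0;
	d->hw_enabled = false;
}

static void demo_open(struct demo_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->open_count++;
	if (d->open_count == 1)         /* first opener powers the hardware up */
		d->hw_enabled = true;
	pthread_mutex_unlock(&d->lock);
}

static void demo_release(struct demo_dev *d)
{
	bool last;

	pthread_mutex_lock(&d->lock);
	/* "Am I the last handle?" is decided while holding the lock. */
	last = (d->open_count == 1);
	d->open_count--;
	if (last)
		d->hw_enabled = false;
	pthread_mutex_unlock(&d->lock);
}

/* Usage: the hardware stays on until the last handle is released. */
static void demo_open_release(void)
{
	struct demo_dev d;

	demo_init(&d);
	demo_open(&d);
	demo_open(&d);
	demo_release(&d);
	assert(d.hw_enabled);
	demo_release(&d);
	assert(!d.hw_enabled);
}
```

If the "last handle" test ran before the lock was taken, a concurrent open could slip in between the test and the power-down, and the new opener would be left with disabled hardware.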

Regards,

	Hans

^ permalink raw reply

* [v8 5/5] ext4: cleanup inode flag definitions
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA,
	adilger-m1MBpc4rdrD3fQ9qLvQP4Q, jack-AlSwsSmVLrQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, hch-wEGCiKHe2LqWVfeAwA7xHQ,
	dmonakhov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <1418102548-5469-1-git-send-email-lixi-LfVdkaOWEx8@public.gmane.org>

The inode flags defined in uapi/linux/fs.h were migrated from
ext4.h. This patch changes the inode flag definitions in ext4.h
to VFS definitions to make the gaps between them clearer.

Signed-off-by: Li Xi <lixi-LfVdkaOWEx8@public.gmane.org>
---
 fs/ext4/ext4.h |   50 +++++++++++++++++++++++++-------------------------
 1 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 43a2a88..bcc04c0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -353,33 +353,33 @@ struct flex_groups {
 /*
  * Inode flags
  */
-#define	EXT4_SECRM_FL			0x00000001 /* Secure deletion */
-#define	EXT4_UNRM_FL			0x00000002 /* Undelete */
-#define	EXT4_COMPR_FL			0x00000004 /* Compress file */
-#define EXT4_SYNC_FL			0x00000008 /* Synchronous updates */
-#define EXT4_IMMUTABLE_FL		0x00000010 /* Immutable file */
-#define EXT4_APPEND_FL			0x00000020 /* writes to file may only append */
-#define EXT4_NODUMP_FL			0x00000040 /* do not dump file */
-#define EXT4_NOATIME_FL			0x00000080 /* do not update atime */
+#define	EXT4_SECRM_FL			FS_SECRM_FL        /* Secure deletion */
+#define	EXT4_UNRM_FL			FS_UNRM_FL         /* Undelete */
+#define	EXT4_COMPR_FL			FS_COMPR_FL        /* Compress file */
+#define EXT4_SYNC_FL			FS_SYNC_FL         /* Synchronous updates */
+#define EXT4_IMMUTABLE_FL		FS_IMMUTABLE_FL    /* Immutable file */
+#define EXT4_APPEND_FL			FS_APPEND_FL       /* writes to file may only append */
+#define EXT4_NODUMP_FL			FS_NODUMP_FL       /* do not dump file */
+#define EXT4_NOATIME_FL			FS_NOATIME_FL      /* do not update atime */
 /* Reserved for compression usage... */
-#define EXT4_DIRTY_FL			0x00000100
-#define EXT4_COMPRBLK_FL		0x00000200 /* One or more compressed clusters */
-#define EXT4_NOCOMPR_FL			0x00000400 /* Don't compress */
-#define EXT4_ECOMPR_FL			0x00000800 /* Compression error */
+#define EXT4_DIRTY_FL			FS_DIRTY_FL
+#define EXT4_COMPRBLK_FL		FS_COMPRBLK_FL     /* One or more compressed clusters */
+#define EXT4_NOCOMPR_FL			FS_NOCOMP_FL       /* Don't compress */
+#define EXT4_ECOMPR_FL			FS_ECOMPR_FL       /* Compression error */
 /* End compression flags --- maybe not all used */
-#define EXT4_INDEX_FL			0x00001000 /* hash-indexed directory */
-#define EXT4_IMAGIC_FL			0x00002000 /* AFS directory */
-#define EXT4_JOURNAL_DATA_FL		0x00004000 /* file data should be journaled */
-#define EXT4_NOTAIL_FL			0x00008000 /* file tail should not be merged */
-#define EXT4_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
-#define EXT4_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
-#define EXT4_HUGE_FILE_FL               0x00040000 /* Set to each huge file */
-#define EXT4_EXTENTS_FL			0x00080000 /* Inode uses extents */
-#define EXT4_EA_INODE_FL	        0x00200000 /* Inode used for large EA */
-#define EXT4_EOFBLOCKS_FL		0x00400000 /* Blocks allocated beyond EOF */
-#define EXT4_INLINE_DATA_FL		0x10000000 /* Inode has inline data. */
-#define EXT4_PROJINHERIT_FL		FS_PROJINHERIT_FL /* Create with parents projid */
-#define EXT4_RESERVED_FL		0x80000000 /* reserved for ext4 lib */
+#define EXT4_INDEX_FL			FS_INDEX_FL        /* hash-indexed directory */
+#define EXT4_IMAGIC_FL			FS_IMAGIC_FL       /* AFS directory */
+#define EXT4_JOURNAL_DATA_FL		FS_JOURNAL_DATA_FL /* file data should be journaled */
+#define EXT4_NOTAIL_FL			FS_NOTAIL_FL       /* file tail should not be merged */
+#define EXT4_DIRSYNC_FL			FS_DIRSYNC_FL      /* dirsync behaviour (directories only) */
+#define EXT4_TOPDIR_FL			FS_TOPDIR_FL       /* Top of directory hierarchies*/
+#define EXT4_HUGE_FILE_FL               0x00040000         /* Set to each huge file */
+#define EXT4_EXTENTS_FL			FS_EXTENT_FL       /* Inode uses extents */
+#define EXT4_EA_INODE_FL	        0x00200000         /* Inode used for large EA */
+#define EXT4_EOFBLOCKS_FL		0x00400000         /* Blocks allocated beyond EOF */
+#define EXT4_INLINE_DATA_FL		0x10000000         /* Inode has inline data. */
+#define EXT4_PROJINHERIT_FL		FS_PROJINHERIT_FL  /* Create with parents projid */
+#define EXT4_RESERVED_FL		FS_RESERVED_FL     /* reserved for ext4 lib */
 
 #define EXT4_FL_USER_VISIBLE		0x204BDFFF /* User visible flags */
 #define EXT4_FL_USER_MODIFIABLE		0x204380FF /* User modifiable flags */
-- 
1.7.1

* [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1418102548-5469-1-git-send-email-lixi@ddn.com>

This patch adds support for the FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR
ioctl interface to ext4. The interface is kept consistent with
XFS_IOC_FSGETXATTR/XFS_IOC_FSSETXATTR.

Signed-off-by: Li Xi <lixi@ddn.com>
---
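A rough user-space sketch of how the get side of this interface could be
called, for anyone who wants to try the series. It assumes the installed
<linux/fs.h> already exports struct fsxattr and FS_IOC_FSGETXATTR as this
patch proposes; the ioctl number used is whatever that header defines.
demo_get_fsxattr() is a made-up helper name, not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* assumed to provide struct fsxattr, FS_IOC_FSGETXATTR */

/*
 * Hypothetical helper: read the extended attributes of 'path' into 'fa'.
 * Returns 0 on success, -1 on failure with errno set by open() or ioctl().
 */
static int demo_get_fsxattr(const char *path, struct fsxattr *fa)
{
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(fa, 0, sizeof(*fa));
	ret = ioctl(fd, FS_IOC_FSGETXATTR, fa);

	/* close() may clobber errno, so preserve the ioctl() result. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret;
}
```

On success, fa->fsx_xflags holds the FS_XFLAG_* bits and fa->fsx_projid the
project ID. A filesystem without support is expected to fail the call,
typically with ENOTTY.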
 fs/ext4/ext4.h          |  111 ++++++++++++++++
 fs/ext4/ioctl.c         |  330 +++++++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_fs.h         |   47 +++-----
 include/uapi/linux/fs.h |   58 ++++++++
 4 files changed, 418 insertions(+), 128 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 136e18c..43a2a88 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -384,6 +384,115 @@ struct flex_groups {
 #define EXT4_FL_USER_VISIBLE		0x204BDFFF /* User visible flags */
 #define EXT4_FL_USER_MODIFIABLE		0x204380FF /* User modifiable flags */
 
+/* Transfer internal flags to xflags */
+static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
+{
+	__u32 xflags = 0;
+
+	if (iflags & EXT4_SECRM_FL)
+		xflags |= FS_XFLAG_SECRM;
+	if (iflags & EXT4_UNRM_FL)
+		xflags |= FS_XFLAG_UNRM;
+	if (iflags & EXT4_COMPR_FL)
+		xflags |= FS_XFLAG_COMPR;
+	if (iflags & EXT4_SYNC_FL)
+		xflags |= FS_XFLAG_SYNC;
+	if (iflags & EXT4_IMMUTABLE_FL)
+		xflags |= FS_XFLAG_IMMUTABLE;
+	if (iflags & EXT4_APPEND_FL)
+		xflags |= FS_XFLAG_APPEND;
+	if (iflags & EXT4_NODUMP_FL)
+		xflags |= FS_XFLAG_NODUMP;
+	if (iflags & EXT4_NOATIME_FL)
+		xflags |= FS_XFLAG_NOATIME;
+	if (iflags & EXT4_COMPRBLK_FL)
+		xflags |= FS_XFLAG_COMPRBLK;
+	if (iflags & EXT4_NOCOMPR_FL)
+		xflags |= FS_XFLAG_NOCOMPR;
+	if (iflags & EXT4_ECOMPR_FL)
+		xflags |= FS_XFLAG_ECOMPR;
+	if (iflags & EXT4_INDEX_FL)
+		xflags |= FS_XFLAG_INDEX;
+	if (iflags & EXT4_IMAGIC_FL)
+		xflags |= FS_XFLAG_IMAGIC;
+	if (iflags & EXT4_JOURNAL_DATA_FL)
+		xflags |= FS_XFLAG_JOURNAL_DATA;
+	if (iflags & EXT4_NOTAIL_FL)
+		xflags |= FS_XFLAG_NOTAIL;
+	if (iflags & EXT4_DIRSYNC_FL)
+		xflags |= FS_XFLAG_DIRSYNC;
+	if (iflags & EXT4_TOPDIR_FL)
+		xflags |= FS_XFLAG_TOPDIR;
+	if (iflags & EXT4_HUGE_FILE_FL)
+		xflags |= FS_XFLAG_HUGE_FILE;
+	if (iflags & EXT4_EXTENTS_FL)
+		xflags |= FS_XFLAG_EXTENTS;
+	if (iflags & EXT4_EA_INODE_FL)
+		xflags |= FS_XFLAG_EA_INODE;
+	if (iflags & EXT4_EOFBLOCKS_FL)
+		xflags |= FS_XFLAG_EOFBLOCKS;
+	if (iflags & EXT4_INLINE_DATA_FL)
+		xflags |= FS_XFLAG_INLINE_DATA;
+	if (iflags & EXT4_PROJINHERIT_FL)
+		xflags |= FS_XFLAG_PROJINHERIT;
+	return xflags;
+}
+
+/* Transfer xflags flags to internal */
+static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
+{
+	unsigned long iflags = 0;
+
+	if (xflags & FS_XFLAG_SECRM)
+		iflags |= EXT4_SECRM_FL;
+	if (xflags & FS_XFLAG_UNRM)
+		iflags |= EXT4_UNRM_FL;
+	if (xflags & FS_XFLAG_COMPR)
+		iflags |= EXT4_COMPR_FL;
+	if (xflags & FS_XFLAG_SYNC)
+		iflags |= EXT4_SYNC_FL;
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		iflags |= EXT4_IMMUTABLE_FL;
+	if (xflags & FS_XFLAG_APPEND)
+		iflags |= EXT4_APPEND_FL;
+	if (xflags & FS_XFLAG_NODUMP)
+		iflags |= EXT4_NODUMP_FL;
+	if (xflags & FS_XFLAG_NOATIME)
+		iflags |= EXT4_NOATIME_FL;
+	if (xflags & FS_XFLAG_COMPRBLK)
+		iflags |= EXT4_COMPRBLK_FL;
+	if (xflags & FS_XFLAG_NOCOMPR)
+		iflags |= EXT4_NOCOMPR_FL;
+	if (xflags & FS_XFLAG_ECOMPR)
+		iflags |= EXT4_ECOMPR_FL;
+	if (xflags & FS_XFLAG_INDEX)
+		iflags |= EXT4_INDEX_FL;
+	if (xflags & FS_XFLAG_IMAGIC)
+		iflags |= EXT4_IMAGIC_FL;
+	if (xflags & FS_XFLAG_JOURNAL_DATA)
+		iflags |= EXT4_JOURNAL_DATA_FL;
+	if (xflags & FS_XFLAG_NOTAIL)
+		iflags |= EXT4_NOTAIL_FL;
+	if (xflags & FS_XFLAG_DIRSYNC)
+		iflags |= EXT4_DIRSYNC_FL;
+	if (xflags & FS_XFLAG_TOPDIR)
+		iflags |= EXT4_TOPDIR_FL;
+	if (xflags & FS_XFLAG_HUGE_FILE)
+		iflags |= EXT4_HUGE_FILE_FL;
+	if (xflags & FS_XFLAG_EXTENTS)
+		iflags |= EXT4_EXTENTS_FL;
+	if (xflags & FS_XFLAG_EA_INODE)
+		iflags |= EXT4_EA_INODE_FL;
+	if (xflags & FS_XFLAG_EOFBLOCKS)
+		iflags |= EXT4_EOFBLOCKS_FL;
+	if (xflags & FS_XFLAG_INLINE_DATA)
+		iflags |= EXT4_INLINE_DATA_FL;
+	if (xflags & FS_XFLAG_PROJINHERIT)
+		iflags |= EXT4_PROJINHERIT_FL;
+
+	return iflags;
+}
+
 /* Flags that should be inherited by new inodes from their parent. */
 #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
 			   EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
@@ -606,6 +715,8 @@ enum {
 #define EXT4_IOC_RESIZE_FS		_IOW('f', 16, __u64)
 #define EXT4_IOC_SWAP_BOOT		_IO('f', 17)
 #define EXT4_IOC_PRECACHE_EXTENTS	_IO('f', 18)
+#define EXT4_IOC_FSGETXATTR		FS_IOC_FSGETXATTR
+#define EXT4_IOC_FSSETXATTR		FS_IOC_FSSETXATTR
 
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /*
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f58a0d1..8332476 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -14,6 +14,8 @@
 #include <linux/compat.h>
 #include <linux/mount.h>
 #include <linux/file.h>
+#include <linux/quotaops.h>
+#include <linux/quota.h>
 #include <asm/uaccess.h>
 #include "ext4_jbd2.h"
 #include "ext4.h"
@@ -196,126 +198,220 @@ journal_err_out:
 	return err;
 }
 
-long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int ext4_ioctl_setflags(struct file *filp, unsigned int flags)
 {
 	struct inode *inode = file_inode(filp);
-	struct super_block *sb = inode->i_sb;
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	unsigned int flags;
+	handle_t *handle = NULL;
+	int err, migrate = 0;
+	struct ext4_iloc iloc;
+	unsigned int oldflags, mask, i;
+	unsigned int jflag;
 
-	ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
+	if (!inode_owner_or_capable(inode))
+		return -EACCES;
 
-	switch (cmd) {
-	case EXT4_IOC_GETFLAGS:
-		ext4_get_inode_flags(ei);
-		flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
-		return put_user(flags, (int __user *) arg);
-	case EXT4_IOC_SETFLAGS: {
-		handle_t *handle = NULL;
-		int err, migrate = 0;
-		struct ext4_iloc iloc;
-		unsigned int oldflags, mask, i;
-		unsigned int jflag;
+	err = mnt_want_write_file(filp);
+	if (err)
+		return err;
 
-		if (!inode_owner_or_capable(inode))
-			return -EACCES;
+	flags = ext4_mask_flags(inode->i_mode, flags);
 
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
+	err = -EPERM;
+	mutex_lock(&inode->i_mutex);
+	/* Is it quota file? Do not allow user to mess with it */
+	if (IS_NOQUOTA(inode))
+		goto flags_out;
 
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
+	oldflags = ei->i_flags;
 
-		flags = ext4_mask_flags(inode->i_mode, flags);
+	/* The JOURNAL_DATA flag is modifiable only by root */
+	jflag = flags & EXT4_JOURNAL_DATA_FL;
 
-		err = -EPERM;
-		mutex_lock(&inode->i_mutex);
-		/* Is it quota file? Do not allow user to mess with it */
-		if (IS_NOQUOTA(inode))
+	/*
+	 * The IMMUTABLE and APPEND_ONLY flags can only be changed by
+	 * the relevant capability.
+	 *
+	 * This test looks nicer. Thanks to Pauline Middelink
+	 */
+	if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
+		if (!capable(CAP_LINUX_IMMUTABLE))
 			goto flags_out;
+	}
 
-		oldflags = ei->i_flags;
-
-		/* The JOURNAL_DATA flag is modifiable only by root */
-		jflag = flags & EXT4_JOURNAL_DATA_FL;
-
-		/*
-		 * The IMMUTABLE and APPEND_ONLY flags can only be changed by
-		 * the relevant capability.
-		 *
-		 * This test looks nicer. Thanks to Pauline Middelink
-		 */
-		if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
-			if (!capable(CAP_LINUX_IMMUTABLE))
-				goto flags_out;
-		}
-
-		/*
-		 * The JOURNAL_DATA flag can only be changed by
-		 * the relevant capability.
-		 */
-		if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
-			if (!capable(CAP_SYS_RESOURCE))
-				goto flags_out;
-		}
-		if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
-			migrate = 1;
-
+	/*
+	 * The JOURNAL_DATA flag can only be changed by
+	 * the relevant capability.
+	 */
+	if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+		if (!capable(CAP_SYS_RESOURCE))
+			goto flags_out;
+	}
+	if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
+		migrate = 1;
 		if (flags & EXT4_EOFBLOCKS_FL) {
-			/* we don't support adding EOFBLOCKS flag */
-			if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
-				err = -EOPNOTSUPP;
-				goto flags_out;
-			}
-		} else if (oldflags & EXT4_EOFBLOCKS_FL)
-			ext4_truncate(inode);
-
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
+		/* we don't support adding EOFBLOCKS flag */
+		if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
+			err = -EOPNOTSUPP;
 			goto flags_out;
 		}
-		if (IS_SYNC(inode))
-			ext4_handle_sync(handle);
-		err = ext4_reserve_inode_write(handle, inode, &iloc);
-		if (err)
-			goto flags_err;
-
-		for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
-			if (!(mask & EXT4_FL_USER_MODIFIABLE))
-				continue;
-			if (mask & flags)
-				ext4_set_inode_flag(inode, i);
-			else
-				ext4_clear_inode_flag(inode, i);
-		}
+	} else if (oldflags & EXT4_EOFBLOCKS_FL)
+		ext4_truncate(inode);
 
-		ext4_set_inode_flags(inode);
-		inode->i_ctime = ext4_current_time(inode);
+	handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		goto flags_out;
+	}
+	if (IS_SYNC(inode))
+		ext4_handle_sync(handle);
+	err = ext4_reserve_inode_write(handle, inode, &iloc);
+	if (err)
+		goto flags_err;
+
+	for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
+		if (!(mask & EXT4_FL_USER_MODIFIABLE))
+			continue;
+		if (mask & flags)
+			ext4_set_inode_flag(inode, i);
+		else
+			ext4_clear_inode_flag(inode, i);
+	}
 
-		err = ext4_mark_iloc_dirty(handle, inode, &iloc);
-flags_err:
-		ext4_journal_stop(handle);
-		if (err)
-			goto flags_out;
+	ext4_set_inode_flags(inode);
+	inode->i_ctime = ext4_current_time(inode);
 
-		if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
-			err = ext4_change_inode_journal_flag(inode, jflag);
-		if (err)
-			goto flags_out;
-		if (migrate) {
-			if (flags & EXT4_EXTENTS_FL)
-				err = ext4_ext_migrate(inode);
-			else
-				err = ext4_ind_migrate(inode);
-		}
+	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+flags_err:
+	ext4_journal_stop(handle);
+	if (err)
+		goto flags_out;
+
+	if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+		err = ext4_change_inode_journal_flag(inode, jflag);
+	if (err)
+		goto flags_out;
+	if (migrate) {
+		if (flags & EXT4_EXTENTS_FL)
+			err = ext4_ext_migrate(inode);
+		else
+			err = ext4_ind_migrate(inode);
+	}
 
 flags_out:
-		mutex_unlock(&inode->i_mutex);
-		mnt_drop_write_file(filp);
+	mutex_unlock(&inode->i_mutex);
+	mnt_drop_write_file(filp);
+	return err;
+}
+
+static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
+{
+	struct inode *inode = file_inode(filp);
+	struct super_block *sb = inode->i_sb;
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	int err;
+	handle_t *handle;
+	kprojid_t kprojid;
+	struct ext4_iloc iloc;
+	struct ext4_inode *raw_inode;
+
+	struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
+
+	/* Make sure caller can change project. */
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (projid != EXT4_DEF_PROJID
+	    && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_PROJECT))
+		return -EOPNOTSUPP;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+		BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
+		       != EXT4_DEF_PROJID);
+		if (projid != EXT4_DEF_PROJID)
+			return -EOPNOTSUPP;
+		else
+			return 0;
+	}
+
+	kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
+
+	if (projid_eq(kprojid, EXT4_I(inode)->i_projid))
+		return 0;
+
+	err = mnt_want_write_file(filp);
+	if (err)
 		return err;
+
+	err = -EPERM;
+	mutex_lock(&inode->i_mutex);
+	/* Is it quota file? Do not allow user to mess with it */
+	if (IS_NOQUOTA(inode))
+		goto project_out;
+
+	dquot_initialize(inode);
+
+	handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
+		EXT4_QUOTA_INIT_BLOCKS(sb) +
+		EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		goto project_out;
+	}
+
+	err = ext4_reserve_inode_write(handle, inode, &iloc);
+	if (err)
+		goto project_stop;
+
+	raw_inode = ext4_raw_inode(&iloc);
+	if ((EXT4_INODE_SIZE(sb) <=
+	     EXT4_GOOD_OLD_INODE_SIZE) ||
+	    (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))) {
+		err = -EFBIG;
+		goto project_stop;
 	}
+
+	transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
+	if (!transfer_to[PRJQUOTA])
+		goto project_set;
+
+	err = __dquot_transfer(inode, transfer_to);
+	dqput(transfer_to[PRJQUOTA]);
+	if (err)
+		goto project_stop;
+
+project_set:
+	EXT4_I(inode)->i_projid = kprojid;
+	inode->i_ctime = ext4_current_time(inode);
+	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+project_stop:
+	ext4_journal_stop(handle);
+project_out:
+	mutex_unlock(&inode->i_mutex);
+	mnt_drop_write_file(filp);
+	return err;
+}
+
+long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct inode *inode = file_inode(filp);
+	struct super_block *sb = inode->i_sb;
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	unsigned int flags;
+
+	ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
+
+	switch (cmd) {
+	case EXT4_IOC_GETFLAGS:
+		ext4_get_inode_flags(ei);
+		flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
+		return put_user(flags, (int __user *) arg);
+	case EXT4_IOC_SETFLAGS:
+		if (get_user(flags, (int __user *) arg))
+			return -EFAULT;
+		return ext4_ioctl_setflags(filp, flags);
 	case EXT4_IOC_GETVERSION:
 	case EXT4_IOC_GETVERSION_OLD:
 		return put_user(inode->i_generation, (int __user *) arg);
@@ -615,7 +711,45 @@ resizefs_out:
 	}
 	case EXT4_IOC_PRECACHE_EXTENTS:
 		return ext4_ext_precache(inode);
+	case EXT4_IOC_FSGETXATTR:
+	{
+		struct fsxattr fa;
+
+		memset(&fa, 0, sizeof(struct fsxattr));
+
+		ext4_get_inode_flags(ei);
+		fa.fsx_xflags = ext4_iflags_to_xflags(ei->i_flags & EXT4_FL_USER_VISIBLE);
+
+		if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+			fa.fsx_projid = (__u32)from_kprojid(&init_user_ns,
+				EXT4_I(inode)->i_projid);
+		}
+
+		if (copy_to_user((struct fsxattr __user *)arg,
+				 &fa, sizeof(fa)))
+			return -EFAULT;
+		return 0;
+	}
+	case EXT4_IOC_FSSETXATTR:
+	{
+		struct fsxattr fa;
+		int err;
+
+		if (copy_from_user(&fa, (struct fsxattr __user *)arg,
+				   sizeof(fa)))
+			return -EFAULT;
 
+		err = ext4_ioctl_setflags(filp, ext4_xflags_to_iflags(fa.fsx_xflags));
+		if (err)
+			return err;
+
+		err = ext4_ioctl_setproject(filp, fa.fsx_projid);
+		if (err)
+			return err;
+
+		return 0;
+	}
 	default:
 		return -ENOTTY;
 	}
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 18dc721..64c7ae6 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -36,38 +36,25 @@ struct dioattr {
 #endif
 
 /*
- * Structure for XFS_IOC_FSGETXATTR[A] and XFS_IOC_FSSETXATTR.
- */
-#ifndef HAVE_FSXATTR
-struct fsxattr {
-	__u32		fsx_xflags;	/* xflags field value (get/set) */
-	__u32		fsx_extsize;	/* extsize field value (get/set)*/
-	__u32		fsx_nextents;	/* nextents field value (get)	*/
-	__u32		fsx_projid;	/* project identifier (get/set) */
-	unsigned char	fsx_pad[12];
-};
-#endif
-
-/*
  * Flags for the bs_xflags/fsx_xflags field
  * There should be a one-to-one correspondence between these flags and the
  * XFS_DIFLAG_s.
  */
-#define XFS_XFLAG_REALTIME	0x00000001	/* data in realtime volume */
-#define XFS_XFLAG_PREALLOC	0x00000002	/* preallocated file extents */
-#define XFS_XFLAG_IMMUTABLE	0x00000008	/* file cannot be modified */
-#define XFS_XFLAG_APPEND	0x00000010	/* all writes append */
-#define XFS_XFLAG_SYNC		0x00000020	/* all writes synchronous */
-#define XFS_XFLAG_NOATIME	0x00000040	/* do not update access time */
-#define XFS_XFLAG_NODUMP	0x00000080	/* do not include in backups */
-#define XFS_XFLAG_RTINHERIT	0x00000100	/* create with rt bit set */
-#define XFS_XFLAG_PROJINHERIT	0x00000200	/* create with parents projid */
-#define XFS_XFLAG_NOSYMLINKS	0x00000400	/* disallow symlink creation */
-#define XFS_XFLAG_EXTSIZE	0x00000800	/* extent size allocator hint */
-#define XFS_XFLAG_EXTSZINHERIT	0x00001000	/* inherit inode extent size */
-#define XFS_XFLAG_NODEFRAG	0x00002000  	/* do not defragment */
-#define XFS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
-#define XFS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
+#define XFS_XFLAG_REALTIME	FS_XFLAG_REALTIME	/* data in realtime volume */
+#define XFS_XFLAG_PREALLOC	FS_XFLAG_PREALLOC	/* preallocated file extents */
+#define XFS_XFLAG_IMMUTABLE	FS_XFLAG_IMMUTABLE	/* file cannot be modified */
+#define XFS_XFLAG_APPEND	FS_XFLAG_APPEND		/* all writes append */
+#define XFS_XFLAG_SYNC		FS_XFLAG_SYNC		/* all writes synchronous */
+#define XFS_XFLAG_NOATIME	FS_XFLAG_NOATIME	/* do not update access time */
+#define XFS_XFLAG_NODUMP	FS_XFLAG_NODUMP		/* do not include in backups */
+#define XFS_XFLAG_RTINHERIT	FS_XFLAG_RTINHERIT	/* create with rt bit set */
+#define XFS_XFLAG_PROJINHERIT	FS_XFLAG_PROJINHERIT	/* create with parents projid */
+#define XFS_XFLAG_NOSYMLINKS	FS_XFLAG_NOSYMLINKS	/* disallow symlink creation */
+#define XFS_XFLAG_EXTSIZE	FS_XFLAG_EXTSIZE	/* extent size allocator hint */
+#define XFS_XFLAG_EXTSZINHERIT	FS_XFLAG_EXTSZINHERIT	/* inherit inode extent size */
+#define XFS_XFLAG_NODEFRAG	FS_XFLAG_NODEFRAG  	/* do not defragment */
+#define XFS_XFLAG_FILESTREAM	FS_XFLAG_FILESTREAM	/* use filestream allocator */
+#define XFS_XFLAG_HASATTR	FS_XFLAG_HASATTR	/* no DIFLAG for this	*/
 
 /*
  * Structure for XFS_IOC_GETBMAP.
@@ -503,8 +490,8 @@ typedef struct xfs_swapext
 #define XFS_IOC_ALLOCSP		_IOW ('X', 10, struct xfs_flock64)
 #define XFS_IOC_FREESP		_IOW ('X', 11, struct xfs_flock64)
 #define XFS_IOC_DIOINFO		_IOR ('X', 30, struct dioattr)
-#define XFS_IOC_FSGETXATTR	_IOR ('X', 31, struct fsxattr)
-#define XFS_IOC_FSSETXATTR	_IOW ('X', 32, struct fsxattr)
+#define XFS_IOC_FSGETXATTR	FS_IOC_FSGETXATTR
+#define XFS_IOC_FSSETXATTR	FS_IOC_FSSETXATTR
 #define XFS_IOC_ALLOCSP64	_IOW ('X', 36, struct xfs_flock64)
 #define XFS_IOC_FREESP64	_IOW ('X', 37, struct xfs_flock64)
 #define XFS_IOC_GETBMAP		_IOWR('X', 38, struct getbmap)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index fcbf647..872fed5 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -58,6 +58,62 @@ struct inodes_stat_t {
 	long dummy[5];		/* padding for sysctl ABI compatibility */
 };
 
+/*
+ * Extended attribute flags. These should be or-ed together to figure out what
+ * is valid.
+ */
+#define FSX_XFLAGS	(1 << 0)
+#define FSX_EXTSIZE	(1 << 1)
+#define FSX_NEXTENTS	(1 << 2)
+#define FSX_PROJID	(1 << 3)
+
+/*
+ * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
+ */
+struct fsxattr {
+	__u32		fsx_xflags;	/* xflags field value (get/set) */
+	__u32		fsx_extsize;	/* extsize field value (get/set)*/
+	__u32		fsx_nextents;	/* nextents field value (get)	*/
+	__u32		fsx_projid;	/* project identifier (get/set) */
+	unsigned char	fsx_pad[12];
+};
+
+/*
+ * Flags for the fsx_xflags field
+ */
+#define FS_XFLAG_REALTIME	0x00000001	/* data in realtime volume */
+#define FS_XFLAG_PREALLOC	0x00000002	/* preallocated file extents */
+#define FS_XFLAG_SECRM		0x00000004	/* secure deletion */
+#define FS_XFLAG_IMMUTABLE	0x00000008	/* file cannot be modified */
+#define FS_XFLAG_APPEND		0x00000010	/* all writes append */
+#define FS_XFLAG_SYNC		0x00000020	/* all writes synchronous */
+#define FS_XFLAG_NOATIME	0x00000040	/* do not update access time */
+#define FS_XFLAG_NODUMP		0x00000080	/* do not include in backups */
+#define FS_XFLAG_RTINHERIT	0x00000100	/* create with rt bit set */
+#define FS_XFLAG_PROJINHERIT	0x00000200	/* create with parents projid */
+#define FS_XFLAG_NOSYMLINKS	0x00000400	/* disallow symlink creation */
+#define FS_XFLAG_EXTSIZE	0x00000800	/* extent size allocator hint */
+#define FS_XFLAG_EXTSZINHERIT	0x00001000	/* inherit inode extent size */
+#define FS_XFLAG_NODEFRAG	0x00002000  	/* do not defragment */
+#define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
+#define FS_XFLAG_UNRM		0x00008000	/* undelete */
+#define FS_XFLAG_COMPR		0x00010000	/* compress file */
+#define FS_XFLAG_COMPRBLK	0x00020000	/* one or more compressed clusters */
+#define FS_XFLAG_NOCOMPR	0x00040000	/* don't compress */
+#define FS_XFLAG_ECOMPR		0x00080000	/* compression error */
+#define FS_XFLAG_INDEX		0x00100000	/* hash-indexed directory */
+#define FS_XFLAG_IMAGIC		0x00200000	/* AFS directory */
+#define FS_XFLAG_JOURNAL_DATA	0x00400000	/* file data should be journaled */
+#define FS_XFLAG_NOTAIL		0x00800000	/* file tail should not be merged */
+#define FS_XFLAG_DIRSYNC	0x01000000	/* dirsync behaviour (directories only) */
+#define FS_XFLAG_TOPDIR		0x02000000	/* top of directory hierarchies*/
+#define FS_XFLAG_HUGE_FILE	0x04000000	/* set to each huge file */
+#define FS_XFLAG_EXTENTS	0x08000000	/* inode uses extents */
+#define FS_XFLAG_EA_INODE	0x10000000	/* inode used for large EA */
+#define FS_XFLAG_EOFBLOCKS	0x20000000	/* blocks allocated beyond EOF */
+#define FS_XFLAG_INLINE_DATA	0x40000000	/* inode has inline data. */
+#define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this */
+
 
 #define NR_FILE  8192	/* this can well be larger on a larger system */
 
@@ -163,6 +219,8 @@ struct inodes_stat_t {
 #define	FS_IOC_GETVERSION		_IOR('v', 1, long)
 #define	FS_IOC_SETVERSION		_IOW('v', 2, long)
 #define FS_IOC_FIEMAP			_IOWR('f', 11, struct fiemap)
+#define FS_IOC_FSGETXATTR		_IOR('f', 31, struct fsxattr)
+#define FS_IOC_FSSETXATTR		_IOW('f', 32, struct fsxattr)
 #define FS_IOC32_GETFLAGS		_IOR('f', 1, int)
 #define FS_IOC32_SETFLAGS		_IOW('f', 2, int)
 #define FS_IOC32_GETVERSION		_IOR('v', 1, int)
-- 
1.7.1
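
For comparison, the two translation helpers in ext4.h could probably be
driven from a single table, which makes a mismatched pair harder to write.
The following is only a sketch: the DEMO_* names are made up, and the
values are copied from the EXT4_*_FL and FS_XFLAG_* definitions quoted in
this thread. It covers six of the flags, not the full set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values as quoted for EXT4_*_FL in this thread. */
#define DEMO_IFL_INDEX		0x00001000u
#define DEMO_IFL_IMAGIC		0x00002000u
#define DEMO_IFL_JOURNAL_DATA	0x00004000u
#define DEMO_IFL_NOTAIL		0x00008000u
#define DEMO_IFL_DIRSYNC	0x00010000u
#define DEMO_IFL_TOPDIR		0x00020000u

/* Hypothetical names; values as proposed for FS_XFLAG_* in this patch. */
#define DEMO_XFL_INDEX		0x00100000u
#define DEMO_XFL_IMAGIC		0x00200000u
#define DEMO_XFL_JOURNAL_DATA	0x00400000u
#define DEMO_XFL_NOTAIL		0x00800000u
#define DEMO_XFL_DIRSYNC	0x01000000u
#define DEMO_XFL_TOPDIR		0x02000000u

struct demo_flag_pair {
	uint32_t iflag;	/* on-disk inode flag */
	uint32_t xflag;	/* matching fsx_xflags bit */
};

/* Each flag appears exactly once, so both directions stay in sync. */
static const struct demo_flag_pair demo_flag_map[] = {
	{ DEMO_IFL_INDEX,	 DEMO_XFL_INDEX },
	{ DEMO_IFL_IMAGIC,	 DEMO_XFL_IMAGIC },
	{ DEMO_IFL_JOURNAL_DATA, DEMO_XFL_JOURNAL_DATA },
	{ DEMO_IFL_NOTAIL,	 DEMO_XFL_NOTAIL },
	{ DEMO_IFL_DIRSYNC,	 DEMO_XFL_DIRSYNC },
	{ DEMO_IFL_TOPDIR,	 DEMO_XFL_TOPDIR },
};

#define DEMO_MAP_LEN (sizeof(demo_flag_map) / sizeof(demo_flag_map[0]))

/* Translate inode flags to xflags; unknown bits are dropped. */
static uint32_t demo_iflags_to_xflags(uint32_t iflags)
{
	uint32_t xflags = 0;
	size_t i;

	for (i = 0; i < DEMO_MAP_LEN; i++)
		if (iflags & demo_flag_map[i].iflag)
			xflags |= demo_flag_map[i].xflag;
	return xflags;
}

/* Translate xflags to inode flags; unknown bits are dropped. */
static uint32_t demo_xflags_to_iflags(uint32_t xflags)
{
	uint32_t iflags = 0;
	size_t i;

	for (i = 0; i < DEMO_MAP_LEN; i++)
		if (xflags & demo_flag_map[i].xflag)
			iflags |= demo_flag_map[i].iflag;
	return iflags;
}

/* Returns 1 if 'iflags' survives a round trip through xflags. */
static int demo_roundtrip_ok(uint32_t iflags)
{
	return demo_xflags_to_iflags(demo_iflags_to_xflags(iflags)) == iflags;
}
```

A round-trip check such as demo_roundtrip_ok() over every mapped bit would
catch a pair whose test and assignment name different flags.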


* [v8 3/5] ext4: adds project quota support
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1418102548-5469-1-git-send-email-lixi@ddn.com>

This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new reserved inode is also used for
project quota accounting.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  |    8 +++-
 fs/ext4/super.c |   95 ++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8bd1da9..136e18c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -208,6 +208,7 @@ struct ext4_io_submit {
 #define EXT4_UNDEL_DIR_INO	 6	/* Undelete directory inode */
 #define EXT4_RESIZE_INO		 7	/* Reserved group descriptors inode */
 #define EXT4_JOURNAL_INO	 8	/* Journal inode */
+#define EXT4_PRJ_QUOTA_INO	 9	/* Project quota inode */
 
 /* First non-reserved inode for old ext4 filesystems */
 #define EXT4_GOOD_OLD_FIRST_INO	11
@@ -982,6 +983,7 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DIOREAD_NOLOCK	0x400000 /* Enable support for dio read nolocking */
 #define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
 #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
+#define EXT4_MOUNT_PRJQUOTA		0x2000000 /* Project quota support */
 #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
 #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
@@ -1157,7 +1159,8 @@ struct ext4_super_block {
 	__le32	s_grp_quota_inum;	/* inode for tracking group quota */
 	__le32	s_overhead_clusters;	/* overhead blocks/clusters in fs */
 	__le32	s_backup_bgs[2];	/* groups with sparse_super2 SBs */
-	__le32	s_reserved[106];	/* Padding to the end of the block */
+	__le32	s_prj_quota_inum;	/* inode for tracking project quota */
+	__le32	s_reserved[105];	/* Padding to the end of the block */
 	__le32	s_checksum;		/* crc32c(superblock) */
 };
 
@@ -1172,7 +1175,7 @@ struct ext4_super_block {
 #define EXT4_MF_FS_ABORTED	0x0002	/* Fatal error detected */
 
 /* Number of quota types we support */
-#define EXT4_MAXQUOTAS 2
+#define EXT4_MAXQUOTAS 3
 
 /*
  * fourth extended-fs super-block data in memory
@@ -1364,6 +1367,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 		ino == EXT4_BOOT_LOADER_INO ||
 		ino == EXT4_JOURNAL_INO ||
 		ino == EXT4_RESIZE_INO ||
+		ino == EXT4_PRJ_QUOTA_INO ||
 		(ino >= EXT4_FIRST_INO(sb) &&
 		 ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6b67795..f5d8ca2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1035,8 +1035,8 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 }
 
 #ifdef CONFIG_QUOTA
-#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
-#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
+static char *quotatypes[] = INITQFNAMES;
+#define QTYPE2NAME(t) (quotatypes[t])
 
 static int ext4_write_dquot(struct dquot *dquot);
 static int ext4_acquire_dquot(struct dquot *dquot);
@@ -1128,10 +1128,11 @@ enum {
 	Opt_journal_path, Opt_journal_checksum, Opt_journal_async_commit,
 	Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
 	Opt_data_err_abort, Opt_data_err_ignore,
-	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
+	Opt_usrjquota, Opt_grpjquota, Opt_prjjquota,
+	Opt_offusrjquota, Opt_offgrpjquota, Opt_offprjjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_usrquota, Opt_grpquota, Opt_i_version,
+	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_mblk_io_submit,
 	Opt_nomblk_io_submit, Opt_block_validity, Opt_noblock_validity,
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
@@ -1183,6 +1184,8 @@ static const match_table_t tokens = {
 	{Opt_usrjquota, "usrjquota=%s"},
 	{Opt_offgrpjquota, "grpjquota="},
 	{Opt_grpjquota, "grpjquota=%s"},
+	{Opt_offprjjquota, "prjjquota="},
+	{Opt_prjjquota, "prjjquota=%s"},
 	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
 	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
 	{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
@@ -1190,6 +1193,7 @@ static const match_table_t tokens = {
 	{Opt_noquota, "noquota"},
 	{Opt_quota, "quota"},
 	{Opt_usrquota, "usrquota"},
+	{Opt_prjquota, "prjquota"},
 	{Opt_barrier, "barrier=%u"},
 	{Opt_barrier, "barrier"},
 	{Opt_nobarrier, "nobarrier"},
@@ -1404,12 +1408,17 @@ static const struct mount_opts {
 							MOPT_SET | MOPT_Q},
 	{Opt_grpquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_GRPQUOTA,
 							MOPT_SET | MOPT_Q},
+	{Opt_prjquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_PRJQUOTA,
+							MOPT_SET | MOPT_Q},
 	{Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA |
-		       EXT4_MOUNT_GRPQUOTA), MOPT_CLEAR | MOPT_Q},
+		       EXT4_MOUNT_GRPQUOTA | EXT4_MOUNT_PRJQUOTA),
+							MOPT_CLEAR | MOPT_Q},
 	{Opt_usrjquota, 0, MOPT_Q},
 	{Opt_grpjquota, 0, MOPT_Q},
+	{Opt_prjjquota, 0, MOPT_Q},
 	{Opt_offusrjquota, 0, MOPT_Q},
 	{Opt_offgrpjquota, 0, MOPT_Q},
+	{Opt_offprjjquota, 0, MOPT_Q},
 	{Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT},
 	{Opt_jqfmt_vfsv0, QFMT_VFS_V0, MOPT_QFMT},
 	{Opt_jqfmt_vfsv1, QFMT_VFS_V1, MOPT_QFMT},
@@ -1432,10 +1441,14 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		return set_qf_name(sb, USRQUOTA, &args[0]);
 	else if (token == Opt_grpjquota)
 		return set_qf_name(sb, GRPQUOTA, &args[0]);
+	else if (token == Opt_prjjquota)
+		return set_qf_name(sb, PRJQUOTA, &args[0]);
 	else if (token == Opt_offusrjquota)
 		return clear_qf_name(sb, USRQUOTA);
 	else if (token == Opt_offgrpjquota)
 		return clear_qf_name(sb, GRPQUOTA);
+	else if (token == Opt_offprjjquota)
+		return clear_qf_name(sb, PRJQUOTA);
 #endif
 	switch (token) {
 	case Opt_noacl:
@@ -1661,19 +1674,28 @@ static int parse_options(char *options, struct super_block *sb,
 	}
 #ifdef CONFIG_QUOTA
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) &&
-	    (test_opt(sb, USRQUOTA) || test_opt(sb, GRPQUOTA))) {
+	    (test_opt(sb, USRQUOTA) ||
+	     test_opt(sb, GRPQUOTA) ||
+	     test_opt(sb, PRJQUOTA))) {
 		ext4_msg(sb, KERN_ERR, "Cannot set quota options when QUOTA "
 			 "feature is enabled");
 		return 0;
 	}
-	if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
+	if (sbi->s_qf_names[USRQUOTA] ||
+	    sbi->s_qf_names[GRPQUOTA] ||
+	    sbi->s_qf_names[PRJQUOTA]) {
 		if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
 			clear_opt(sb, USRQUOTA);
 
 		if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
 			clear_opt(sb, GRPQUOTA);
 
-		if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
+		if (test_opt(sb, PRJQUOTA) && sbi->s_qf_names[PRJQUOTA])
+			clear_opt(sb, PRJQUOTA);
+
+		if (test_opt(sb, GRPQUOTA) ||
+		    test_opt(sb, USRQUOTA) ||
+		    test_opt(sb, PRJQUOTA)) {
 			ext4_msg(sb, KERN_ERR, "old and new quota "
 					"format mixing");
 			return 0;
@@ -1733,6 +1755,9 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
 
 	if (sbi->s_qf_names[GRPQUOTA])
 		seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
+
+	if (sbi->s_qf_names[PRJQUOTA])
+		seq_printf(seq, ",prjjquota=%s", sbi->s_qf_names[PRJQUOTA]);
 #endif
 }
 
@@ -5037,6 +5062,46 @@ restore_opts:
 	return err;
 }
 
+static int ext4_statfs_project(struct super_block *sb,
+			       kprojid_t projid, struct kstatfs *buf)
+{
+	struct kqid qid;
+	struct dquot *dquot;
+	u64 limit;
+	u64 curblock;
+
+	qid = make_kqid_projid(projid);
+	dquot = dqget(sb, qid);
+	if (!dquot)
+		return -ESRCH;
+	spin_lock(&dq_data_lock);
+
+	limit = dquot->dq_dqb.dqb_bsoftlimit ?
+		dquot->dq_dqb.dqb_bsoftlimit :
+		dquot->dq_dqb.dqb_bhardlimit;
+	if (limit && buf->f_blocks * buf->f_bsize > limit) {
+		curblock = dquot->dq_dqb.dqb_curspace / buf->f_bsize;
+		buf->f_blocks = limit / buf->f_bsize;
+		buf->f_bfree = buf->f_bavail =
+			(buf->f_blocks > curblock) ?
+			 (buf->f_blocks - curblock) : 0;
+	}
+
+	limit = dquot->dq_dqb.dqb_isoftlimit ?
+		dquot->dq_dqb.dqb_isoftlimit :
+		dquot->dq_dqb.dqb_ihardlimit;
+	if (limit && buf->f_files > limit) {
+		buf->f_files = limit;
+		buf->f_ffree =
+			(buf->f_files > dquot->dq_dqb.dqb_curinodes) ?
+			 (buf->f_files - dquot->dq_dqb.dqb_curinodes) : 0;
+	}
+
+	spin_unlock(&dq_data_lock);
+	dqput(dquot);
+	return 0;
+}
+
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -5045,6 +5110,7 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 	ext4_fsblk_t overhead = 0, resv_blocks;
 	u64 fsid;
 	s64 bfree;
+	struct inode *inode = dentry->d_inode;
 	resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));
 
 	if (!test_opt(sb, MINIX_DF))
@@ -5069,6 +5135,9 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
 	buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;
 
+	if (ext4_test_inode_flag(inode, EXT4_INODE_PROJINHERIT) &&
+	    sb_has_quota_limits_enabled(sb, PRJQUOTA))
+		ext4_statfs_project(sb, EXT4_I(inode)->i_projid, buf);
 	return 0;
 }
 
@@ -5149,7 +5218,9 @@ static int ext4_mark_dquot_dirty(struct dquot *dquot)
 
 	/* Are we journaling quotas? */
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) ||
-	    sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
+	    sbi->s_qf_names[USRQUOTA] ||
+	    sbi->s_qf_names[GRPQUOTA] ||
+	    sbi->s_qf_names[PRJQUOTA]) {
 		dquot_mark_dquot_dirty(dquot);
 		return ext4_write_dquot(dquot);
 	} else {
@@ -5233,7 +5304,8 @@ static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
 	struct inode *qf_inode;
 	unsigned long qf_inums[EXT4_MAXQUOTAS] = {
 		le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
-		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
 	};
 
 	BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA));
@@ -5261,7 +5333,8 @@ static int ext4_enable_quotas(struct super_block *sb)
 	int type, err = 0;
 	unsigned long qf_inums[EXT4_MAXQUOTAS] = {
 		le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
-		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+		le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
 	};
 
 	sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
-- 
1.7.1



* [v8 2/5] ext4: adds project ID support
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1418102548-5469-1-git-send-email-lixi@ddn.com>

This patch adds a new internal field of ext4 inode to save project
identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
inheriting project ID from parent directory.

Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h          |   21 +++++++++++++++++----
 fs/ext4/ialloc.c        |    6 ++++++
 fs/ext4/inode.c         |   29 +++++++++++++++++++++++++++++
 fs/ext4/namei.c         |   17 +++++++++++++++++
 fs/ext4/super.c         |    1 +
 include/uapi/linux/fs.h |    1 +
 6 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 29c43e7..8bd1da9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -377,16 +377,18 @@ struct flex_groups {
 #define EXT4_EA_INODE_FL	        0x00200000 /* Inode used for large EA */
 #define EXT4_EOFBLOCKS_FL		0x00400000 /* Blocks allocated beyond EOF */
 #define EXT4_INLINE_DATA_FL		0x10000000 /* Inode has inline data. */
+#define EXT4_PROJINHERIT_FL		FS_PROJINHERIT_FL /* Create with parents projid */
 #define EXT4_RESERVED_FL		0x80000000 /* reserved for ext4 lib */
 
-#define EXT4_FL_USER_VISIBLE		0x004BDFFF /* User visible flags */
-#define EXT4_FL_USER_MODIFIABLE		0x004380FF /* User modifiable flags */
+#define EXT4_FL_USER_VISIBLE		0x204BDFFF /* User visible flags */
+#define EXT4_FL_USER_MODIFIABLE		0x204380FF /* User modifiable flags */
 
 /* Flags that should be inherited by new inodes from their parent. */
 #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
 			   EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
 			   EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
-			   EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
+			   EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
+			   EXT4_PROJINHERIT_FL)
 
 /* Flags that are appropriate for regular files (all but dir-specific ones). */
 #define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
@@ -434,6 +436,7 @@ enum {
 	EXT4_INODE_EA_INODE	= 21,	/* Inode used for large EA */
 	EXT4_INODE_EOFBLOCKS	= 22,	/* Blocks allocated beyond EOF */
 	EXT4_INODE_INLINE_DATA	= 28,	/* Data in inode. */
+	EXT4_INODE_PROJINHERIT	= 29,	/* Create with parents projid */
 	EXT4_INODE_RESERVED	= 31,	/* reserved for ext4 lib */
 };
 
@@ -683,6 +686,7 @@ struct ext4_inode {
 	__le32  i_crtime;       /* File Creation time */
 	__le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
 	__le32  i_version_hi;	/* high 32 bits for 64-bit version */
+	__le32  i_projid;	/* Project ID */
 };
 
 struct move_extent {
@@ -934,6 +938,7 @@ struct ext4_inode_info {
 
 	/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
 	__u32 i_csum_seed;
+	kprojid_t i_projid;
 };
 
 /*
@@ -1518,6 +1523,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
  * GDT_CSUM bits are mutually exclusive.
  */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
+#define EXT4_FEATURE_RO_COMPAT_PROJECT		0x1000 /* Project quota */
 
 #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
 #define EXT4_FEATURE_INCOMPAT_FILETYPE		0x0002
@@ -1567,7 +1573,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
 					 EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
 					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
-					 EXT4_FEATURE_RO_COMPAT_QUOTA)
+					 EXT4_FEATURE_RO_COMPAT_QUOTA |\
+					 EXT4_FEATURE_RO_COMPAT_PROJECT)
 
 /*
  * Default values for user and/or group using reserved blocks
@@ -1575,6 +1582,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define	EXT4_DEF_RESUID		0
 #define	EXT4_DEF_RESGID		0
 
+/*
+ * Default project ID
+ */
+#define	EXT4_DEF_PROJID		0
+
 #define EXT4_DEF_INODE_READAHEAD_BLKS	32
 
 /*
@@ -2131,6 +2143,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 			     loff_t lstart, loff_t lend);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
+extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
 extern void ext4_da_update_reserve_space(struct inode *inode,
 					int used, int quota_claim);
 
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index ac644c3..fefb948 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -756,6 +756,12 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		inode->i_gid = dir->i_gid;
 	} else
 		inode_init_owner(inode, dir, mode);
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+	    ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) {
+		ei->i_projid = EXT4_I(dir)->i_projid;
+	} else {
+		ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
+	}
 	dquot_initialize(inode);
 
 	if (!goal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5653fa4..29204d4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3863,6 +3863,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
 		EXT4_I(inode)->i_inline_off = 0;
 }
 
+int ext4_get_projid(struct inode *inode, kprojid_t *projid)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+		return -EOPNOTSUPP;
+	*projid = EXT4_I(inode)->i_projid;
+	return 0;
+}
+
 struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 {
 	struct ext4_iloc iloc;
@@ -3874,6 +3882,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	int block;
 	uid_t i_uid;
 	gid_t i_gid;
+	projid_t i_projid;
 
 	inode = iget_locked(sb, ino);
 	if (!inode)
@@ -3923,12 +3932,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
 	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
 	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+		i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
+	else
+		i_projid = EXT4_DEF_PROJID;
+
 	if (!(test_opt(inode->i_sb, NO_UID32))) {
 		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
 		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
 	}
 	i_uid_write(inode, i_uid);
 	i_gid_write(inode, i_gid);
+	ei->i_projid = make_kprojid(&init_user_ns, i_projid);
 	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
 
 	ext4_clear_state_flags(ei);	/* Only relevant on 32-bit archs */
@@ -4158,6 +4173,7 @@ static int ext4_do_update_inode(handle_t *handle,
 	int need_datasync = 0, set_large_file = 0;
 	uid_t i_uid;
 	gid_t i_gid;
+	projid_t i_projid;
 
 	spin_lock(&ei->i_raw_lock);
 
@@ -4170,6 +4186,7 @@ static int ext4_do_update_inode(handle_t *handle,
 	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
 	i_uid = i_uid_read(inode);
 	i_gid = i_gid_read(inode);
+	i_projid = from_kprojid(&init_user_ns, ei->i_projid);
 	if (!(test_opt(inode->i_sb, NO_UID32))) {
 		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
 		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
@@ -4249,6 +4266,18 @@ static int ext4_do_update_inode(handle_t *handle,
 		}
 	}
 
+	BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+			EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+	       i_projid != EXT4_DEF_PROJID);
+	if (i_projid != EXT4_DEF_PROJID &&
+	    (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
+	     (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
+		spin_unlock(&ei->i_raw_lock);
+		err = -EFBIG;
+		goto out_brelse;
+	}
+	raw_inode->i_projid = cpu_to_le32(i_projid);
+
 	ext4_inode_csum_set(inode, raw_inode, ei);
 
 	spin_unlock(&ei->i_raw_lock);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2291923..09e8e55 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
 	if (inode->i_nlink >= EXT4_LINK_MAX)
 		return -EMLINK;
 
+	if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
+	    (!projid_eq(EXT4_I(dir)->i_projid,
+			EXT4_I(old_dentry->d_inode)->i_projid)))
+		return -EXDEV;
+
 	dquot_initialize(dir);
 
 retry:
@@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	int credits;
 	u8 old_file_type;
 
+	if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
+	    (!projid_eq(EXT4_I(new_dir)->i_projid,
+			EXT4_I(old_dentry->d_inode)->i_projid)))
+		return -EXDEV;
+
 	dquot_initialize(old.dir);
 	dquot_initialize(new.dir);
 
@@ -3395,6 +3405,13 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	u8 new_file_type;
 	int retval;
 
+	if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
+	    ((!projid_eq(EXT4_I(new_dir)->i_projid,
+			 EXT4_I(old_dentry->d_inode)->i_projid)) ||
+	     (!projid_eq(EXT4_I(old_dir)->i_projid,
+			 EXT4_I(new_dentry->d_inode)->i_projid))))
+		return -EXDEV;
+
 	dquot_initialize(old.dir);
 	dquot_initialize(new.dir);
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4fca81c..6b67795 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1067,6 +1067,7 @@ static const struct dquot_operations ext4_quota_operations = {
 	.write_info	= ext4_write_info,
 	.alloc_dquot	= dquot_alloc,
 	.destroy_dquot	= dquot_destroy,
+	.get_projid	= ext4_get_projid,
 };
 
 static const struct quotactl_ops ext4_qctl_operations = {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3735fa0..fcbf647 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -195,6 +195,7 @@ struct inodes_stat_t {
 #define FS_EXTENT_FL			0x00080000 /* Extents */
 #define FS_DIRECTIO_FL			0x00100000 /* Use direct i/o */
 #define FS_NOCOW_FL			0x00800000 /* Do not cow file */
+#define FS_PROJINHERIT_FL		0x20000000 /* Create with parents projid */
 #define FS_RESERVED_FL			0x80000000 /* reserved for ext2 lib */
 
 #define FS_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */
-- 
1.7.1



* [v8 1/5] vfs: adds general codes to enforces project quota limits
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1418102548-5469-1-git-send-email-lixi@ddn.com>

This patch adds support for a new quota type, PRJQUOTA, for project quota
enforcement. It also adds a new method, get_projid(), to the
dquot_operations structure.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/quota/dquot.c           |   35 ++++++++++++++++++++++++++++++-----
 fs/quota/quota.c           |    8 ++++++--
 fs/quota/quotaio_v2.h      |    6 ++++--
 include/linux/quota.h      |    2 ++
 include/uapi/linux/quota.h |    6 ++++--
 5 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 6b45272..379ff77 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1154,8 +1154,8 @@ static int need_print_warning(struct dquot_warn *warn)
 			return uid_eq(current_fsuid(), warn->w_dq_id.uid);
 		case GRPQUOTA:
 			return in_group_p(warn->w_dq_id.gid);
-		case PRJQUOTA:	/* Never taken... Just make gcc happy */
-			return 0;
+		case PRJQUOTA:
+			return 1;
 	}
 	return 0;
 }
@@ -1394,6 +1394,9 @@ static void __dquot_initialize(struct inode *inode, int type)
 	/* First get references to structures we might need. */
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
 		struct kqid qid;
+		kprojid_t projid;
+		int rc;
+
 		got[cnt] = NULL;
 		if (type != -1 && cnt != type)
 			continue;
@@ -1404,6 +1407,10 @@ static void __dquot_initialize(struct inode *inode, int type)
 		 */
 		if (inode->i_dquot[cnt])
 			continue;
+
+		if (!sb_has_quota_active(sb, cnt))
+			continue;
+
 		init_needed = 1;
 
 		switch (cnt) {
@@ -1413,6 +1420,12 @@ static void __dquot_initialize(struct inode *inode, int type)
 		case GRPQUOTA:
 			qid = make_kqid_gid(inode->i_gid);
 			break;
+		case PRJQUOTA:
+			rc = inode->i_sb->dq_op->get_projid(inode, &projid);
+			if (rc)
+				continue;
+			qid = make_kqid_projid(projid);
+			break;
 		}
 		got[cnt] = dqget(sb, qid);
 	}
@@ -2156,7 +2169,8 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
 		error = -EROFS;
 		goto out_fmt;
 	}
-	if (!sb->s_op->quota_write || !sb->s_op->quota_read) {
+	if (!sb->s_op->quota_write || !sb->s_op->quota_read ||
+	    (type == PRJQUOTA && sb->dq_op->get_projid == NULL)) {
 		error = -EINVAL;
 		goto out_fmt;
 	}
@@ -2397,8 +2411,19 @@ static void do_get_dqblk(struct dquot *dquot, struct fs_disk_quota *di)
 
 	memset(di, 0, sizeof(*di));
 	di->d_version = FS_DQUOT_VERSION;
-	di->d_flags = dquot->dq_id.type == USRQUOTA ?
-			FS_USER_QUOTA : FS_GROUP_QUOTA;
+	switch (dquot->dq_id.type) {
+	case USRQUOTA:
+		di->d_flags = FS_USER_QUOTA;
+		break;
+	case GRPQUOTA:
+		di->d_flags = FS_GROUP_QUOTA;
+		break;
+	case PRJQUOTA:
+		di->d_flags = FS_PROJ_QUOTA;
+		break;
+	default:
+		BUG();
+	}
 	di->d_id = from_kqid_munged(current_user_ns(), dquot->dq_id);
 
 	spin_lock(&dq_data_lock);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 7562164..795d694 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -30,11 +30,15 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 	case Q_XGETQSTATV:
 	case Q_XQUOTASYNC:
 		break;
-	/* allow to query information for dquots we "own" */
+	/*
+	 * allow to query information for dquots we "own"
+	 * always allow querying project quota
+	 */
 	case Q_GETQUOTA:
 	case Q_XGETQUOTA:
 		if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) ||
-		    (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))))
+		    (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))) ||
+		    (type == PRJQUOTA))
 			break;
 		/*FALLTHROUGH*/
 	default:
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b4..4e95430 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
  */
 #define V2_INITQMAGICS {\
 	0xd9c01f11,	/* USRQUOTA */\
-	0xd9c01927	/* GRPQUOTA */\
+	0xd9c01927,	/* GRPQUOTA */\
+	0xd9c03f14,	/* PRJQUOTA */\
 }
 
 #define V2_INITQVERSIONS {\
 	1,		/* USRQUOTA */\
-	1		/* GRPQUOTA */\
+	1,		/* GRPQUOTA */\
+	1,		/* PRJQUOTA */\
 }
 
 /* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 80d345a..f1b25f8 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -50,6 +50,7 @@
 
 #undef USRQUOTA
 #undef GRPQUOTA
+#undef PRJQUOTA
 enum quota_type {
 	USRQUOTA = 0,		/* element used for user quotas */
 	GRPQUOTA = 1,		/* element used for group quotas */
@@ -312,6 +313,7 @@ struct dquot_operations {
 	/* get reserved quota for delayed alloc, value returned is managed by
 	 * quota code only */
 	qsize_t *(*get_reserved_space) (struct inode *);
+	int (*get_projid) (struct inode *, kprojid_t *);/* Get project ID */
 };
 
 struct path;
diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h
index 3b6cfbe..b2d9486 100644
--- a/include/uapi/linux/quota.h
+++ b/include/uapi/linux/quota.h
@@ -36,11 +36,12 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
-#define __DQUOT_VERSION__	"dquot_6.5.2"
+#define __DQUOT_VERSION__	"dquot_6.6.0"
 
-#define MAXQUOTAS 2
+#define MAXQUOTAS 3
 #define USRQUOTA  0		/* element used for user quotas */
 #define GRPQUOTA  1		/* element used for group quotas */
+#define PRJQUOTA  2		/* element used for project quotas */
 
 /*
  * Definitions for the default names of the quotas files.
@@ -48,6 +49,7 @@
 #define INITQFNAMES { \
 	"user",    /* USRQUOTA */ \
 	"group",   /* GRPQUOTA */ \
+	"project", /* PRJQUOTA */ \
 	"undefined", \
 };
 
-- 
1.7.1



* [v8 0/5] ext4: add project quota support
From: Li Xi @ 2014-12-09  5:22 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA,
	adilger-m1MBpc4rdrD3fQ9qLvQP4Q, jack-AlSwsSmVLrQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, hch-wEGCiKHe2LqWVfeAwA7xHQ,
	dmonakhov-GEFAQzZX7r8dnm+yROfE0A

The following patches propose an implementation of project quota
support for ext4. A project is an aggregate of unrelated inodes
which may be scattered across different directories. Inodes that
belong to the same project share the same identifier, i.e. the
'project ID', just as every inode has user and group identifiers.
The following patches add project quota as a supplement to the
existing user/group quota types.

The semantics of ext4 project quota are consistent with XFS. Each
directory can have the EXT4_INODE_PROJINHERIT flag set. When the
EXT4_INODE_PROJINHERIT flag of a parent directory is not set, a
newly created inode under that directory gets the default project
ID (i.e. 0), and its EXT4_INODE_PROJINHERIT flag is not set either.
When this flag is set on a directory, the following rules apply:

1) A newly created inode under that directory will inherit both
the EXT4_INODE_PROJINHERIT flag and the project ID from its parent
directory.

2) Hard-linking an inode with a different project ID into that
directory will fail with errno EXDEV.

3) Renaming an inode with a different project ID into that directory
will fail with errno EXDEV. However, the 'mv' command detects this
failure and falls back to copying the file to a new inode in the
directory, so the new inode inherits both the project ID and the
EXT4_INODE_PROJINHERIT flag.

4) If the project quota of that ID is being enforced, statfs() on
that directory will treat the quota limits as additional upper
bounds alongside the capacity of the file system, i.e. the reported
total block/inode counts will be the minimum of the quota limits
and the file system capacity.
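
The four rules above can be sketched as a small userspace model. This
is only an illustration and not code from the series: struct
inode_model and the *_model() helpers are made-up names, the project
ID is a plain integer rather than a kprojid_t, and rule 4 is reduced
to the "minimum of limit and capacity" statement, with a limit of 0
assumed to mean "no limit".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the inode state the rules refer to. */
struct inode_model {
	uint32_t projid;	/* project ID, 0 is the default */
	bool projinherit;	/* models the EXT4_INODE_PROJINHERIT flag */
};

/*
 * Rule 1, plus the case where the parent flag is unset: a new inode
 * takes the flag from its parent, and the project ID only if the
 * flag is set on the parent.
 */
static void new_inode_model(const struct inode_model *dir,
			    struct inode_model *inode)
{
	assert(dir && inode);
	inode->projinherit = dir->projinherit;
	inode->projid = dir->projinherit ? dir->projid : 0;
}

/*
 * Rules 2 and 3: hard-linking or renaming an inode into dir.
 * Returns 0 on success or -EXDEV if the project IDs differ and
 * dir has the inherit flag set.
 */
static int move_into_model(const struct inode_model *dir,
			   const struct inode_model *inode)
{
	if (dir->projinherit && dir->projid != inode->projid)
		return -EXDEV;
	return 0;
}

/*
 * Rule 4: the total reported by statfs() is the minimum of the file
 * system capacity and the quota limit (0 is assumed to mean unlimited).
 */
static uint64_t statfs_total_model(uint64_t fs_total, uint64_t quota_limit)
{
	if (quota_limit && fs_total > quota_limit)
		return quota_limit;
	return fs_total;
}
```

In this model a directory without the flag accepts any inode, which
matches the checks in the namei.c hunks of patch 2/5: they only test
the flag on the destination directory.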

Changelog:
* v8 <- v7:
 - Rebase to newest dev branch of ext4 repository (3.18.0_rc3).
* v7 <- v6:
 - Map ext4 inode flags to xflags of struct fsxattr;
 - Add patch to cleanup ext4 inode flag definitions.
* v6 <- v5:
 - Add project ID check for cross rename;
 - Remove patch of EXT4_IOC_GETPROJECT/EXT4_IOC_SETPROJECT ioctl
* v5 <- v4:
 - Check project feature when set/get project ID;
 - Do not check project feature for project quota;
 - Add support of FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR.
* v4 <- v3:
 - Do not check project feature when set/get project ID;
 - Use EXT4_MAXQUOTAS instead of MAXQUOTAS in ext4 patches;
 - Remove unnecessary change of fs/quota/dquot.c;
 - Remove CONFIG_QUOTA_PROJECT.
* v3 <- v2:
 - Add EXT4_INODE_PROJINHERIT semantics.
* v2 <- v1:
 - Add ioctl interface for setting/getting project;
 - Add EXT4_FEATURE_RO_COMPAT_PROJECT;
 - Add get_projid() method in struct dquot_operations;
 - Add error check of ext4_inode_projid_set/get().

v7: http://www.spinics.net/lists/linux-fsdevel/msg80404.html
v6: http://www.spinics.net/lists/linux-fsdevel/msg80022.html
v5: http://www.spinics.net/lists/linux-api/msg04840.html
v4: http://lwn.net/Articles/612972/
v3: http://www.spinics.net/lists/linux-ext4/msg45184.html
v2: http://www.spinics.net/lists/linux-ext4/msg44695.html
v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153

Any comments or feedback are appreciated.

Regards,
                                         - Li Xi

Li Xi (5):
  vfs: adds general codes to enforces project quota limits
  ext4: adds project ID support
  ext4: adds project quota support
  ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
  ext4: cleanup inode flag definitions

 fs/ext4/ext4.h             |  190 +++++++++++++++++++++----
 fs/ext4/ialloc.c           |    6 +
 fs/ext4/inode.c            |   29 ++++
 fs/ext4/ioctl.c            |  330 +++++++++++++++++++++++++++++++-------------
 fs/ext4/namei.c            |   17 +++
 fs/ext4/super.c            |   96 +++++++++++--
 fs/quota/dquot.c           |   35 ++++-
 fs/quota/quota.c           |    8 +-
 fs/quota/quotaio_v2.h      |    6 +-
 fs/xfs/xfs_fs.h            |   47 +++----
 include/linux/quota.h      |    2 +
 include/uapi/linux/fs.h    |   59 ++++++++
 include/uapi/linux/quota.h |    6 +-
 13 files changed, 650 insertions(+), 181 deletions(-)


* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
From: Eric W. Biederman @ 2014-12-08 23:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel@vger.kernel.org, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger, Kenton Varda, stable
In-Reply-To: <CALCETrXSScp77BUJR5NSTh5-RnEZ9rqELSGJBeEzgdQ-OtohuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>>
>>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>>
>>>>   A value of 0 means the setgroups system call is disabled in the
>>>
>>> "deny"
>>>
>>>>   current processes user namespace and can not be enabled in the
>>>>   future in this user namespace.
>>>>
>>>>   A value of 1 means the segtoups system call is enabled.
>>>>
>>>
>>> "allow"
>>>
>>>> - Descedent user namespaces inherit the value of setgroups from
>>>
>>> s/Descedent/Descendent/
>>
>> Bah.  I updated everything but the changelog comment.
>>
>>>> --- a/kernel/groups.c
>>>> +++ b/kernel/groups.c
>>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>>>          * the user namespace has been established.
>>>>          */
>>>>         return userns_gid_mappings_established(user_ns) &&
>>>> +               userns_setgroups_allowed(user_ns) &&
>>>>                 ns_capable(user_ns, CAP_SETGID);
>>>>  }
>>>
>>> Can you add a comment explaining the ordering?  For example:
>>
>> I need to think on what I can say to make it clear.
>> Perhaps: /* Careful the order of these checks is important. */
>>
>>> We need to check for a gid mapping before checking setgroups_allowed
>>> because an unprivileged user can create a userns with setgroups
>>> allowed, then disallow setgroups and add a mapping.  If we check in
>>> the opposite order, then we have a race: we could see that setgroups
>>> is allowed before the user clears the bit and then see that there is a
>>> gid mapping after the other thread is done.
>>
>
> This text was actually my suggested comment text.

Now I see.

> If you put smp_rmb() in this function with a comment like that, then I
> think it will all make sense and be obviously correct (even with most
> of the other barriers removed).

Right.

Given that we have to be careful when using these things anyway, what
I was hoping to achieve with the barriers appears impossible and
confusing, so I will see about just adding barriers where we need them
for real.  Sigh.

Eric


* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
From: Andy Lutomirski @ 2014-12-08 22:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <87ppbtn4mv.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Mon, Dec 8, 2014 at 2:44 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>>
>>>   A value of 0 means the setgroups system call is disabled in the
>>
>> "deny"
>>
>>>   current processes user namespace and can not be enabled in the
>>>   future in this user namespace.
>>>
>>>   A value of 1 means the segtoups system call is enabled.
>>>
>>
>> "allow"
>>
>>> - Descedent user namespaces inherit the value of setgroups from
>>
>> s/Descedent/Descendent/
>
> Bah.  I updated everything but the changelog comment.
>
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>>          * the user namespace has been established.
>>>          */
>>>         return userns_gid_mappings_established(user_ns) &&
>>> +               userns_setgroups_allowed(user_ns) &&
>>>                 ns_capable(user_ns, CAP_SETGID);
>>>  }
>>
>> Can you add a comment explaining the ordering?  For example:
>
> I need to think on what I can say to make it clear.
> Perhaps: /* Careful the order of these checks is important. */
>
>> We need to check for a gid mapping before checking setgroups_allowed
>> because an unprivileged user can create a userns with setgroups
>> allowed, then disallow setgroups and add a mapping.  If we check in
>> the opposite order, then we have a race: we could see that setgroups
>> is allowed before the user clears the bit and then see that there is a
>> gid mapping after the other thread is done.
>

This text was actually my suggested comment text.

If you put smp_rmb() in this function with a comment like that, then I
think it will all make sense and be obviously correct (even with most
of the other barriers removed).
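
As a rough userspace analogy of that ordering (C11 atomics, not
kernel code), something like the sketch below. Every name in it is
made up for illustration: struct userns_model stands in for the two
fields involved, the acquire fence plays the role of smp_rmb(), and
the release fence on the writer side plays the role of the matching
write barrier. The capability check is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of user namespace state. */
struct userns_model {
	atomic_uint gid_map_nr_extents;	/* nonzero once a gid map exists */
	atomic_bool setgroups_allowed;	/* only ever goes true -> false */
};

/* Writer, step 1: disallow setgroups. */
static void deny_setgroups_model(struct userns_model *ns)
{
	atomic_store_explicit(&ns->setgroups_allowed, false,
			      memory_order_relaxed);
}

/* Writer, step 2: publish the gid mapping after the earlier stores. */
static void install_gid_map_model(struct userns_model *ns, unsigned int n)
{
	assert(n != 0);
	/* Order the setgroups_allowed store before the nr_extents store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ns->gid_map_nr_extents, n,
			      memory_order_relaxed);
}

/* Reader: check the mapping first, then the setgroups knob. */
static bool may_setgroups_model(struct userns_model *ns, bool has_cap_setgid)
{
	if (atomic_load_explicit(&ns->gid_map_nr_extents,
				 memory_order_relaxed) == 0)
		return false;
	/*
	 * Read barrier: if we saw the mapping, we must also see any
	 * "deny" that was written before the mapping was installed.
	 */
	atomic_thread_fence(memory_order_acquire);
	if (!atomic_load_explicit(&ns->setgroups_allowed,
				  memory_order_relaxed))
		return false;
	return has_cap_setgid;
}
```

With the checks in the opposite order and no barrier, the reader
could observe the old "allowed" value and then the new mapping, which
is the race described in the suggested comment text.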

--Andy

> Since these are independent atomic variables yes that ordering issue
> seems to be the case.
>
> For me it was the natural ordering of the checks so I didn't even bother
> to think about what happens when you reorder them.
>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Andy Lutomirski @ 2014-12-08 22:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Richard Weinberger, Linux Containers,
	Josh Triplett, stable,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Linux API,
	Casey Schaufler, Andrew Morton
In-Reply-To: <874mt5ojfh.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Mon, Dec 8, 2014 at 2:39 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:
>
>> Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
>>> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> wrote:
>>>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>>>
>>>>> setgroups is unique in not needing a valid mapping before it can be called,
>>>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>>>
>>>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>>>> be used until a gid mapping is established.  Therefore add a helper function
>>>>> to see if the user namespace gid mapping has been established and call
>>>>> that function in the setgroups permission check.
>>>>>
>>>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>>>> without privilege using user namespaces.
>>>>>
>>>>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>>>> ---
>>>>>  include/linux/user_namespace.h | 9 +++++++++
>>>>>  kernel/groups.c                | 7 ++++++-
>>>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>>>> index e95372654f09..41cc26e5a350 100644
>>>>> --- a/include/linux/user_namespace.h
>>>>> +++ b/include/linux/user_namespace.h
>>>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>>>
>>>>>  extern struct user_namespace init_user_ns;
>>>>>
>>>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>>>> +{
>>>>> +     bool established;
>>>>> +     smp_mb__before_atomic();
>>>>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>>>> +     smp_mb__after_atomic();
>>>>> +     return established;
>>>>> +}
>>>>> +
>>>>
>>>> Maybe this is a stupid question, but why do we need all this magic
>>>> around established =  ... ?
>>>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>>>> in a lock-free manner?
>>>>
>>>
>>> See my other comment -- the ordering will matter at the end of the series.
>>
>> But ns->gid_map.nr_extents is not atomic, it is a plain u32.
>> This confuses me.
>
> Read Documentation/atomic_ops.txt a plain u32 is atomic by definiton.
>

I still don't understand why the helper changed to smp_mb__before_atomic.

> Which is a little bit convoluted.  However that is part of the of the
> gid mapping path and I optimized that as far as I humanly could so that
> calls like stat don't take a noticable slow donw.
>
> On this path we don't particularly care except that I am using an the
> existing data structure.

As an example, arm64 defines both smp_mb__before_atomic and
smp_mb__after_atomic as smp_mb(), which is heavier than smp_rmb(), and
there are two of them.  So I still like the explicit smp_rmb() better.

--Andy

>
> Eric
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
From: Eric W. Biederman @ 2014-12-08 22:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <CALCETrU-o5mPr1jCaLXDuuF6J2N470zAtx=8Fa-SjF=ZpdE8mQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>>
>>   A value of 0 means the setgroups system call is disabled in the
>
> "deny"
>
>>   current processes user namespace and can not be enabled in the
>>   future in this user namespace.
>>
>>   A value of 1 means the segtoups system call is enabled.
>>
>
> "allow"
>
>> - Descedent user namespaces inherit the value of setgroups from
>
> s/Descedent/Descendent/

Bah.  I updated everything but the changelog comment.

>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>>          * the user namespace has been established.
>>          */
>>         return userns_gid_mappings_established(user_ns) &&
>> +               userns_setgroups_allowed(user_ns) &&
>>                 ns_capable(user_ns, CAP_SETGID);
>>  }
>
> Can you add a comment explaining the ordering?  For example:

I need to think on what I can say to make it clear.
Perhaps: /* Careful the order of these checks is important. */

> We need to check for a gid mapping before checking setgroups_allowed
> because an unprivileged user can create a userns with setgroups
> allowed, then disallow setgroups and add a mapping.  If we check in
> the opposite order, then we have a race: we could see that setgroups
> is allowed before the user clears the bit and then see that there is a
> gid mapping after the other thread is done.

Since these are independent atomic variables, yes, that ordering issue
does seem to exist.

For me it was the natural ordering of the checks so I didn't even bother
to think about what happens when you reorder them.

Eric

^ permalink raw reply

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Eric W. Biederman @ 2014-12-08 22:39 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, Andy Lutomirski, Kenton Varda, LSM,
	Michael Kerrisk-manpages, Casey Schaufler, Andrew Morton,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <548625E3.6020400-/L3Ra7n9ekc@public.gmane.org>

Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:

> Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
>> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> wrote:
>>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>>
>>>> setgroups is unique in not needing a valid mapping before it can be called,
>>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>>
>>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>>> be used until a gid mapping is established.  Therefore add a helper function
>>>> to see if the user namespace gid mapping has been established and call
>>>> that function in the setgroups permission check.
>>>>
>>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>>> without privilege using user namespaces.
>>>>
>>>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>>> ---
>>>>  include/linux/user_namespace.h | 9 +++++++++
>>>>  kernel/groups.c                | 7 ++++++-
>>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>>> index e95372654f09..41cc26e5a350 100644
>>>> --- a/include/linux/user_namespace.h
>>>> +++ b/include/linux/user_namespace.h
>>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>>
>>>>  extern struct user_namespace init_user_ns;
>>>>
>>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>>> +{
>>>> +     bool established;
>>>> +     smp_mb__before_atomic();
>>>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>>> +     smp_mb__after_atomic();
>>>> +     return established;
>>>> +}
>>>> +
>>>
>>> Maybe this is a stupid question, but why do we need all this magic
>>> around established =  ... ?
>>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>>> in a lock-free manner?
>>>
>> 
>> See my other comment -- the ordering will matter at the end of the series.
>
> But ns->gid_map.nr_extents is not atomic, it is a plain u32.
> This confuses me.

Read Documentation/atomic_ops.txt: a plain u32 is atomic by definition.

Which is a little bit convoluted.  However that is part of the
gid mapping path and I optimized that as far as I humanly could so that
calls like stat don't take a noticeable slowdown.

On this path we don't particularly care except that I am using the
existing data structure.

Eric

^ permalink raw reply

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Andy Lutomirski @ 2014-12-08 22:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <87h9x5ok0h.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Mon, Dec 8, 2014 at 2:26 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>> On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established.  Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>>  include/linux/user_namespace.h | 9 +++++++++
>>>  kernel/groups.c                | 7 ++++++-
>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>>  extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> +       bool established;
>>> +       smp_mb__before_atomic();
>>> +       established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> +       smp_mb__after_atomic();
>>> +       return established;
>>> +}
>>
>> I don't think this works on all platforms.  ACCESS_ONCE is not atomic
>> in the smp_mb__before_atomic sense.
>
> Documentation/atomic_ops.txt documents ACCESS_ONCE as being equivalent
> to atomic_read() and atomic_set().  smp_mb__before_atomic and
> smp_mb__after_atomic() are Documented as working with atomic_read and
> atomic_set.  Maybe it is a stretch to use them but it doesn't seem like
> much of a stretch.

I don't fully understand the design there.  I think this is an attempt
to work around the fact that test_bit is fully atomic on x86 but not
elsewhere.

>
> Further at this point I don't know that any barriers are strictly
> needed, beyond the ACCESS_ONCE.  However since x86 does all of the
> ordering in hardware that I need I am not going to find any bugs that
> don't require a barrier.
>
> All I really want is the same level of barriers I would get if I used a
> spin-lock protected data structure so I don't need to worry about
> crazy smp issues that happen when the hardware decides it is safe to
> reorder things.

Use smp_rmb(), I think.  It'll be obviously correct, and the
performance impact really doesn't matter.

Also, on platforms where this stuff matters, the barrier in
smp_mb__whatever will be a full fence, whereas smp_rmb may be lighter
weight.

--Andy

>
> Eric
>
>
>>> +
>>>  #ifdef CONFIG_USER_NS
>>>
>>>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>>> diff --git a/kernel/groups.c b/kernel/groups.c
>>> index 02d8a251c476..e0335e44f76a 100644
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -6,6 +6,7 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/security.h>
>>>  #include <linux/syscalls.h>
>>> +#include <linux/user_namespace.h>
>>>  #include <asm/uaccess.h>
>>>
>>>  /* init to 2 - one for init_task, one to ensure it is never freed */
>>> @@ -217,7 +218,11 @@ bool may_setgroups(void)
>>>  {
>>>         struct user_namespace *user_ns = current_user_ns();
>>>
>>> -       return ns_capable(user_ns, CAP_SETGID);
>>> +       /* It is not safe to use setgroups until a gid mapping in
>>> +        * the user namespace has been established.
>>> +        */
>>> +       return userns_gid_mappings_established(user_ns) &&
>>> +               ns_capable(user_ns, CAP_SETGID);
>>>  }
>>>
>>>  /*
>>> --
>>> 1.9.1
>>>
>>
>> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Richard Weinberger @ 2014-12-08 22:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Linux Containers, Josh Triplett, Andrew Morton,
	Kees Cook, Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, LSM,
	Casey Schaufler, Serge E. Hallyn, Kenton Varda, stable
In-Reply-To: <CALCETrXSG5QN8J3GtZjLdV6T7j_uaMG=fyTDt27vEK0NpGs9qg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Am 08.12.2014 um 23:25 schrieb Andy Lutomirski:
> On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> wrote:
>> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>>
>>> setgroups is unique in not needing a valid mapping before it can be called,
>>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>>
>>> The design of the user namespace assumes that CAP_SETGID can not actually
>>> be used until a gid mapping is established.  Therefore add a helper function
>>> to see if the user namespace gid mapping has been established and call
>>> that function in the setgroups permission check.
>>>
>>> This is part of the fix for CVE-2014-8989, being able to drop groups
>>> without privilege using user namespaces.
>>>
>>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>>  include/linux/user_namespace.h | 9 +++++++++
>>>  kernel/groups.c                | 7 ++++++-
>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>>> index e95372654f09..41cc26e5a350 100644
>>> --- a/include/linux/user_namespace.h
>>> +++ b/include/linux/user_namespace.h
>>> @@ -37,6 +37,15 @@ struct user_namespace {
>>>
>>>  extern struct user_namespace init_user_ns;
>>>
>>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>>> +{
>>> +     bool established;
>>> +     smp_mb__before_atomic();
>>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>>> +     smp_mb__after_atomic();
>>> +     return established;
>>> +}
>>> +
>>
>> Maybe this is a stupid question, but why do we need all this magic
>> around established =  ... ?
>> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
>> in a lock-free manner?
>>
> 
> See my other comment -- the ordering will matter at the end of the series.

But ns->gid_map.nr_extents is not atomic, it is a plain u32.
This confuses me.

Thanks,
//richard

^ permalink raw reply

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Eric W. Biederman @ 2014-12-08 22:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <CALCETrVczpN8k_yenjGnocv4jYyNy+EXEW7h5NOhqy2XMAB=_Q@mail.gmail.com>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Dec 8, 2014 at 2:07 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> setgroups is unique in not needing a valid mapping before it can be called,
>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>
>> The design of the user namespace assumes that CAP_SETGID can not actually
>> be used until a gid mapping is established.  Therefore add a helper function
>> to see if the user namespace gid mapping has been established and call
>> that function in the setgroups permission check.
>>
>> This is part of the fix for CVE-2014-8989, being able to drop groups
>> without privilege using user namespaces.
>>
>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  include/linux/user_namespace.h | 9 +++++++++
>>  kernel/groups.c                | 7 ++++++-
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..41cc26e5a350 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -37,6 +37,15 @@ struct user_namespace {
>>
>>  extern struct user_namespace init_user_ns;
>>
>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>> +{
>> +       bool established;
>> +       smp_mb__before_atomic();
>> +       established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>> +       smp_mb__after_atomic();
>> +       return established;
>> +}
>
> I don't think this works on all platforms.  ACCESS_ONCE is not atomic
> in the smp_mb__before_atomic sense.

Documentation/atomic_ops.txt documents ACCESS_ONCE as being equivalent
to atomic_read() and atomic_set().  smp_mb__before_atomic and
smp_mb__after_atomic() are documented as working with atomic_read and
atomic_set.  Maybe it is a stretch to use them but it doesn't seem like
much of a stretch.
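
For readers unfamiliar with the macro, here is a minimal userspace sketch of what an ACCESS_ONCE-style read does.  It assumes the classic volatile-cast form of the macro; MY_ACCESS_ONCE, struct fake_gid_map and fake_mappings_established are hypothetical names for illustration, not kernel code.  The cast makes the compiler emit exactly one load, but by itself it implies no CPU memory barrier, which is why the barrier question above still matters.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel macro: the volatile cast forces
 * the compiler to perform a single, non-cached load of x.  It does not
 * order this load against other loads or stores on the CPU.
 * __typeof__ is a GNU C extension.
 */
#define MY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the gid_map field discussed in the thread. */
struct fake_gid_map {
	unsigned int nr_extents;	/* nonzero once a mapping is written */
};

/* Returns true if at least one extent has been recorded. */
static bool fake_mappings_established(const struct fake_gid_map *map)
{
	return MY_ACCESS_ONCE(map->nr_extents) != 0;
}
```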

Further at this point I don't know that any barriers are strictly
needed, beyond the ACCESS_ONCE.  However since x86 does all of the
ordering in hardware that I need I am not going to find any bugs that
don't require a barrier.

All I really want is the same level of barriers I would get if I used a
spin-lock protected data structure so I don't need to worry about
crazy smp issues that happen when the hardware decides it is safe to
reorder things.

Eric


>> +
>>  #ifdef CONFIG_USER_NS
>>
>>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>> diff --git a/kernel/groups.c b/kernel/groups.c
>> index 02d8a251c476..e0335e44f76a 100644
>> --- a/kernel/groups.c
>> +++ b/kernel/groups.c
>> @@ -6,6 +6,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/security.h>
>>  #include <linux/syscalls.h>
>> +#include <linux/user_namespace.h>
>>  #include <asm/uaccess.h>
>>
>>  /* init to 2 - one for init_task, one to ensure it is never freed */
>> @@ -217,7 +218,11 @@ bool may_setgroups(void)
>>  {
>>         struct user_namespace *user_ns = current_user_ns();
>>
>> -       return ns_capable(user_ns, CAP_SETGID);
>> +       /* It is not safe to use setgroups until a gid mapping in
>> +        * the user namespace has been established.
>> +        */
>> +       return userns_gid_mappings_established(user_ns) &&
>> +               ns_capable(user_ns, CAP_SETGID);
>>  }
>>
>>  /*
>> --
>> 1.9.1
>>
>
> --Andy

^ permalink raw reply

* Re: [CFT][PATCH 7/7] userns: Allow setting gid_maps without privilege when setgroups is disabled
From: Andy Lutomirski @ 2014-12-08 22:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <87egs9pz5u.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Mon, Dec 8, 2014 at 2:14 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Now that setgroups can be disabled and not reenabled, setting gid_map
> without privielge can now be enabled when setgroups is disabled.
>
> This restores most of the functionality that was lost when unprivilege

unprivileged.

> setting of gid_map was removed.  Applications that use this
> functionality will need to check to see if they use setgroups or
> init_groups, and if they don't they can be fixed by simply
> disabling of setgroups before they run.

"disabling setgroups before writing to gid_map"?

The code is:

Reviewed-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
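
The sequence suggested above (disable setgroups, then write gid_map) might look roughly like this from user space.  This is only a sketch under assumptions: it uses the string "deny" proposed in the review of patch 6/7 (the posted patch used the value 0), and write_str, format_gid_map and map_own_gid_unprivileged are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a whole string to an existing file; 0 on success, -1 on error. */
static int write_str(const char *path, const char *s)
{
	size_t len = strlen(s);
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	n = write(fd, s, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}

/* Build a single-extent map line: "<inside> <outside> 1\n". */
static int format_gid_map(char *buf, size_t len, gid_t inside, gid_t outside)
{
	int n = snprintf(buf, len, "%u %u 1\n",
			 (unsigned int)inside, (unsigned int)outside);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: map our own egid to gid 0 in a new user namespace. */
static int map_own_gid_unprivileged(void)
{
	char line[64];
	gid_t egid = getegid();	/* read before unshare(); unmapped afterwards */

	if (unshare(CLONE_NEWUSER) != 0)
		return -1;
	/* Assumed knob value; must happen before gid_map is written. */
	if (write_str("/proc/self/setgroups", "deny") != 0)
		return -1;
	if (format_gid_map(line, sizeof(line), 0, egid) != 0)
		return -1;
	return write_str("/proc/self/gid_map", line);
}
```

An application that never calls setgroups or initgroups would call something like map_own_gid_unprivileged() once, right after deciding to enter a user namespace.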

>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  kernel/user_namespace.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 3d128f91ced3..459c7f647072 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -828,6 +828,11 @@ static bool new_idmap_permitted(const struct file *file,
>                         kuid_t uid = make_kuid(ns->parent, id);
>                         if (uid_eq(uid, cred->euid))
>                                 return true;
> +               } else if (cap_setid == CAP_SETGID) {
> +                       kgid_t gid = make_kgid(ns->parent, id);
> +                       if (!userns_setgroups_allowed(ns) &&
> +                           gid_eq(gid, cred->egid))
> +                               return true;
>                 }
>         }
>
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [CFT][PATCH 2/7] userns: Don't allow setgroups until a gid mapping has been setablished
From: Andy Lutomirski @ 2014-12-08 22:25 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Casey Schaufler,
	Andrew Morton, Eric W. Biederman
In-Reply-To: <5486237D.4060304-/L3Ra7n9ekc@public.gmane.org>

On Mon, Dec 8, 2014 at 2:17 PM, Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> wrote:
> Am 08.12.2014 um 23:07 schrieb Eric W. Biederman:
>>
>> setgroups is unique in not needing a valid mapping before it can be called,
>> in the case of setgroups(0, NULL) which drops all supplemental groups.
>>
>> The design of the user namespace assumes that CAP_SETGID can not actually
>> be used until a gid mapping is established.  Therefore add a helper function
>> to see if the user namespace gid mapping has been established and call
>> that function in the setgroups permission check.
>>
>> This is part of the fix for CVE-2014-8989, being able to drop groups
>> without privilege using user namespaces.
>>
>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  include/linux/user_namespace.h | 9 +++++++++
>>  kernel/groups.c                | 7 ++++++-
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index e95372654f09..41cc26e5a350 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -37,6 +37,15 @@ struct user_namespace {
>>
>>  extern struct user_namespace init_user_ns;
>>
>> +static inline bool userns_gid_mappings_established(const struct user_namespace *ns)
>> +{
>> +     bool established;
>> +     smp_mb__before_atomic();
>> +     established = ACCESS_ONCE(ns->gid_map.nr_extents) != 0;
>> +     smp_mb__after_atomic();
>> +     return established;
>> +}
>> +
>
> Maybe this is a stupid question, but why do we need all this magic
> around established =  ... ?
> The purpose of this code is to check whether ns->gid_map.nr_extents != 0
> in a lock-free manner?
>

See my other comment -- the ordering will matter at the end of the series.

It might be nicer to do this differently: in may_setgroups, do:

if (!userns_gid_mappings_established(user_ns))
  return false;

/* User code can start with setgroups allowed, disallow it, and then
 * add a mapping.  We need to prevent a race that could cause this
 * function to return true. */
smp_rmb();

if (!userns_setgroups_allowed(user_ns))
  return false;

--Andy

> Thanks,
> //richard



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [CFT][PATCH 6/7] userns: Add a knob to disable setgroups on a per user namespace basis
From: Andy Lutomirski @ 2014-12-08 22:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	stable, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kenton Varda, LSM, Michael Kerrisk-manpages, Richard Weinberger,
	Casey Schaufler, Andrew Morton
In-Reply-To: <87mw6xpzb0.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Mon, Dec 8, 2014 at 2:11 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> - Expose the knob to user space through a proc file /proc/<pid>/setgroups
>
>   A value of 0 means the setgroups system call is disabled in the

"deny"

>   current processes user namespace and can not be enabled in the
>   future in this user namespace.
>
>   A value of 1 means the segtoups system call is enabled.
>

"allow"

> - Descedent user namespaces inherit the value of setgroups from

s/Descedent/Descendent/

> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -222,6 +222,7 @@ bool may_setgroups(void)
>          * the user namespace has been established.
>          */
>         return userns_gid_mappings_established(user_ns) &&
> +               userns_setgroups_allowed(user_ns) &&
>                 ns_capable(user_ns, CAP_SETGID);
>  }

Can you add a comment explaining the ordering?  For example:

We need to check for a gid mapping before checking setgroups_allowed
because an unprivileged user can create a userns with setgroups
allowed, then disallow setgroups and add a mapping.  If we check in
the opposite order, then we have a race: we could see that setgroups
is allowed before the user clears the bit and then see that there is a
gid mapping after the other thread is done.
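
To make the ordering argument above concrete, here is a rough userspace model using C11 atomics.  It is a sketch, not the kernel code: struct fake_userns and the function names are hypothetical, and atomic_thread_fence() merely stands in for the kernel's smp_wmb()/smp_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two fields being discussed. */
struct fake_userns {
	atomic_uint nr_extents;		/* nonzero once a gid mapping exists */
	atomic_bool setgroups_allowed;	/* only ever goes from true to false */
};

static void fake_userns_init(struct fake_userns *ns)
{
	atomic_init(&ns->nr_extents, 0);
	atomic_init(&ns->setgroups_allowed, true);
}

/* Writer side: disallow setgroups first, then publish the mapping. */
static void deny_setgroups_then_map(struct fake_userns *ns)
{
	atomic_store_explicit(&ns->setgroups_allowed, false,
			      memory_order_relaxed);
	/* Plays the role of smp_wmb(): orders the store above before the one below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ns->nr_extents, 1, memory_order_relaxed);
}

/* Reader side: check the mapping first, then setgroups_allowed. */
static bool fake_may_setgroups(struct fake_userns *ns)
{
	if (atomic_load_explicit(&ns->nr_extents, memory_order_relaxed) == 0)
		return false;
	/* Plays the role of smp_rmb(): orders the load above before the one below. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&ns->setgroups_allowed,
				    memory_order_relaxed);
}
```

With this ordering, a reader that observes the mapping is guaranteed to also observe that setgroups was disallowed.  Checking in the opposite order would allow the stale "allowed" value to be paired with the new mapping.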

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply
