* [PATCH -V8 17/26] richacl: Permission check algorithm
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
As in the standard POSIX file permission model, each process is in the
owner, group, or other file class. A process is
- in the owner file class if it owns the file,
- in the group file class if it is in the file's owning group or it
matches any of the user or group entries, and
- in the other file class otherwise.
Each file class is associated with a file mask.
A richacl grants a requested access if the NFSv4 acl in the richacl
grants the requested permissions (according to the NFSv4 permission
check algorithm) and the file mask that applies to the process includes
the requested permissions.
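The two stages (NFSv4 acl walk, then file mask) can be illustrated with a minimal userspace sketch in C. This is not the kernel code in this patch: `struct ace`, `struct acl`, `struct cred`, and `acl_permits()` are hypothetical simplifications, the acl is assumed to be masked (ACL4_MASKED set), inherit-only entries are not modeled, and the walk does not stop early. Only the ACE4_* values come from include/linux/richacl.h in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* ACE4_* values from include/linux/richacl.h in this series. */
#define ACE4_READ_DATA  0x00000001u
#define ACE4_WRITE_DATA 0x00000002u

/* Hypothetical model: an entry applies to OWNER@, GROUP@, EVERYONE@, or a uid. */
enum who { WHO_OWNER, WHO_GROUP, WHO_EVERYONE, WHO_UID };

struct ace {
	bool deny;          /* deny entry if true, allow entry otherwise */
	enum who who;
	unsigned int id;    /* uid; only meaningful for WHO_UID */
	unsigned int mask;  /* ACE4_* permissions */
};

/* Assumed to be masked, i.e. ACL4_MASKED is set. */
struct acl {
	unsigned int owner_mask, group_mask, other_mask;
	const struct ace *entries;
	size_t count;
};

/* Hypothetical stand-in for the process credentials. */
struct cred {
	unsigned int fsuid;
	bool in_owning_group;
};

static bool acl_permits(const struct acl *acl, unsigned int file_uid,
			const struct cred *cred, unsigned int mask)
{
	unsigned int requested = mask, denied = 0, file_mask;
	bool in_group_class = cred->in_owning_group;
	size_t i;

	/* Stage 1: the first matching entry that mentions a bit decides it. */
	for (i = 0; i < acl->count; i++) {
		const struct ace *ace = &acl->entries[i];

		if (ace->who == WHO_OWNER && cred->fsuid != file_uid)
			continue;
		if (ace->who == WHO_GROUP && !cred->in_owning_group)
			continue;
		if (ace->who == WHO_UID) {
			if (cred->fsuid != ace->id)
				continue;
			/* Matching a user entry puts the process in the group class. */
			in_group_class = true;
		}
		if (ace->deny)
			denied |= ace->mask & mask;
		mask &= ~ace->mask;
	}
	denied |= mask;  /* bits that no matching entry mentioned */

	/* Stage 2: the file mask of the process's file class must also grant them. */
	if (cred->fsuid == file_uid)
		file_mask = acl->owner_mask;
	else if (in_group_class)
		file_mask = acl->group_mask;
	else
		file_mask = acl->other_mask;
	denied |= requested & ~file_mask;

	return denied == 0;
}
```

For example, with an everyone@ allow entry for ACE4_READ_DATA but an other mask of 0, a process in the other class is still refused read access: the acl alone would grant it, but the file mask does not.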
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/richacl_base.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/richacl.h | 2 +
2 files changed, 101 insertions(+), 0 deletions(-)
diff --git a/fs/richacl_base.c b/fs/richacl_base.c
index 41ff82d..ab9139e 100644
--- a/fs/richacl_base.c
+++ b/fs/richacl_base.c
@@ -352,3 +352,102 @@ richacl_chmod(struct richacl *acl, mode_t mode)
return clone;
}
EXPORT_SYMBOL_GPL(richacl_chmod);
+
+/**
+ * richacl_permission - richacl permission check algorithm
+ * @inode: inode to check
+ * @acl: rich acl of the inode
+ * @mask: requested access (ACE4_* bitmask)
+ *
+ * Checks if the current process is granted @mask flags in @acl.
+ */
+int
+richacl_permission(struct inode *inode, const struct richacl *acl,
+ unsigned int mask)
+{
+ const struct richace *ace;
+ unsigned int requested = mask, denied = 0;
+ int in_owning_group = in_group_p(inode->i_gid);
+ int in_owner_or_group_class = in_owning_group;
+
+ /*
+ * We don't need to know which class the process is in when the acl is
+ * not masked.
+ */
+ if (!(acl->a_flags & ACL4_MASKED))
+ in_owner_or_group_class = 1;
+
+ /*
+ * A process is
+ * - in the owner file class if it owns the file,
+ * - in the group file class if it is in the file's owning group or
+ * it matches any of the user or group entries, and
+ * - in the other file class otherwise.
+ */
+
+ /*
+ * Check if the acl grants the requested access and determine which
+ * file class the process is in.
+ */
+ richacl_for_each_entry(ace, acl) {
+ unsigned int ace_mask = ace->e_mask;
+
+ if (richace_is_inherit_only(ace))
+ continue;
+ if (richace_is_owner(ace)) {
+ if (current_fsuid() != inode->i_uid)
+ continue;
+ goto is_owner;
+ } else if (richace_is_group(ace)) {
+ if (!in_owning_group)
+ continue;
+ } else if (richace_is_unix_id(ace)) {
+ if (ace->e_flags & ACE4_IDENTIFIER_GROUP) {
+ if (!in_group_p(ace->e_id))
+ continue;
+ } else {
+ if (current_fsuid() != ace->e_id)
+ continue;
+ }
+ } else
+ goto is_everyone;
+
+is_owner:
+ /* The process is in the owner or group file class. */
+ in_owner_or_group_class = 1;
+
+is_everyone:
+ /* Check which mask flags the ACE allows or denies. */
+ if (richace_is_deny(ace))
+ denied |= ace_mask & mask;
+ mask &= ~ace_mask;
+
+ /*
+ * Keep going until we know which file class
+ * the process is in.
+ */
+ if (!mask && in_owner_or_group_class)
+ break;
+ }
+ denied |= mask;
+
+ if (acl->a_flags & ACL4_MASKED) {
+ unsigned int file_mask;
+
+ /*
+ * The file class a process is in determines which file mask
+ * applies. Check if that file mask also grants the requested
+ * access.
+ */
+ if (current_fsuid() == inode->i_uid)
+ file_mask = acl->a_owner_mask;
+ else if (in_owner_or_group_class)
+ file_mask = acl->a_group_mask;
+ else
+ file_mask = acl->a_other_mask;
+ denied |= requested & ~file_mask;
+ }
+
+ return denied ? -EACCES : 0;
+}
+EXPORT_SYMBOL_GPL(richacl_permission);
diff --git a/include/linux/richacl.h b/include/linux/richacl.h
index a1ff8a3..d43700a 100644
--- a/include/linux/richacl.h
+++ b/include/linux/richacl.h
@@ -282,5 +282,7 @@ extern unsigned int richacl_mode_to_mask(mode_t);
extern unsigned int richacl_want_to_mask(unsigned int);
extern void richacl_compute_max_masks(struct richacl *);
extern struct richacl *richacl_chmod(struct richacl *, mode_t);
+extern int richacl_permission(struct inode *, const struct richacl *,
+ unsigned int);
#endif /* __RICHACL_H */
--
1.7.5.4
* [PATCH -V8 16/26] richacl: Update the file masks in chmod()
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Doing a chmod() sets the file mode, which includes the file permission
bits. When a file has a richacl, the permissions that the richacl
grants need to be limited to what the new file permission bits allow.
This is done by setting the file masks in the richacl to what the file
permission bits map to. The richacl access check algorithm takes the
file masks into account, which ensures that the richacl cannot grant too
many permissions.
It is possible to explicitly add permissions to the file masks which go
beyond what the file permission bits can grant (like the ACE4_WRITE_ACL
permission). The POSIX.1 standard calls this an alternate file access
control mechanism. A subsequent chmod() would ensure that those
permissions are disabled again.
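A minimal userspace sketch of the masks that chmod() installs, following richacl_mode_to_mask() and richacl_chmod() in this series. The names `mode_to_mask()`, `chmod_masks()`, and `struct file_masks` are hypothetical, and the literals 4/2/1 stand in for MAY_READ/MAY_WRITE/MAY_EXEC, which have those values in the kernel. It shows the effect described above: after chmod 0640, ACE4_WRITE_ACL is no longer in the group mask, even if it had been added there explicitly.

```c
/* ACE4_* values from include/linux/richacl.h in this series. */
#define ACE4_READ_DATA        0x00000001u
#define ACE4_LIST_DIRECTORY   0x00000001u
#define ACE4_WRITE_DATA       0x00000002u
#define ACE4_ADD_FILE         0x00000002u
#define ACE4_APPEND_DATA      0x00000004u
#define ACE4_ADD_SUBDIRECTORY 0x00000004u
#define ACE4_EXECUTE          0x00000020u
#define ACE4_DELETE_CHILD     0x00000040u
#define ACE4_READ_ATTRIBUTES  0x00000080u
#define ACE4_WRITE_ATTRIBUTES 0x00000100u
#define ACE4_READ_ACL         0x00020000u
#define ACE4_WRITE_ACL        0x00040000u
#define ACE4_WRITE_OWNER      0x00080000u
#define ACE4_SYNCHRONIZE      0x00100000u

#define POSIX_MODE_READ  (ACE4_READ_DATA | ACE4_LIST_DIRECTORY)
#define POSIX_MODE_WRITE (ACE4_WRITE_DATA | ACE4_ADD_FILE | ACE4_APPEND_DATA | \
			  ACE4_ADD_SUBDIRECTORY | ACE4_DELETE_CHILD)
#define POSIX_MODE_EXEC  ACE4_EXECUTE
#define POSIX_ALWAYS_ALLOWED (ACE4_SYNCHRONIZE | ACE4_READ_ATTRIBUTES | ACE4_READ_ACL)
#define POSIX_OWNER_ALLOWED  (ACE4_WRITE_ATTRIBUTES | ACE4_WRITE_OWNER | ACE4_WRITE_ACL)

struct file_masks {
	unsigned int owner, group, other;
};

/* Like richacl_mode_to_mask(): only the lowest three bits (r=4, w=2, x=1) count. */
static unsigned int mode_to_mask(unsigned int mode)
{
	unsigned int mask = POSIX_ALWAYS_ALLOWED;

	if (mode & 4)
		mask |= POSIX_MODE_READ;
	if (mode & 2)
		mask |= POSIX_MODE_WRITE;
	if (mode & 1)
		mask |= POSIX_MODE_EXEC;
	return mask;
}

/* The three file masks that richacl_chmod() installs for @mode. */
static struct file_masks chmod_masks(unsigned int mode)
{
	struct file_masks m;

	m.owner = mode_to_mask(mode >> 6) | POSIX_OWNER_ALLOWED;
	m.group = mode_to_mask(mode >> 3);
	m.other = mode_to_mask(mode);
	return m;
}
```

For example, `chmod_masks(0640).group` contains ACE4_READ_DATA but neither ACE4_WRITE_DATA nor ACE4_WRITE_ACL, while the owner mask keeps ACE4_WRITE_ACL through POSIX_OWNER_ALLOWED.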
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/richacl_base.c | 40 ++++++++++++++++++++++++++++++++++++++++
include/linux/richacl.h | 1 +
2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/fs/richacl_base.c b/fs/richacl_base.c
index d73f1b0..41ff82d 100644
--- a/fs/richacl_base.c
+++ b/fs/richacl_base.c
@@ -312,3 +312,43 @@ restart:
acl->a_flags &= ~ACL4_MASKED;
}
EXPORT_SYMBOL_GPL(richacl_compute_max_masks);
+
+/**
+ * richacl_chmod - update the file masks to reflect the new mode
+ * @mode: new file permission bits
+ *
+ * Returns a copy of @acl in which the file masks have been replaced by the
+ * file masks corresponding to the file permission bits in @mode, or @acl
+ * itself if the file masks are already up to date. Takes over a reference
+ * to @acl.
+ */
+struct richacl *
+richacl_chmod(struct richacl *acl, mode_t mode)
+{
+ unsigned int owner_mask, group_mask, other_mask;
+ struct richacl *clone;
+
+ owner_mask = richacl_mode_to_mask(mode >> 6) |
+ ACE4_POSIX_OWNER_ALLOWED;
+ group_mask = richacl_mode_to_mask(mode >> 3);
+ other_mask = richacl_mode_to_mask(mode);
+
+ if (acl->a_owner_mask == owner_mask &&
+ acl->a_group_mask == group_mask &&
+ acl->a_other_mask == other_mask &&
+ (acl->a_flags & ACL4_MASKED))
+ return acl;
+
+ clone = richacl_clone(acl);
+ richacl_put(acl);
+ if (!clone)
+ return ERR_PTR(-ENOMEM);
+
+ clone->a_flags |= ACL4_MASKED;
+ clone->a_owner_mask = owner_mask;
+ clone->a_group_mask = group_mask;
+ clone->a_other_mask = other_mask;
+
+ return clone;
+}
+EXPORT_SYMBOL_GPL(richacl_chmod);
diff --git a/include/linux/richacl.h b/include/linux/richacl.h
index 14f18b5..a1ff8a3 100644
--- a/include/linux/richacl.h
+++ b/include/linux/richacl.h
@@ -281,5 +281,6 @@ extern int richacl_masks_to_mode(const struct richacl *);
extern unsigned int richacl_mode_to_mask(mode_t);
extern unsigned int richacl_want_to_mask(unsigned int);
extern void richacl_compute_max_masks(struct richacl *);
+extern struct richacl *richacl_chmod(struct richacl *, mode_t);
#endif /* __RICHACL_H */
--
1.7.5.4
* [PATCH -V8 15/26] richacl: Compute maximum file masks from an acl
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Compute upper bound owner, group, and other file masks with as few
permissions as possible without denying any permissions that the NFSv4
acl in a richacl grants.
This algorithm is used when a file inherits an acl at create time and
when an acl is set via a mechanism that does not specify file modes
(such as via nfsd). When user-space sets an acl, the file masks are
passed in as part of the xattr.
When setting a richacl, the file masks determine what the file
permission bits will be set to; see richacl_masks_to_mode().
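The upper-bound computation can be traced on the example from the comment in richacl_compute_max_masks() (group@:w::deny, everyone@:rw::allow). Below is a userspace sketch of the three functions in this patch; `struct ace`, `struct acl`, `enum who`, and `same_who()` are hypothetical simplifications, inherit-only entries and the ACL4_MASKED flag are left out, and only ACE4_READ_DATA and ACE4_WRITE_DATA are modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* ACE4_* values from include/linux/richacl.h in this series. */
#define ACE4_READ_DATA  0x00000001u
#define ACE4_WRITE_DATA 0x00000002u

/* Hypothetical simplified entries; inherit-only entries are not modeled. */
enum who { WHO_OWNER, WHO_GROUP, WHO_EVERYONE, WHO_UID };

struct ace {
	bool deny;
	enum who who;
	unsigned int id;    /* uid; only meaningful for WHO_UID */
	unsigned int mask;
};

struct acl {
	unsigned int owner_mask, group_mask, other_mask;
	const struct ace *entries;
	size_t count;
};

static bool same_who(const struct ace *a, const struct ace *b)
{
	return a->who == b->who && (a->who != WHO_UID || a->id == b->id);
}

/* Like richacl_allowed_to_who(): what @who is allowed, counting EVERYONE@. */
static unsigned int allowed_to_who(const struct acl *acl, const struct ace *who)
{
	unsigned int allowed = 0;
	size_t i;

	for (i = acl->count; i-- > 0; ) {  /* walk from the last entry backwards */
		const struct ace *ace = &acl->entries[i];

		if (!same_who(ace, who) && ace->who != WHO_EVERYONE)
			continue;
		allowed = ace->deny ? allowed & ~ace->mask : allowed | ace->mask;
	}
	return allowed;
}

/* Like richacl_group_class_allowed(). */
static unsigned int group_class_allowed(const struct acl *acl)
{
	unsigned int everyone_allowed = 0, allowed = 0;
	bool had_group_ace = false;
	size_t i;

	for (i = acl->count; i-- > 0; ) {
		const struct ace *ace = &acl->entries[i];

		if (ace->who == WHO_OWNER)
			continue;
		if (ace->who == WHO_EVERYONE) {
			everyone_allowed = ace->deny ? everyone_allowed & ~ace->mask
						     : everyone_allowed | ace->mask;
		} else {
			allowed |= allowed_to_who(acl, ace);
			if (ace->who == WHO_GROUP)
				had_group_ace = true;
		}
	}
	if (!had_group_ace)
		allowed |= everyone_allowed;
	return allowed;
}

/* Like richacl_compute_max_masks(), without the ACL4_MASKED flag. */
static void compute_max_masks(struct acl *acl)
{
	unsigned int gmask = ~0u;  /* all permissions the group class can ever get */
	size_t i;

restart:
	acl->owner_mask = acl->group_mask = acl->other_mask = 0;
	for (i = acl->count; i-- > 0; ) {
		const struct ace *ace = &acl->entries[i];

		if (ace->who == WHO_OWNER) {
			acl->owner_mask = ace->deny ? acl->owner_mask & ~ace->mask
						    : acl->owner_mask | ace->mask;
		} else if (ace->who == WHO_EVERYONE) {
			if (ace->deny) {
				acl->owner_mask &= ~ace->mask;
				acl->group_mask &= ~ace->mask;
				acl->other_mask &= ~ace->mask;
			} else {
				acl->owner_mask |= ace->mask;
				acl->group_mask |= ace->mask & gmask;
				acl->other_mask |= ace->mask;
			}
		} else if (!ace->deny) {
			acl->owner_mask |= ace->mask & gmask;
			acl->group_mask |= ace->mask & gmask;
		} else if (gmask == ~0u) {
			/* First group-class deny entry: compute gmask and start over. */
			gmask = group_class_allowed(acl);
			if (gmask != ~0u)
				goto restart;
		}
	}
}
```

For the example acl this yields owner and other masks of rw and a group mask of r: the first pass puts rw into the group mask, but the group@ deny entry triggers the gmask computation (gmask = r) and a restart.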
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/richacl_base.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/richacl.h | 1 +
2 files changed, 129 insertions(+), 0 deletions(-)
diff --git a/fs/richacl_base.c b/fs/richacl_base.c
index bca2093..d73f1b0 100644
--- a/fs/richacl_base.c
+++ b/fs/richacl_base.c
@@ -184,3 +184,131 @@ richace_is_same_identifier(const struct richace *a, const struct richace *b)
return a->e_id == b->e_id;
#undef WHO_FLAGS
}
+
+/**
+ * richacl_allowed_to_who - mask flags allowed to a specific who value
+ *
+ * Computes the mask values allowed to a specific who value, taking
+ * EVERYONE@ entries into account.
+ */
+static unsigned int richacl_allowed_to_who(struct richacl *acl,
+ struct richace *who)
+{
+ struct richace *ace;
+ unsigned int allowed = 0;
+
+ richacl_for_each_entry_reverse(ace, acl) {
+ if (richace_is_inherit_only(ace))
+ continue;
+ if (richace_is_same_identifier(ace, who) ||
+ richace_is_everyone(ace)) {
+ if (richace_is_allow(ace))
+ allowed |= ace->e_mask;
+ else if (richace_is_deny(ace))
+ allowed &= ~ace->e_mask;
+ }
+ }
+ return allowed;
+}
+
+/**
+ * richacl_group_class_allowed - maximum permissions the group class is allowed
+ *
+ * See richacl_compute_max_masks().
+ */
+static unsigned int richacl_group_class_allowed(struct richacl *acl)
+{
+ struct richace *ace;
+ unsigned int everyone_allowed = 0, group_class_allowed = 0;
+ int had_group_ace = 0;
+
+ richacl_for_each_entry_reverse(ace, acl) {
+ if (richace_is_inherit_only(ace) ||
+ richace_is_owner(ace))
+ continue;
+
+ if (richace_is_everyone(ace)) {
+ if (richace_is_allow(ace))
+ everyone_allowed |= ace->e_mask;
+ else if (richace_is_deny(ace))
+ everyone_allowed &= ~ace->e_mask;
+ } else {
+ group_class_allowed |=
+ richacl_allowed_to_who(acl, ace);
+
+ if (richace_is_group(ace))
+ had_group_ace = 1;
+ }
+ }
+ if (!had_group_ace)
+ group_class_allowed |= everyone_allowed;
+ return group_class_allowed;
+}
+
+/**
+ * richacl_compute_max_masks - compute upper bound masks
+ *
+ * Computes upper bound owner, group, and other masks so that none of
+ * the mask flags allowed by the acl are disabled (for any choice of the
+ * file owner or group membership).
+ */
+void richacl_compute_max_masks(struct richacl *acl)
+{
+ unsigned int gmask = ~0;
+ struct richace *ace;
+
+ /*
+ * @gmask contains all permissions which the group class is ever
+ * allowed. We use it to avoid adding permissions to the group mask
+ * from everyone@ allow aces which the group class is always denied
+ * through other aces. For example, the following acl would otherwise
+ * result in a group mask of rw:
+ *
+ * group@:w::deny
+ * everyone@:rw::allow
+ *
+ * Avoid computing @gmask for acls which do not include any group class
+ * deny aces: in such acls, the group class is never denied any
+ * permissions from everyone@ allow aces.
+ */
+
+restart:
+ acl->a_owner_mask = 0;
+ acl->a_group_mask = 0;
+ acl->a_other_mask = 0;
+
+ richacl_for_each_entry_reverse(ace, acl) {
+ if (richace_is_inherit_only(ace))
+ continue;
+
+ if (richace_is_owner(ace)) {
+ if (richace_is_allow(ace))
+ acl->a_owner_mask |= ace->e_mask;
+ else if (richace_is_deny(ace))
+ acl->a_owner_mask &= ~ace->e_mask;
+ } else if (richace_is_everyone(ace)) {
+ if (richace_is_allow(ace)) {
+ acl->a_owner_mask |= ace->e_mask;
+ acl->a_group_mask |= ace->e_mask & gmask;
+ acl->a_other_mask |= ace->e_mask;
+ } else if (richace_is_deny(ace)) {
+ acl->a_owner_mask &= ~ace->e_mask;
+ acl->a_group_mask &= ~ace->e_mask;
+ acl->a_other_mask &= ~ace->e_mask;
+ }
+ } else {
+ if (richace_is_allow(ace)) {
+ acl->a_owner_mask |= ace->e_mask & gmask;
+ acl->a_group_mask |= ace->e_mask & gmask;
+ } else if (richace_is_deny(ace) && gmask == ~0) {
+ gmask = richacl_group_class_allowed(acl);
+ if (likely(gmask != ~0))
+ /* should always be true */
+ goto restart;
+ }
+ }
+ }
+
+ acl->a_flags &= ~ACL4_MASKED;
+}
+EXPORT_SYMBOL_GPL(richacl_compute_max_masks);
diff --git a/include/linux/richacl.h b/include/linux/richacl.h
index fe875fe..14f18b5 100644
--- a/include/linux/richacl.h
+++ b/include/linux/richacl.h
@@ -280,5 +280,6 @@ extern int richace_is_same_identifier(const struct richace *,
extern int richacl_masks_to_mode(const struct richacl *);
extern unsigned int richacl_mode_to_mask(mode_t);
extern unsigned int richacl_want_to_mask(unsigned int);
+extern void richacl_compute_max_masks(struct richacl *);
#endif /* __RICHACL_H */
--
1.7.5.4
* [PATCH -V8 14/26] richacl: Permission mapping functions
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
We need to map from POSIX permissions to NFSv4 permissions when a
chmod() is done, from NFSv4 permissions to POSIX permissions when an acl
is set (which implicitly sets the file permission bits), and from the
MAY_READ/MAY_WRITE/MAY_EXEC/MAY_APPEND flags to NFSv4 permissions when
doing an access check in a richacl.
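The NFSv4-to-POSIX direction can be sketched in userspace after richacl_mask_to_mode() and richacl_masks_to_mode(). The unprefixed function names are hypothetical, and 4/2/1 stand in for MAY_READ/MAY_WRITE/MAY_EXEC. The sketch shows the "maximum permissions" rule: a mask containing only ACE4_APPEND_DATA still produces the w bit, while a permission such as ACE4_WRITE_ACL has no mode bit at all.

```c
/* ACE4_* values from include/linux/richacl.h in this series. */
#define ACE4_READ_DATA        0x00000001u
#define ACE4_LIST_DIRECTORY   0x00000001u
#define ACE4_WRITE_DATA       0x00000002u
#define ACE4_ADD_FILE         0x00000002u
#define ACE4_APPEND_DATA      0x00000004u
#define ACE4_ADD_SUBDIRECTORY 0x00000004u
#define ACE4_EXECUTE          0x00000020u
#define ACE4_DELETE_CHILD     0x00000040u
#define ACE4_WRITE_ACL        0x00040000u

#define POSIX_MODE_READ  (ACE4_READ_DATA | ACE4_LIST_DIRECTORY)
#define POSIX_MODE_WRITE (ACE4_WRITE_DATA | ACE4_ADD_FILE | ACE4_APPEND_DATA | \
			  ACE4_ADD_SUBDIRECTORY | ACE4_DELETE_CHILD)
#define POSIX_MODE_EXEC  ACE4_EXECUTE

/* Like richacl_mask_to_mode(): any bit of a group sets r (4), w (2), or x (1). */
static unsigned int mask_to_mode(unsigned int mask)
{
	unsigned int mode = 0;

	if (mask & POSIX_MODE_READ)
		mode |= 4;
	if (mask & POSIX_MODE_WRITE)
		mode |= 2;
	if (mask & POSIX_MODE_EXEC)
		mode |= 1;
	return mode;
}

/* Like richacl_masks_to_mode(): owner, group, and other masks to 0777 bits. */
static unsigned int masks_to_mode(unsigned int owner_mask,
				  unsigned int group_mask,
				  unsigned int other_mask)
{
	return mask_to_mode(owner_mask) << 6 | mask_to_mode(group_mask) << 3 |
	       mask_to_mode(other_mask);
}
```

For example, owner rwx, group read plus append, and an empty other mask map to 0760.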
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/richacl_base.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/richacl.h | 47 +++++++++++++++++++
2 files changed, 164 insertions(+), 0 deletions(-)
diff --git a/fs/richacl_base.c b/fs/richacl_base.c
index 689e1a6..bca2093 100644
--- a/fs/richacl_base.c
+++ b/fs/richacl_base.c
@@ -56,6 +56,123 @@ richacl_clone(const struct richacl *acl)
}
/**
+ * richacl_mask_to_mode - compute the file permission bits which correspond to @mask
+ * @mask: %ACE4_* permission mask
+ *
+ * See richacl_masks_to_mode().
+ */
+static int
+richacl_mask_to_mode(unsigned int mask)
+{
+ int mode = 0;
+
+ if (mask & ACE4_POSIX_MODE_READ)
+ mode |= MAY_READ;
+ if (mask & ACE4_POSIX_MODE_WRITE)
+ mode |= MAY_WRITE;
+ if (mask & ACE4_POSIX_MODE_EXEC)
+ mode |= MAY_EXEC;
+
+ return mode;
+}
+
+/**
+ * richacl_masks_to_mode - compute the file permission bits from the file masks
+ *
+ * When setting a richacl, we set the file permission bits to indicate maximum
+ * permissions: for example, we set the Write permission when a mask contains
+ * ACE4_APPEND_DATA even if it does not also contain ACE4_WRITE_DATA.
+ *
+ * Permissions which are not in ACE4_POSIX_MODE_READ, ACE4_POSIX_MODE_WRITE, or
+ * ACE4_POSIX_MODE_EXEC cannot be represented in the file permission bits.
+ * Such permissions can still be effective, but not for new files or after a
+ * chmod(), and only if they were set explicitly, for example, by setting a
+ * richacl.
+ */
+int
+richacl_masks_to_mode(const struct richacl *acl)
+{
+ return richacl_mask_to_mode(acl->a_owner_mask) << 6 |
+ richacl_mask_to_mode(acl->a_group_mask) << 3 |
+ richacl_mask_to_mode(acl->a_other_mask);
+}
+EXPORT_SYMBOL_GPL(richacl_masks_to_mode);
+
+/**
+ * richacl_mode_to_mask - compute a file mask from the lowest three mode bits
+ *
+ * When the file permission bits of a file are set with chmod(), this specifies
+ * the maximum permissions that processes will get. All permissions beyond
+ * that will be removed from the file masks, and become ineffective.
+ *
+ * We also add in the permissions which are always allowed no matter what the
+ * acl says.
+ */
+unsigned int
+richacl_mode_to_mask(mode_t mode)
+{
+ unsigned int mask = ACE4_POSIX_ALWAYS_ALLOWED;
+
+ if (mode & MAY_READ)
+ mask |= ACE4_POSIX_MODE_READ;
+ if (mode & MAY_WRITE)
+ mask |= ACE4_POSIX_MODE_WRITE;
+ if (mode & MAY_EXEC)
+ mask |= ACE4_POSIX_MODE_EXEC;
+
+ return mask;
+}
+
+/**
+ * richacl_want_to_mask - convert the iop->permission want argument to a mask
+ * @want: @want argument of the permission inode operation
+ *
+ * When checking for append, @want is (MAY_WRITE | MAY_APPEND).
+ *
+ * Richacls use the iop->may_create and iop->may_delete hooks which are
+ * used for checking if creating and deleting files is allowed. These hooks do
+ * not use richacl_want_to_mask(), so we do not have to deal with mapping
+ * MAY_WRITE to ACE4_ADD_FILE, ACE4_ADD_SUBDIRECTORY, and ACE4_DELETE_CHILD
+ * here.
+ */
+unsigned int
+richacl_want_to_mask(unsigned int want)
+{
+ unsigned int mask = 0;
+
+ if (want & MAY_READ)
+ mask |= ACE4_READ_DATA;
+ if (want & MAY_DELETE_SELF)
+ mask |= ACE4_DELETE;
+ if (want & MAY_TAKE_OWNERSHIP)
+ mask |= ACE4_WRITE_OWNER;
+ if (want & MAY_CHMOD)
+ mask |= ACE4_WRITE_ACL;
+ if (want & MAY_SET_TIMES)
+ mask |= ACE4_WRITE_ATTRIBUTES;
+ if (want & MAY_EXEC)
+ mask |= ACE4_EXECUTE;
+ /*
+ * Differentiate MAY_WRITE from these requests.
+ */
+ if (want & (MAY_APPEND |
+ MAY_CREATE_FILE | MAY_CREATE_DIR |
+ MAY_DELETE_CHILD)) {
+ if (want & MAY_APPEND)
+ mask |= ACE4_APPEND_DATA;
+ if (want & MAY_CREATE_FILE)
+ mask |= ACE4_ADD_FILE;
+ if (want & MAY_CREATE_DIR)
+ mask |= ACE4_ADD_SUBDIRECTORY;
+ if (want & MAY_DELETE_CHILD)
+ mask |= ACE4_DELETE_CHILD;
+ } else if (want & MAY_WRITE)
+ mask |= ACE4_WRITE_DATA;
+ return mask;
+}
+EXPORT_SYMBOL_GPL(richacl_want_to_mask);
+
+/**
* richace_is_same_identifier - are both identifiers the same?
*/
int
diff --git a/include/linux/richacl.h b/include/linux/richacl.h
index 51d6937..fe875fe 100644
--- a/include/linux/richacl.h
+++ b/include/linux/richacl.h
@@ -119,6 +119,49 @@ struct richacl {
ACE4_WRITE_OWNER | \
ACE4_SYNCHRONIZE)
+/*
+ * The POSIX permissions are supersets of the following NFSv4 permissions:
+ *
+ * - MAY_READ maps to ACE4_READ_DATA or ACE4_LIST_DIRECTORY, depending on
+ * the type of the file system object.
+ *
+ * - MAY_WRITE maps to ACE4_WRITE_DATA or ACE4_APPEND_DATA for files, and
+ * to ACE4_ADD_FILE, ACE4_ADD_SUBDIRECTORY, or ACE4_DELETE_CHILD for directories.
+ *
+ * - MAY_EXEC maps to ACE4_EXECUTE.
+ *
+ * (Some of these NFSv4 permissions have the same bit values.)
+ */
+#define ACE4_POSIX_MODE_READ ( \
+ ACE4_READ_DATA | \
+ ACE4_LIST_DIRECTORY)
+#define ACE4_POSIX_MODE_WRITE ( \
+ ACE4_WRITE_DATA | \
+ ACE4_ADD_FILE | \
+ ACE4_APPEND_DATA | \
+ ACE4_ADD_SUBDIRECTORY | \
+ ACE4_DELETE_CHILD)
+#define ACE4_POSIX_MODE_EXEC ACE4_EXECUTE
+#define ACE4_POSIX_MODE_ALL ( \
+ ACE4_POSIX_MODE_READ | \
+ ACE4_POSIX_MODE_WRITE | \
+ ACE4_POSIX_MODE_EXEC)
+/*
+ * These permissions are always allowed
+ * no matter what the acl says.
+ */
+#define ACE4_POSIX_ALWAYS_ALLOWED ( \
+ ACE4_SYNCHRONIZE | \
+ ACE4_READ_ATTRIBUTES | \
+ ACE4_READ_ACL)
+/*
+ * The owner is implicitly granted
+ * these permissions under POSIX.
+ */
+#define ACE4_POSIX_OWNER_ALLOWED ( \
+ ACE4_WRITE_ATTRIBUTES | \
+ ACE4_WRITE_OWNER | \
+ ACE4_WRITE_ACL)
/**
* richacl_get - grab another reference to a richacl handle
*/
@@ -234,4 +277,8 @@ richace_is_deny(const struct richace *ace)
extern struct richacl *richacl_alloc(int);
extern int richace_is_same_identifier(const struct richace *,
const struct richace *);
+extern int richacl_masks_to_mode(const struct richacl *);
+extern unsigned int richacl_mode_to_mask(mode_t);
+extern unsigned int richacl_want_to_mask(unsigned int);
+
#endif /* __RICHACL_H */
--
1.7.5.4
* [PATCH -V8 13/26] richacl: In-memory representation and helper functions
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
A richacl consists of an NFSv4 acl and an owner, group, and other mask.
These three masks correspond to the owner, group, and other file
permission bits, but they contain NFSv4 permissions instead of POSIX
permissions.
Each entry in the NFSv4 acl applies to the file owner (OWNER@), the
owning group (GROUP@), literally everyone (EVERYONE@), or to a specific
uid or gid.
As in the standard POSIX file permission model, each process is in the
owner, group, or other file class. A richacl grants a requested access
only if the NFSv4 acl in the richacl grants the access (according to the
NFSv4 permission check algorithm), and the file mask that applies to the
process includes the requested permissions.
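A userspace re-creation of the in-memory layout and reference counting from include/linux/richacl.h and fs/richacl_base.c, as a sketch only. It swaps the kernel's atomic_t, kzalloc(), and kfree() for C11 `<stdatomic.h>`, calloc(), and free(), uses a C99 flexible array member instead of `a_entries[0]`, and, unlike the kernel's richacl_put(), returns whether the acl was freed so that the behavior can be checked.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Values from include/linux/richacl.h in this series. */
#define ACE4_SPECIAL_WHO 0x4000
#define ACE_OWNER_ID     130

struct richace {
	unsigned short e_type;
	unsigned short e_flags;
	unsigned int e_mask;
	unsigned int e_id;
};

struct richacl {
	atomic_int a_refcount;       /* atomic_t in the kernel */
	unsigned int a_owner_mask;
	unsigned int a_group_mask;
	unsigned int a_other_mask;
	unsigned short a_count;
	unsigned short a_flags;
	struct richace a_entries[];  /* C99 flexible array member */
};

#define richacl_for_each_entry(_ace, _acl) \
	for (_ace = (_acl)->a_entries; \
	     _ace != (_acl)->a_entries + (_acl)->a_count; \
	     _ace++)

/* Allocates a zeroed acl with @count entries, holding one reference. */
static struct richacl *richacl_alloc(int count)
{
	struct richacl *acl = calloc(1, sizeof(struct richacl) +
				     count * sizeof(struct richace));

	if (acl) {
		atomic_init(&acl->a_refcount, 1);
		acl->a_count = count;
	}
	return acl;
}

static struct richacl *richacl_get(struct richacl *acl)
{
	if (acl)
		atomic_fetch_add(&acl->a_refcount, 1);
	return acl;
}

/* Drops a reference; returns 1 if it was the last one and @acl was freed. */
static int richacl_put(struct richacl *acl)
{
	if (acl && atomic_fetch_sub(&acl->a_refcount, 1) == 1) {
		free(acl);
		return 1;
	}
	return 0;
}

static int richace_is_owner(const struct richace *ace)
{
	return (ace->e_flags & ACE4_SPECIAL_WHO) && ace->e_id == ACE_OWNER_ID;
}
```

Keeping the entries in the same allocation as the header means a single free releases the whole acl, which is also why richacl_clone() in the patch can copy it with one memcpy().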
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/Makefile | 2 +
fs/richacl_base.c | 69 ++++++++++++++
include/linux/richacl.h | 237 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 308 insertions(+), 0 deletions(-)
create mode 100644 fs/richacl_base.c
create mode 100644 include/linux/richacl.h
diff --git a/fs/Makefile b/fs/Makefile
index afc1096..7612168 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -48,6 +48,8 @@ obj-$(CONFIG_NFS_COMMON) += nfs_common/
obj-$(CONFIG_GENERIC_ACL) += generic_acl.o
obj-$(CONFIG_FHANDLE) += fhandle.o
+obj-$(CONFIG_FS_RICHACL) += richacl.o
+richacl-y := richacl_base.o
obj-y += quota/
diff --git a/fs/richacl_base.c b/fs/richacl_base.c
new file mode 100644
index 0000000..689e1a6
--- /dev/null
+++ b/fs/richacl_base.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2006, 2010 Novell, Inc.
+ * Written by Andreas Gruenbacher <agruen@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/richacl.h>
+
+MODULE_LICENSE("GPL");
+
+/**
+ * richacl_alloc - allocate a richacl
+ * @count: number of entries
+ */
+struct richacl *
+richacl_alloc(int count)
+{
+ size_t size = sizeof(struct richacl) + count * sizeof(struct richace);
+ struct richacl *acl = kzalloc(size, GFP_KERNEL);
+
+ if (acl) {
+ atomic_set(&acl->a_refcount, 1);
+ acl->a_count = count;
+ }
+ return acl;
+}
+EXPORT_SYMBOL_GPL(richacl_alloc);
+
+/**
+ * richacl_clone - create a copy of a richacl
+ */
+static struct richacl *
+richacl_clone(const struct richacl *acl)
+{
+ int count = acl->a_count;
+ size_t size = sizeof(struct richacl) + count * sizeof(struct richace);
+ struct richacl *dup = kmalloc(size, GFP_KERNEL);
+
+ if (dup) {
+ memcpy(dup, acl, size);
+ atomic_set(&dup->a_refcount, 1);
+ }
+ return dup;
+}
+
+/**
+ * richace_is_same_identifier - are both identifiers the same?
+ */
+int
+richace_is_same_identifier(const struct richace *a, const struct richace *b)
+{
+#define WHO_FLAGS (ACE4_SPECIAL_WHO | ACE4_IDENTIFIER_GROUP)
+ if ((a->e_flags & WHO_FLAGS) != (b->e_flags & WHO_FLAGS))
+ return 0;
+ return a->e_id == b->e_id;
+#undef WHO_FLAGS
+}
diff --git a/include/linux/richacl.h b/include/linux/richacl.h
new file mode 100644
index 0000000..51d6937
--- /dev/null
+++ b/include/linux/richacl.h
@@ -0,0 +1,237 @@
+/*
+ * Copyright (C) 2006, 2010 Novell, Inc.
+ * Written by Andreas Gruenbacher <agruen@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef __RICHACL_H
+#define __RICHACL_H
+#include <linux/slab.h>
+
+#define ACE_OWNER_ID 130
+#define ACE_GROUP_ID 131
+#define ACE_EVERYONE_ID 110
+
+struct richace {
+ unsigned short e_type;
+ unsigned short e_flags;
+ unsigned int e_mask;
+ unsigned int e_id;
+};
+
+struct richacl {
+ atomic_t a_refcount;
+ unsigned int a_owner_mask;
+ unsigned int a_group_mask;
+ unsigned int a_other_mask;
+ unsigned short a_count;
+ unsigned short a_flags;
+ struct richace a_entries[0];
+};
+
+#define richacl_for_each_entry(_ace, _acl) \
+ for (_ace = _acl->a_entries; \
+ _ace != _acl->a_entries + _acl->a_count; \
+ _ace++)
+
+#define richacl_for_each_entry_reverse(_ace, _acl) \
+ for (_ace = _acl->a_entries + _acl->a_count - 1; \
+ _ace != _acl->a_entries - 1; \
+ _ace--)
+
+/* Flag values defined by rich-acl */
+#define ACL4_MASKED 0x80
+
+#define ACL4_VALID_FLAGS ( \
+ ACL4_MASKED)
+
+/* e_type values */
+#define ACE4_ACCESS_ALLOWED_ACE_TYPE 0x0000
+#define ACE4_ACCESS_DENIED_ACE_TYPE 0x0001
+/*#define ACE4_SYSTEM_AUDIT_ACE_TYPE 0x0002*/
+/*#define ACE4_SYSTEM_ALARM_ACE_TYPE 0x0003*/
+
+/* e_flags bitflags */
+#define ACE4_FILE_INHERIT_ACE 0x0001
+#define ACE4_DIRECTORY_INHERIT_ACE 0x0002
+#define ACE4_NO_PROPAGATE_INHERIT_ACE 0x0004
+#define ACE4_INHERIT_ONLY_ACE 0x0008
+/*#define ACE4_SUCCESSFUL_ACCESS_ACE_FLAG 0x0010*/
+/*#define ACE4_FAILED_ACCESS_ACE_FLAG 0x0020*/
+#define ACE4_IDENTIFIER_GROUP 0x0040
+/* richacl specific flag values */
+#define ACE4_SPECIAL_WHO 0x4000
+
+#define ACE4_VALID_FLAGS ( \
+ ACE4_FILE_INHERIT_ACE | \
+ ACE4_DIRECTORY_INHERIT_ACE | \
+ ACE4_NO_PROPAGATE_INHERIT_ACE | \
+ ACE4_INHERIT_ONLY_ACE | \
+ ACE4_IDENTIFIER_GROUP | \
+ ACE4_SPECIAL_WHO)
+
+/* e_mask bitflags */
+#define ACE4_READ_DATA 0x00000001
+#define ACE4_LIST_DIRECTORY 0x00000001
+#define ACE4_WRITE_DATA 0x00000002
+#define ACE4_ADD_FILE 0x00000002
+#define ACE4_APPEND_DATA 0x00000004
+#define ACE4_ADD_SUBDIRECTORY 0x00000004
+#define ACE4_READ_NAMED_ATTRS 0x00000008
+#define ACE4_WRITE_NAMED_ATTRS 0x00000010
+#define ACE4_EXECUTE 0x00000020
+#define ACE4_DELETE_CHILD 0x00000040
+#define ACE4_READ_ATTRIBUTES 0x00000080
+#define ACE4_WRITE_ATTRIBUTES 0x00000100
+#define ACE4_WRITE_RETENTION 0x00000200
+#define ACE4_WRITE_RETENTION_HOLD 0x00000400
+#define ACE4_DELETE 0x00010000
+#define ACE4_READ_ACL 0x00020000
+#define ACE4_WRITE_ACL 0x00040000
+#define ACE4_WRITE_OWNER 0x00080000
+#define ACE4_SYNCHRONIZE 0x00100000
+
+/* Valid ACE4_* flags for directories and non-directories */
+#define ACE4_VALID_MASK ( \
+ ACE4_READ_DATA | ACE4_LIST_DIRECTORY | \
+ ACE4_WRITE_DATA | ACE4_ADD_FILE | \
+ ACE4_APPEND_DATA | ACE4_ADD_SUBDIRECTORY | \
+ ACE4_READ_NAMED_ATTRS | \
+ ACE4_WRITE_NAMED_ATTRS | \
+ ACE4_EXECUTE | \
+ ACE4_DELETE_CHILD | \
+ ACE4_READ_ATTRIBUTES | \
+ ACE4_WRITE_ATTRIBUTES | \
+ ACE4_WRITE_RETENTION | \
+ ACE4_WRITE_RETENTION_HOLD | \
+ ACE4_DELETE | \
+ ACE4_READ_ACL | \
+ ACE4_WRITE_ACL | \
+ ACE4_WRITE_OWNER | \
+ ACE4_SYNCHRONIZE)
+
+/**
+ * richacl_get - grab another reference to a richacl handle
+ */
+static inline struct richacl *
+richacl_get(struct richacl *acl)
+{
+ if (acl)
+ atomic_inc(&acl->a_refcount);
+ return acl;
+}
+
+/**
+ * richacl_put - free a richacl handle
+ */
+static inline void
+richacl_put(struct richacl *acl)
+{
+ if (acl && atomic_dec_and_test(&acl->a_refcount))
+ kfree(acl);
+}
+
+/**
+ * richace_is_owner - check if @ace is an OWNER@ entry
+ */
+static inline int
+richace_is_owner(const struct richace *ace)
+{
+ return (ace->e_flags & ACE4_SPECIAL_WHO) &&
+ ace->e_id == ACE_OWNER_ID;
+}
+
+/**
+ * richace_is_group - check if @ace is a GROUP@ entry
+ */
+static inline int
+richace_is_group(const struct richace *ace)
+{
+ return (ace->e_flags & ACE4_SPECIAL_WHO) &&
+ ace->e_id == ACE_GROUP_ID;
+}
+
+/**
+ * richace_is_everyone - check if @ace is an EVERYONE@ entry
+ */
+static inline int
+richace_is_everyone(const struct richace *ace)
+{
+ return (ace->e_flags & ACE4_SPECIAL_WHO) &&
+ ace->e_id == ACE_EVERYONE_ID;
+}
+
+/**
+ * richace_is_unix_id - check if @ace applies to a specific uid or gid
+ */
+static inline int
+richace_is_unix_id(const struct richace *ace)
+{
+ return !(ace->e_flags & ACE4_SPECIAL_WHO);
+}
+
+/**
+ * richace_is_inherit_only - check if @ace is for inheritance only
+ *
+ * ACEs with the %ACE4_INHERIT_ONLY_ACE flag set have no effect during
+ * permission checking.
+ */
+static inline int
+richace_is_inherit_only(const struct richace *ace)
+{
+ return ace->e_flags & ACE4_INHERIT_ONLY_ACE;
+}
+
+/**
+ * richace_is_inheritable - check if @ace is inheritable
+ */
+static inline int
+richace_is_inheritable(const struct richace *ace)
+{
+ return ace->e_flags & (ACE4_FILE_INHERIT_ACE |
+ ACE4_DIRECTORY_INHERIT_ACE);
+}
+
+/**
+ * richace_clear_inheritance_flags - clear all inheritance flags in @ace
+ */
+static inline void
+richace_clear_inheritance_flags(struct richace *ace)
+{
+ ace->e_flags &= ~(ACE4_FILE_INHERIT_ACE |
+ ACE4_DIRECTORY_INHERIT_ACE |
+ ACE4_NO_PROPAGATE_INHERIT_ACE |
+ ACE4_INHERIT_ONLY_ACE);
+}
+
+/**
+ * richace_is_allow - check if @ace is an %ALLOW type entry
+ */
+static inline int
+richace_is_allow(const struct richace *ace)
+{
+ return ace->e_type == ACE4_ACCESS_ALLOWED_ACE_TYPE;
+}
+
+/**
+ * richace_is_deny - check if @ace is a %DENY type entry
+ */
+static inline int
+richace_is_deny(const struct richace *ace)
+{
+ return ace->e_type == ACE4_ACCESS_DENIED_ACE_TYPE;
+}
+
+extern struct richacl *richacl_alloc(int);
+extern int richace_is_same_identifier(const struct richace *,
+ const struct richace *);
+#endif /* __RICHACL_H */
--
1.7.5.4
* [PATCH -V8 11/26] vfs: Add permission flags for setting file attributes
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Some permission models can allow processes to take ownership of a file,
change the file permissions, and set the file timestamps. Introduce new
permission mask flags and check for those permissions in
inode_change_ok().
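The chown and chgrp rules that inode_change_ok() enforces after this patch can be summarized as a small decision sketch. `struct chown_ctx` and its fields are hypothetical stand-ins for current_fsuid(), capable(CAP_CHOWN), and a successful richacl_change_ok(inode, MAY_TAKE_OWNERSHIP), which also requires the inode to use richacls. The `in_new_group` argument stands for in_group_p(ia_gid).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel helpers used by inode_change_ok(). */
struct chown_ctx {
	unsigned int fsuid;       /* current_fsuid() */
	bool cap_chown;           /* capable(CAP_CHOWN) */
	bool acl_take_ownership;  /* richacl_change_ok(inode, MAY_TAKE_OWNERSHIP) == 0 */
};

/* Mirrors inode_uid_change_ok(). */
static bool uid_change_ok(const struct chown_ctx *c, unsigned int file_uid,
			  unsigned int new_uid)
{
	/* The owner may "change" the owner to itself. */
	if (c->fsuid == file_uid && new_uid == file_uid)
		return true;
	/* A process may take ownership for itself if the acl allows it. */
	if (c->fsuid == new_uid && c->acl_take_ownership)
		return true;
	return c->cap_chown;
}

/* Mirrors inode_gid_change_ok(). */
static bool gid_change_ok(const struct chown_ctx *c, unsigned int file_uid,
			  unsigned int file_gid, unsigned int new_gid,
			  bool in_new_group)
{
	/* The owner may keep the group or switch to a group it is a member of. */
	if (c->fsuid == file_uid && (in_new_group || new_gid == file_gid))
		return true;
	/* MAY_TAKE_OWNERSHIP allows switching to one of the caller's groups. */
	if (in_new_group && c->acl_take_ownership)
		return true;
	return c->cap_chown;
}
```

Note that MAY_TAKE_OWNERSHIP only lets a process make itself (or one of its groups) the new owner; giving a file away to another user still requires CAP_CHOWN.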
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/attr.c | 70 +++++++++++++++++++++++++++++++++++++++++++--------
fs/namei.c | 2 +-
include/linux/fs.h | 4 +++
3 files changed, 64 insertions(+), 12 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index f15e9e3..00578b9 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -14,6 +14,55 @@
#include <linux/fcntl.h>
#include <linux/security.h>
+static int richacl_change_ok(struct inode *inode, int mask)
+{
+ if (!IS_RICHACL(inode))
+ return -EPERM;
+
+ if (inode->i_op->permission)
+ return inode->i_op->permission(inode, mask);
+
+ return check_acl(inode, mask);
+}
+
+static bool inode_uid_change_ok(struct inode *inode, uid_t ia_uid)
+{
+ if (current_fsuid() == inode->i_uid && ia_uid == inode->i_uid)
+ return true;
+ if (current_fsuid() == ia_uid &&
+ richacl_change_ok(inode, MAY_TAKE_OWNERSHIP) == 0)
+ return true;
+ if (capable(CAP_CHOWN))
+ return true;
+ return false;
+}
+
+static bool inode_gid_change_ok(struct inode *inode, gid_t ia_gid)
+{
+ int in_group = in_group_p(ia_gid);
+ if (current_fsuid() == inode->i_uid &&
+ (in_group || ia_gid == inode->i_gid))
+ return true;
+ if (in_group && richacl_change_ok(inode, MAY_TAKE_OWNERSHIP) == 0)
+ return true;
+ if (capable(CAP_CHOWN))
+ return true;
+ return false;
+}
+
+static bool inode_owner_permitted_or_capable(struct inode *inode, int mask)
+{
+ struct user_namespace *ns = inode_userns(inode);
+
+ if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
+ return true;
+ if (richacl_change_ok(inode, mask) == 0)
+ return true;
+ if (ns_capable(ns, CAP_FOWNER))
+ return true;
+ return false;
+}
+
/**
* inode_change_ok - check if attribute changes to an inode are allowed
* @inode: inode to check
@@ -45,21 +94,20 @@ int inode_change_ok(struct inode *inode, struct iattr *attr)
return 0;
/* Make sure a caller can chown. */
- if ((ia_valid & ATTR_UID) &&
- (current_fsuid() != inode->i_uid ||
- attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
- return -EPERM;
+ if (ia_valid & ATTR_UID) {
+ if (!inode_uid_change_ok(inode, attr->ia_uid))
+ return -EPERM;
+ }
/* Make sure caller can chgrp. */
- if ((ia_valid & ATTR_GID) &&
- (current_fsuid() != inode->i_uid ||
- (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
- !capable(CAP_CHOWN))
- return -EPERM;
+ if (ia_valid & ATTR_GID) {
+ if (!inode_gid_change_ok(inode, attr->ia_gid))
+ return -EPERM;
+ }
/* Make sure a caller can chmod. */
if (ia_valid & ATTR_MODE) {
- if (!inode_owner_or_capable(inode))
+ if (!inode_owner_permitted_or_capable(inode, MAY_CHMOD))
return -EPERM;
/* Also check the setgid bit! */
if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
@@ -69,7 +117,7 @@ int inode_change_ok(struct inode *inode, struct iattr *attr)
/* Check for setting the inode time. */
if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
- if (!inode_owner_or_capable(inode))
+ if (!inode_owner_permitted_or_capable(inode, MAY_SET_TIMES))
return -EPERM;
}
diff --git a/fs/namei.c b/fs/namei.c
index 044b6d1..de8c7d3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -174,7 +174,7 @@ void putname(const char *name)
EXPORT_SYMBOL(putname);
#endif
-static int check_acl(struct inode *inode, int mask)
+int check_acl(struct inode *inode, int mask)
{
#ifdef CONFIG_FS_POSIX_ACL
struct posix_acl *acl;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 724a4f4..ac1d8e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -71,6 +71,9 @@ struct inodes_stat_t {
#define MAY_CREATE_DIR 0x00000200
#define MAY_DELETE_CHILD 0x00000400
#define MAY_DELETE_SELF 0x00000800
+#define MAY_TAKE_OWNERSHIP 0x00001000
+#define MAY_CHMOD 0x00002000
+#define MAY_SET_TIMES 0x00004000
/*
* flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond
@@ -2232,6 +2235,7 @@ extern sector_t bmap(struct inode *, sector_t);
extern int notify_change(struct dentry *, struct iattr *);
extern int inode_permission(struct inode *, int);
extern int generic_permission(struct inode *, int);
+extern int check_acl(struct inode *, int);
static inline bool execute_ok(struct inode *inode)
{
--
1.7.5.4
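Patch 11 adds MAY_TAKE_OWNERSHIP as a third way to pass the chown check, alongside a no-op chown by the owner and CAP_CHOWN. The sketch below models only inode_uid_change_ok() in userspace. struct chown_ctx and its boolean fields are hypothetical stand-ins for current_fsuid(), the result of richacl_change_ok() and capable(CAP_CHOWN). inode_gid_change_ok() follows the same pattern, using in_group_p().

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the kernel helpers used by the patch. */
struct chown_ctx {
	uid_t fsuid;			/* current_fsuid() */
	bool richacl_take_ownership;	/* richacl_change_ok(inode, MAY_TAKE_OWNERSHIP) == 0 */
	bool cap_chown;			/* capable(CAP_CHOWN) */
};

/* Same order of checks as inode_uid_change_ok() in the patch. */
bool model_uid_change_ok(const struct chown_ctx *c, uid_t file_uid, uid_t new_uid)
{
	/* The owner may "change" the owner to itself (a no-op chown). */
	if (c->fsuid == file_uid && new_uid == file_uid)
		return true;
	/* New: a richacl may let the caller take ownership, but only for itself. */
	if (c->fsuid == new_uid && c->richacl_take_ownership)
		return true;
	return c->cap_chown;
}
```

Giving a file away to a third uid still requires CAP_CHOWN. MAY_TAKE_OWNERSHIP only lets the caller make itself the owner.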
* [PATCH -V8 10/26] vfs: Make the inode passed to inode_change_ok non-const
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
We will need to call iop->permission and iop->get_acl from
inode_change_ok() for additional permission checks, and both take a
non-const inode.
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/attr.c | 2 +-
include/linux/fs.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index 538e279..f15e9e3 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -26,7 +26,7 @@
* Should be called as the first thing in ->setattr implementations,
* possibly after taking additional locks.
*/
-int inode_change_ok(const struct inode *inode, struct iattr *attr)
+int inode_change_ok(struct inode *inode, struct iattr *attr)
{
unsigned int ia_valid = attr->ia_valid;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ccece40..724a4f4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2566,7 +2566,7 @@ extern int buffer_migrate_page(struct address_space *,
#define buffer_migrate_page NULL
#endif
-extern int inode_change_ok(const struct inode *, struct iattr *);
+extern int inode_change_ok(struct inode *, struct iattr *);
extern int inode_newsize_ok(const struct inode *, loff_t offset);
extern void setattr_copy(struct inode *inode, const struct iattr *attr);
--
1.7.5.4
* [PATCH -V8 09/26] vfs: Add delete child and delete self permission flags
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Normally, deleting a file requires write access to the parent directory.
Some permission models use a different permission on the parent
directory to indicate delete access. In addition, a process can have
per-file delete access even without delete access on the parent
directory.
Introduce two new inode_permission() mask flags and use them in
may_delete().
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 53 +++++++++++++++++++++++++++++++++++++--------------
include/linux/fs.h | 2 +
2 files changed, 40 insertions(+), 15 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index f6184b8..044b6d1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -337,7 +337,7 @@ static inline int do_inode_permission(struct inode *inode, int mask)
* are used for other things.
*
* When checking for MAY_APPEND, MAY_CREATE_FILE, MAY_CREATE_DIR,
- * MAY_WRITE must also be set in @mask.
+ * MAY_DELETE_CHILD, MAY_DELETE_SELF, MAY_WRITE must also be set in @mask.
*/
int inode_permission(struct inode *inode, int mask)
{
@@ -1835,11 +1835,25 @@ static int user_path_parent(int dfd, const char __user *path,
return error;
}
+
+/*
+ * We should have exec permission on directory and MAY_DELETE_SELF
+ * on the object being deleted.
+ */
+static int richacl_may_selfdelete(struct inode *dir,
+ struct inode *inode, int replace_mask)
+{
+ return (IS_RICHACL(inode) &&
+ (inode_permission(dir, MAY_EXEC | replace_mask) == 0) &&
+ (inode_permission(inode, MAY_DELETE_SELF) == 0));
+}
+
/*
* It's inline, so penalty for filesystems that don't use sticky bit is
* minimal.
*/
-static inline int check_sticky(struct inode *dir, struct inode *inode)
+static inline int check_sticky(struct inode *dir,
+ struct inode *inode, int replace_mask)
{
uid_t fsuid = current_fsuid();
@@ -1851,7 +1865,8 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)
return 0;
if (dir->i_uid == fsuid)
return 0;
-
+ if (richacl_may_selfdelete(dir, inode, replace_mask))
+ return 0;
other_userns:
return !ns_capable(inode_userns(inode), CAP_FOWNER);
}
@@ -1875,30 +1890,38 @@ other_userns:
* 10. We don't allow removal of NFS sillyrenamed files; it's handled by
* nfs_async_unlink().
*/
-static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
+static int may_delete(struct inode *dir, struct dentry *victim,
+ int isdir, int replace)
{
- int error;
+ int mask, replace_mask = 0, error;
+ struct inode *inode = victim->d_inode;
- if (!victim->d_inode)
+ if (!inode)
return -ENOENT;
BUG_ON(victim->d_parent->d_inode != dir);
audit_inode_child(victim, dir);
- error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ mask = MAY_WRITE | MAY_EXEC | MAY_DELETE_CHILD;
+ if (replace)
+ replace_mask = S_ISDIR(inode->i_mode) ?
+ MAY_CREATE_DIR : MAY_CREATE_FILE;
+ error = inode_permission(dir, mask | replace_mask);
+ if (error && richacl_may_selfdelete(dir, inode, replace_mask))
+ error = 0;
if (error)
return error;
if (IS_APPEND(dir))
return -EPERM;
- if (check_sticky(dir, victim->d_inode)||IS_APPEND(victim->d_inode)||
- IS_IMMUTABLE(victim->d_inode) || IS_SWAPFILE(victim->d_inode))
+ if (check_sticky(dir, inode, replace_mask) || IS_APPEND(inode) ||
+ IS_IMMUTABLE(inode) || IS_SWAPFILE(inode))
return -EPERM;
if (isdir) {
- if (!S_ISDIR(victim->d_inode->i_mode))
+ if (!S_ISDIR(inode->i_mode))
return -ENOTDIR;
if (IS_ROOT(victim))
return -EBUSY;
- } else if (S_ISDIR(victim->d_inode->i_mode))
+ } else if (S_ISDIR(inode->i_mode))
return -EISDIR;
if (IS_DEADDIR(dir))
return -ENOENT;
@@ -2605,7 +2628,7 @@ void dentry_unhash(struct dentry *dentry)
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{
- int error = may_delete(dir, dentry, 1);
+ int error = may_delete(dir, dentry, 1, 0);
if (error)
return error;
@@ -2700,7 +2723,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
int vfs_unlink(struct inode *dir, struct dentry *dentry)
{
- int error = may_delete(dir, dentry, 0);
+ int error = may_delete(dir, dentry, 0, 0);
if (error)
return error;
@@ -3096,14 +3119,14 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
- error = may_delete(old_dir, old_dentry, is_dir);
+ error = may_delete(old_dir, old_dentry, is_dir, 0);
if (error)
return error;
if (!new_dentry->d_inode)
error = may_create(new_dir, new_dentry, is_dir);
else
- error = may_delete(new_dir, new_dentry, is_dir);
+ error = may_delete(new_dir, new_dentry, is_dir, 1);
if (error)
return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 60361c6..ccece40 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -69,6 +69,8 @@ struct inodes_stat_t {
#define MAY_NOT_BLOCK 0x00000080
#define MAY_CREATE_FILE 0x00000100
#define MAY_CREATE_DIR 0x00000200
+#define MAY_DELETE_CHILD 0x00000400
+#define MAY_DELETE_SELF 0x00000800
/*
* flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond
--
1.7.5.4
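Patch 09 lets a deletion succeed in two ways:

- through MAY_DELETE_CHILD (together with write and search access) on the directory, or
- for richacl inodes only, through MAY_DELETE_SELF on the victim plus search access on the directory.

The sketch below is a userspace model of the first check in the patched may_delete(). struct model_inode is hypothetical and treats every MAY_* bit as an explicit grant. The sticky-bit, append-only and immutable checks are left out.

```c
#include <stdbool.h>

/* Flag values as defined by this series in include/linux/fs.h. */
#define MAY_EXEC		0x00000001
#define MAY_WRITE		0x00000002
#define MAY_CREATE_FILE		0x00000100
#define MAY_CREATE_DIR		0x00000200
#define MAY_DELETE_CHILD	0x00000400
#define MAY_DELETE_SELF		0x00000800

/* Hypothetical inode model: the MAY_* bits the caller holds on it. */
struct model_inode {
	bool is_richacl;
	bool is_dir;
	unsigned int granted;
};

static bool permitted(const struct model_inode *inode, unsigned int mask)
{
	return (inode->granted & mask) == mask;
}

/* @replace is set when rename() would overwrite @victim. */
bool model_may_delete(const struct model_inode *dir,
		      const struct model_inode *victim, bool replace)
{
	unsigned int replace_mask = 0;

	if (replace)
		replace_mask = victim->is_dir ? MAY_CREATE_DIR : MAY_CREATE_FILE;

	/* Directory-level delete permission. */
	if (permitted(dir, MAY_WRITE | MAY_EXEC | MAY_DELETE_CHILD | replace_mask))
		return true;

	/* Per-file delete permission, richacl only (richacl_may_selfdelete()). */
	return victim->is_richacl &&
	       permitted(dir, MAY_EXEC | replace_mask) &&
	       permitted(victim, MAY_DELETE_SELF);
}
```

When a rename would replace an existing entry, both paths also need the matching create permission on the directory.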
* [PATCH -V8 08/26] vfs: Add new file and directory create permission flags
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Some permission models distinguish between the permission to create a
non-directory and a directory. Pass this information down to
inode_permission() as mask flags.
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 26 +++++++++++++++-----------
include/linux/fs.h | 2 ++
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index cf8b2f0..f6184b8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -336,7 +336,8 @@ static inline int do_inode_permission(struct inode *inode, int mask)
* for filesystem access without changing the "normal" uids which
* are used for other things.
*
- * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
+ * When checking for MAY_APPEND, MAY_CREATE_FILE, MAY_CREATE_DIR,
+ * MAY_WRITE must also be set in @mask.
*/
int inode_permission(struct inode *inode, int mask)
{
@@ -1914,13 +1915,15 @@ static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
* 3. We should have write and exec permissions on dir
* 4. We can't do it if dir is immutable (done in permission())
*/
-static inline int may_create(struct inode *dir, struct dentry *child)
+static inline int may_create(struct inode *dir, struct dentry *child, int isdir)
{
+ int mask = isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE;
+
if (child->d_inode)
return -EEXIST;
if (IS_DEADDIR(dir))
return -ENOENT;
- return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ return inode_permission(dir, MAY_WRITE | MAY_EXEC | mask);
}
/*
@@ -1968,7 +1971,7 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
struct nameidata *nd)
{
- int error = may_create(dir, dentry);
+ int error = may_create(dir, dentry, 0);
if (error)
return error;
@@ -2427,7 +2430,7 @@ EXPORT_SYMBOL(user_path_create);
int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
- int error = may_create(dir, dentry);
+ int error = may_create(dir, dentry, 0);
if (error)
return error;
@@ -2524,7 +2527,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, int, mode, unsigned, dev)
int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
- int error = may_create(dir, dentry);
+ int error = may_create(dir, dentry, 1);
if (error)
return error;
@@ -2806,7 +2809,7 @@ SYSCALL_DEFINE1(unlink, const char __user *, pathname)
int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
{
- int error = may_create(dir, dentry);
+ int error = may_create(dir, dentry, 0);
if (error)
return error;
@@ -2872,7 +2875,10 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
if (!inode)
return -ENOENT;
- error = may_create(dir, new_dentry);
+ if (S_ISDIR(inode->i_mode))
+ return -EPERM;
+
+ error = may_create(dir, new_dentry, 0);
if (error)
return error;
@@ -2886,8 +2892,6 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
return -EPERM;
if (!dir->i_op->link)
return -EPERM;
- if (S_ISDIR(inode->i_mode))
- return -EPERM;
error = security_inode_link(old_dentry, dir, new_dentry);
if (error)
@@ -3097,7 +3101,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
return error;
if (!new_dentry->d_inode)
- error = may_create(new_dir, new_dentry);
+ error = may_create(new_dir, new_dentry, is_dir);
else
error = may_delete(new_dir, new_dentry, is_dir);
if (error)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3ebf86..60361c6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -67,6 +67,8 @@ struct inodes_stat_t {
#define MAY_CHDIR 0x00000040
/* called from RCU mode, don't block */
#define MAY_NOT_BLOCK 0x00000080
+#define MAY_CREATE_FILE 0x00000100
+#define MAY_CREATE_DIR 0x00000200
/*
* flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond
--
1.7.5.4
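With patch 08, may_create() asks the parent directory for MAY_CREATE_DIR or MAY_CREATE_FILE in addition to MAY_WRITE | MAY_EXEC. The helper below is a hypothetical sketch of that mask selection only. The flag values are the ones defined in the patch.

```c
/* Flag values as defined by this series in include/linux/fs.h. */
#define MAY_EXEC	0x00000001
#define MAY_WRITE	0x00000002
#define MAY_CREATE_FILE	0x00000100
#define MAY_CREATE_DIR	0x00000200

/*
 * Mask that may_create() passes to inode_permission() on the parent.
 * Callers in the patch: vfs_create(), vfs_mknod(), vfs_symlink() and
 * vfs_link() pass isdir = 0, vfs_mkdir() passes 1, and vfs_rename()
 * passes is_dir.
 */
unsigned int model_create_mask(int isdir)
{
	return MAY_WRITE | MAY_EXEC | (isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE);
}
```

MAY_WRITE is always part of the mask. A permission function that only understands read, write and execute therefore still sees a write request, as the updated inode_permission() comment requires.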
* [PATCH -V8 07/26] vfs: Optimize out IS_RICHACL() if CONFIG_FS_RICHACL is not defined
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
If CONFIG_FS_RICHACL is not defined, define IS_RICHACL() as 0 so that
the compiler can optimize out the richacl checks.
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/Kconfig | 3 +++
include/linux/fs.h | 5 +++++
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 9fe0b34..7939190 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -46,6 +46,9 @@ endif # BLOCK
config FS_POSIX_ACL
def_bool n
+config FS_RICHACL
+ def_bool n
+
config EXPORTFS
tristate
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b4bfe6..f3ebf86 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -271,7 +271,12 @@ struct inodes_stat_t {
#define IS_APPEND(inode) ((inode)->i_flags & S_APPEND)
#define IS_IMMUTABLE(inode) ((inode)->i_flags & S_IMMUTABLE)
#define IS_POSIXACL(inode) __IS_FLG(inode, MS_POSIXACL)
+
+#ifdef CONFIG_FS_RICHACL
#define IS_RICHACL(inode) __IS_FLG(inode, MS_RICHACL)
+#else
+#define IS_RICHACL(inode) 0
+#endif
#define IS_DEADDIR(inode) ((inode)->i_flags & S_DEAD)
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
--
1.7.5.4
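Defining IS_RICHACL() as the constant 0 lets the compiler discard every richacl branch when CONFIG_FS_RICHACL is off. Below is a self-contained sketch of the same pattern. struct model_sb and MODEL_IS_RICHACL() are hypothetical; the real macro tests the superblock flags through __IS_FLG().

```c
#define MS_RICHACL (1 << 25)	/* value from patch 06 */

/* Hypothetical superblock model. */
struct model_sb {
	unsigned long s_flags;
};

#ifdef CONFIG_FS_RICHACL
#define MODEL_IS_RICHACL(sb)	((sb)->s_flags & MS_RICHACL)
#else
/* Constant false: branches guarded by it become dead code. */
#define MODEL_IS_RICHACL(sb)	0
#endif

int model_takes_richacl_path(const struct model_sb *sb)
{
	if (MODEL_IS_RICHACL(sb))
		return 1;	/* richacl-specific handling would go here */
	return 0;
}
```

Building with -DCONFIG_FS_RICHACL makes the same call return 1 for a superblock that has MS_RICHACL set.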
* [PATCH -V8 06/26] vfs: Add IS_RICHACL() test for richacl support
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Introduce a new MS_RICHACL super-block flag and a new IS_RICHACL() test
which file systems like nfs can use. IS_ACL() is true if IS_POSIXACL()
or IS_RICHACL() is true.
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/fs.h | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1994b84..7b4bfe6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -210,6 +210,7 @@ struct inodes_stat_t {
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
#define MS_STRICTATIME (1<<24) /* Always perform atime updates */
+#define MS_RICHACL (1<<25) /* Supports richacls */
#define MS_NOSEC (1<<28)
#define MS_BORN (1<<29)
#define MS_ACTIVE (1<<30)
@@ -270,6 +271,7 @@ struct inodes_stat_t {
#define IS_APPEND(inode) ((inode)->i_flags & S_APPEND)
#define IS_IMMUTABLE(inode) ((inode)->i_flags & S_IMMUTABLE)
#define IS_POSIXACL(inode) __IS_FLG(inode, MS_POSIXACL)
+#define IS_RICHACL(inode) __IS_FLG(inode, MS_RICHACL)
#define IS_DEADDIR(inode) ((inode)->i_flags & S_DEAD)
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
@@ -283,7 +285,7 @@ struct inodes_stat_t {
* IS_ACL() tells the VFS to not apply the umask
* and use check_acl for acl permission checks when defined.
*/
-#define IS_ACL(inode) __IS_FLG(inode, MS_POSIXACL)
+#define IS_ACL(inode) __IS_FLG(inode, MS_POSIXACL | MS_RICHACL)
/* the read-only stuff doesn't really belong here, but any other place is
probably as bad and I don't want to create yet another include file. */
--
1.7.5.4
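__IS_FLG() masks the superblock flags. Passing MS_POSIXACL | MS_RICHACL therefore makes IS_ACL() true when either bit is set. Below is a small sketch using the two flag values from these patches. model_is_acl() and MODEL_IS_FLG() are hypothetical names, modelled on the kernel macros.

```c
#define MS_POSIXACL	(1 << 16)
#define MS_RICHACL	(1 << 25)

/* Nonzero if any bit in @flg is set in @s_flags. */
#define MODEL_IS_FLG(s_flags, flg)	((s_flags) & (flg))

int model_is_acl(unsigned long s_flags)
{
	return MODEL_IS_FLG(s_flags, MS_POSIXACL | MS_RICHACL) != 0;
}
```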
* [PATCH -V8 05/26] vfs: Add generic IS_ACL() test for acl support
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
When IS_POSIXACL() is true, the vfs does not apply the umask. Other acl
models will need the same exception, so introduce a separate IS_ACL()
test.
The IS_POSIXACL() test is still needed so that nfsd can determine when
the underlying file system supports POSIX ACLs (as opposed to some other
kind).
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 6 +++---
include/linux/fs.h | 8 +++++++-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 9061157..cf8b2f0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2166,7 +2166,7 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
/* Negative dentry, just create the file */
if (!dentry->d_inode) {
int mode = op->mode;
- if (!IS_POSIXACL(dir->d_inode))
+ if (!IS_ACL(dir->d_inode))
mode &= ~current_umask();
/*
* This write is needed to ensure that a
@@ -2484,7 +2484,7 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, int, mode,
if (IS_ERR(dentry))
return PTR_ERR(dentry);
- if (!IS_POSIXACL(path.dentry->d_inode))
+ if (!IS_ACL(path.dentry->d_inode))
mode &= ~current_umask();
error = may_mknod(mode);
if (error)
@@ -2553,7 +2553,7 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, int, mode)
if (IS_ERR(dentry))
return PTR_ERR(dentry);
- if (!IS_POSIXACL(path.dentry->d_inode))
+ if (!IS_ACL(path.dentry->d_inode))
mode &= ~current_umask();
error = mnt_want_write(path.mnt);
if (error)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1884e9..1994b84 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -201,7 +201,7 @@ struct inodes_stat_t {
#define MS_VERBOSE 32768 /* War is peace. Verbosity is silence.
MS_VERBOSE is deprecated. */
#define MS_SILENT 32768
-#define MS_POSIXACL (1<<16) /* VFS does not apply the umask */
+#define MS_POSIXACL (1<<16) /* Supports POSIX ACLs */
#define MS_UNBINDABLE (1<<17) /* change to unbindable */
#define MS_PRIVATE (1<<18) /* change to private */
#define MS_SLAVE (1<<19) /* change to slave */
@@ -279,6 +279,12 @@ struct inodes_stat_t {
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
+/*
+ * IS_ACL() tells the VFS to not apply the umask
+ * and use check_acl for acl permission checks when defined.
+ */
+#define IS_ACL(inode) __IS_FLG(inode, MS_POSIXACL)
+
/* the read-only stuff doesn't really belong here, but any other place is
probably as bad and I don't want to create yet another include file. */
--
1.7.5.4
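After patch 05, the create paths in fs/namei.c apply the umask only when the parent directory is not on an ACL-capable file system. Otherwise the umask is left to the file system's ACL code. The hypothetical helper below shows just that decision. @dir_is_acl stands in for IS_ACL() on the parent directory.

```c
#include <sys/types.h>

/* @dir_is_acl stands in for IS_ACL() on the parent directory's inode. */
mode_t model_create_mode(mode_t mode, mode_t umask_bits, int dir_is_acl)
{
	if (!dir_is_acl)
		mode &= ~umask_bits;	/* the VFS applies the umask */
	return mode;			/* otherwise the file system decides */
}
```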
* [PATCH -V8 04/26] vfs: Add a comment to inode_permission()
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 276cd30..9061157 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -221,7 +221,7 @@ static int check_acl(struct inode *inode, int mask)
}
/*
- * This does basic POSIX ACL permission checking
+ * This does the basic permission checking
*/
static int acl_permission_check(struct inode *inode, int mask)
{
@@ -271,7 +271,7 @@ int generic_permission(struct inode *inode, int mask)
int ret;
/*
- * Do the basic POSIX ACL permission checks.
+ * Do the basic permission checks.
*/
ret = acl_permission_check(inode, mask);
if (ret != -EACCES)
@@ -335,6 +335,8 @@ static inline int do_inode_permission(struct inode *inode, int mask)
* We use "fsuid" for this, letting us set arbitrary permissions
* for filesystem access without changing the "normal" uids which
* are used for other things.
+ *
+ * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
*/
int inode_permission(struct inode *inode, int mask)
{
--
1.7.5.4
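The new comment states a calling convention rather than adding a check. Below is a hypothetical pair of helpers:

- model_write_mask() is a caller that honours the convention.
- model_mask_well_formed() is a predicate a permission function could use to spot a mask that violates it.

Neither helper exists in the kernel.

```c
/* Flag values as in include/linux/fs.h after patch 02. */
#define MAY_WRITE	0x00000002
#define MAY_APPEND	0x00000008

/* Build the mask for a write request; MAY_APPEND never travels alone. */
unsigned int model_write_mask(int append)
{
	unsigned int mask = MAY_WRITE;

	if (append)
		mask |= MAY_APPEND;
	return mask;
}

/* Returns 1 if @mask follows the rule in the inode_permission() comment. */
int model_mask_well_formed(unsigned int mask)
{
	return !(mask & MAY_APPEND) || (mask & MAY_WRITE);
}
```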
* [PATCH -V8 02/26] vfs: Add hex format for MAY_* flag values
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We are going to add more flags, and having them in hex format
makes that simpler.
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
include/linux/fs.h | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 277f497..c1884e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -58,14 +58,15 @@ struct inodes_stat_t {
#define NR_FILE 8192 /* this can well be larger on a larger system */
-#define MAY_EXEC 1
-#define MAY_WRITE 2
-#define MAY_READ 4
-#define MAY_APPEND 8
-#define MAY_ACCESS 16
-#define MAY_OPEN 32
-#define MAY_CHDIR 64
-#define MAY_NOT_BLOCK 128 /* called from RCU mode, don't block */
+#define MAY_EXEC 0x00000001
+#define MAY_WRITE 0x00000002
+#define MAY_READ 0x00000004
+#define MAY_APPEND 0x00000008
+#define MAY_ACCESS 0x00000010
+#define MAY_OPEN 0x00000020
+#define MAY_CHDIR 0x00000040
+/* called from RCU mode, don't block */
+#define MAY_NOT_BLOCK 0x00000080
/*
* flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond
--
1.7.5.4
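The hex layout makes it easy to confirm two things: each MAY_* value is a distinct bit, and 0x00000100 is the next free bit. Patch 08 uses that bit for MAY_CREATE_FILE. Below is a sketch of that check; model_may_flags_union() is a hypothetical helper.

```c
#include <stddef.h>

#define MAY_EXEC	0x00000001
#define MAY_WRITE	0x00000002
#define MAY_READ	0x00000004
#define MAY_APPEND	0x00000008
#define MAY_ACCESS	0x00000010
#define MAY_OPEN	0x00000020
#define MAY_CHDIR	0x00000040
#define MAY_NOT_BLOCK	0x00000080

static const unsigned int may_flags[] = {
	MAY_EXEC, MAY_WRITE, MAY_READ, MAY_APPEND,
	MAY_ACCESS, MAY_OPEN, MAY_CHDIR, MAY_NOT_BLOCK,
};

/*
 * Returns the union of all flags, or 0 if any flag is not a single
 * bit or overlaps an earlier one.
 */
unsigned int model_may_flags_union(void)
{
	unsigned int seen = 0;
	size_t i;

	for (i = 0; i < sizeof(may_flags) / sizeof(may_flags[0]); i++) {
		unsigned int f = may_flags[i];

		if (f == 0 || (f & (f - 1)) != 0 || (seen & f))
			return 0;
		seen |= f;
	}
	return seen;
}
```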
* [PATCH -V8 01/26] vfs: Indicate that the permission functions take all the MAY_* flags
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
In-Reply-To: <1319391835-5829-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Andreas Gruenbacher <agruen@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/namei.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 0b3138d..2a4574f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -257,7 +257,7 @@ other_perms:
/**
* generic_permission - check for access rights on a Posix-like filesystem
* @inode: inode to check access rights for
- * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC, ...)
*
* Used to check for read/write/execute permissions on a file.
* We use "fsuid" for this, letting us set arbitrary permissions
@@ -331,7 +331,7 @@ static inline int do_inode_permission(struct inode *inode, int mask)
/**
* inode_permission - check for access rights to a given inode
* @inode: inode to check permission on
- * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC, ...)
*
* Used to check for read/write/execute permissions on an inode.
* We use "fsuid" for this, letting us set arbitrary permissions
--
1.7.5.4
* [PATCH -V8 00/26] New ACL format for better NFSv4 acl interoperability
From: Aneesh Kumar K.V @ 2011-10-23 17:43 UTC
To: agruen, bfields, akpm, viro, dhowells
Cc: aneesh.kumar, linux-fsdevel, linux-nfs, linux-kernel
Hi,
The following set of patches implements the VFS and ext4 changes needed for
a new ACL model for Linux. Rich ACLs are an implementation of NFSv4 ACLs,
extended by file masks to fit into the standard POSIX file permission model.
They are designed to work seamlessly locally as well as across the NFSv4 and
CIFS/SMB2 network file system protocols.
A user-space utility for displaying and changing richacls is available at [4]
(a number of examples can be found at http://acl.bestbits.at/richacl/examples.html).
[4] git://github.com/kvaneesh/richacl-tools.git master
To test richacl on ext4, use tune2fs -O richacl to enable the richacl feature and
mount the file system with the -o acl mount option.
More details regarding richacl can be found at
http://acl.bestbits.at/richacl/
Changes from v7:
a) Update patches based on review comments.
b) Add Acked-by: tags.
c) Change the richacl xattr format based on review feedback.
A git repository with all the patches can be found at
git://github.com/kvaneesh/linux.git richacl
IMHO the patches are ready to be merged upstream. How do we push these changes
to Linus' tree? Andrew, Viro, any comments on how we can get this merged upstream?
-aneesh
* [Bug 42131] New: Problem with resizing OpenGL windows when using XCB
From: bugzilla-daemon @ 2011-10-23 17:40 UTC
To: dri-devel
https://bugs.freedesktop.org/show_bug.cgi?id=42131
Bug #: 42131
Summary: Problem with resizing OpenGL windows when using XCB
Classification: Unclassified
Product: Mesa
Version: 7.11
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: nilschrbrause@googlemail.com
Created attachment 52651
--> https://bugs.freedesktop.org/attachment.cgi?id=52651
Test program demonstrating the problem
The attached program creates a window using XCB and draws a rectangle inside it
using OpenGL. When using indirect software rendering (LIBGL_ALWAYS_INDIRECT=1),
everything works as expected. But when using the R600 Gallium3D driver, the
OpenGL viewport doesn't seem to get resized properly and the window contents
get screwed up.
* how to check kernel is configured with preemption or not
From: sri @ 2011-10-23 17:34 UTC
To: kernelnewbies
In-Reply-To: <CAEnQRZDKw3takWdFhp+b9PnxV+RJ_qeKJJW9Kn2Xr9mNxFb8MQ@mail.gmail.com>
No, uname did not show anything.
Is there any way to get the kernel preemption mode programmatically?
Thanks,
--Sri
On Fri, Oct 21, 2011 at 6:41 PM, Daniel Baluta <daniel.baluta@gmail.com>wrote:
> On Fri, Oct 21, 2011 at 2:28 PM, sri <bskmohan@gmail.com> wrote:
> > Hi,
> >
> > Am using kernel 2.6.18-195(centos 5.5).
> > My kernel configs have CONFIG_PREEMPT_NONE=7 and
> "CONFIG_PREEMPT_VOLUNTERY
> > is not set".
> > How to check that preemption is really in place?
> > Is there any way to check my kernel is configured with what preemption
> > levels?
>
> Hmm, uname -a?
>
--
Krishna Mohan B
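One way to answer the question programmatically is to read the kernel configuration file. This assumes the distribution installs it as /boot/config-<release>, which CentOS and many other distributions do. Kernels built with CONFIG_IKCONFIG_PROC expose /proc/config.gz instead, and this sketch does not read that file.

The helper names are hypothetical; uname(2), fopen() and fgets() are standard. The sketch reports the configured model (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT), not any runtime state.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Map one line of a kernel .config to a preemption model, or NULL. */
const char *preempt_model_from_line(const char *line)
{
	if (starts_with(line, "CONFIG_PREEMPT_NONE=y"))
		return "none";
	if (starts_with(line, "CONFIG_PREEMPT_VOLUNTARY=y"))
		return "voluntary";
	if (starts_with(line, "CONFIG_PREEMPT=y"))
		return "full";
	return NULL;
}

/* Scan an open config file; the last matching line wins. */
const char *preempt_model_from_file(FILE *fp)
{
	char line[256];
	const char *model = NULL;

	while (fgets(line, sizeof(line), fp)) {
		const char *m = preempt_model_from_line(line);

		if (m)
			model = m;
	}
	return model;
}

/* Look up the running kernel's config; NULL if it cannot be found. */
const char *running_preempt_model(void)
{
	struct utsname u;
	char path[512];
	const char *model;
	FILE *fp;

	if (uname(&u) != 0)
		return NULL;
	snprintf(path, sizeof(path), "/boot/config-%s", u.release);
	fp = fopen(path, "r");
	if (!fp)
		return NULL;
	model = preempt_model_from_file(fp);
	fclose(fp);
	return model;
}
```

Call running_preempt_model() and print the result. Print "unknown" if it returns NULL, which happens when the config file is not installed.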
* [U-Boot] debugX macro
From: Wolfgang Denk @ 2011-10-23 17:23 UTC
To: u-boot
In-Reply-To: <201110231646.59439.marek.vasut@gmail.com>
Dear Marek Vasut,
In message <201110231646.59439.marek.vasut@gmail.com> you wrote:
>
> I've been doing the debug() cleanup and found the debugX() macro is used only in
> very few patches. Maybe punting it altogether won't hurt.
>
> The following do use it:
> ./board/spc1920/hpi.c
> ./drivers/mtd/nand/s3c2410_nand.c
>
> Opinions?
Dump it!
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"Beware of programmers carrying screwdrivers." - Chip Salzenberg
* [U-Boot] [PATCH] Update s3c24x0 timer implementation
From: Wolfgang Denk @ 2011-10-23 17:22 UTC
To: u-boot
In-Reply-To: <CAJtrzLOvjswmuww+xNLXxeSH4QxOBZww3JrZGKCgLq5w6B=y3w@mail.gmail.com>
Dear Mark Norman,
In message <CAJtrzLOvjswmuww+xNLXxeSH4QxOBZww3JrZGKCgLq5w6B=y3w@mail.gmail.com> you wrote:
>
> Since the .rel.text section is required by the relocation code, I
> assume that .bss global variables cannot be used until after
> relocation?
Why do you have to make such assumptions? That's documented
behaviour. Didn't you RTFM?
> After studying several other timer.c files I developed the following
> patch which uses the global data struct to store the global variables.
> I also restructured some of the code based on structure of the other
> timer.c files. I have confirmed it works correctly on the SBC I have.
Then please submit a proper patch - these introductory comments don't
belong in the commit message and should be moved to the comment
section (below the "---" line).
...
> + /* Use PWM Timer 4 because it has no output.
> + * Prescaler is hard fixed at 250, divider at 2.
> + * This generates a Timer clock frequency of 100kHz (@PCLK=50MHz) and
> + * therefore 10us timer ticks.
> + */
Incorrect multiline comment format; please fix globally.
> + /* Prescaler for Timer 4 is 250 */
> + const ulong prescaler = 250;
> + writel((prescaler-1) << 8, &timers->tcfg0);
Please move declarations up. Don't split declarations by comment
lines. Add a blank line between declarations and code.
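The timing in the quoted comment works out as PCLK / prescaler / divider = 50 MHz / 250 / 2 = 100 kHz, which gives a 10 us tick. The sketch below shows that arithmetic in the comment and declaration style requested above: the opening `/*` on its own line, declarations first, and a blank line before the code. The helper names are hypothetical and are not U-Boot functions. The `(prescaler - 1) << 8` encoding assumes the S3C24x0 rule that the timer clock is PCLK / (prescaler field + 1) / divider, with the Timer 2-4 prescaler in bits 15:8 of TCFG0. This matches the patch's writel() call.

```c
#include <stdint.h>

/*
 * Timer input clock as described in the patch comment:
 * PCLK / prescaler / divider.
 */
static uint32_t timer_clk_hz(uint32_t pclk_hz, uint32_t prescaler,
			     uint32_t divider)
{
	return pclk_hz / prescaler / divider;
}

/*
 * Length of one timer tick in microseconds.
 */
static uint32_t tick_us(uint32_t timer_hz)
{
	return 1000000u / timer_hz;
}

/*
 * TCFG0 value for the Timer 2-4 prescaler field: the hardware divides
 * by (field + 1), so store prescaler - 1 and shift it into bits 15:8.
 */
static uint32_t tcfg0_prescaler_field(uint32_t prescaler)
{
	return (prescaler - 1) << 8;
}
```

These helpers are pure arithmetic and touch no registers. Writing the result to the timer block is left to the board code.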
...
> --- a/arch/arm/include/asm/global_data.h
> +++ b/arch/arm/include/asm/global_data.h
> @@ -38,9 +38,6 @@ typedef struct global_data {
> unsigned long flags;
> unsigned long baudrate;
> unsigned long have_console; /* serial_init() was called */
> -#ifdef CONFIG_PRE_CONSOLE_BUFFER
> - unsigned long precon_buf_idx; /* Pre-Console buffer index */
> -#endif
Make sure not to add such unrelated, incorrect changes!
> -#if defined(CONFIG_POST) || defined(CONFIG_LOGBUFFER)
> - unsigned long post_log_word; /* Record POST activities */
> - unsigned long post_log_res; /* success of POST test */
> - unsigned long post_init_f_time; /* When post_init_f started */
> -#endif
Ditto.
Best regards,
Wolfgang Denk
--
I think there's a world market for about five computers.
-- attr. Thomas J. Watson (Chairman of the Board, IBM), 1943
* Re: [patch net-next V3] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-23 17:21 UTC (permalink / raw)
To: netdev; +Cc: davem
In-Reply-To: <1319359253-1328-1-git-send-email-jpirko@redhat.com>
Please scratch this. V4 will follow up.
Sun, Oct 23, 2011 at 10:40:53AM CEST, jpirko@redhat.com wrote:
>This patch introduces a new network device called team. It is supposed to
>be a very fast, simple, userspace-driven alternative to the existing
>bonding driver.
>
>A userspace library called libteam, with a couple of demo apps, is
>available here:
>https://github.com/jpirko/libteam
>Note it's still in its diapers atm.
>
>team<->libteam use generic netlink for communication. That and rtnl
>are supposed to be the only ways to configure a team device, no sysfs etc.
>
>Python binding basis for libteam was recently introduced (some work
>still needs to be done on it though). A daemon providing arpmon/miimon
>active-backup functionality will be introduced shortly.
>Everything necessary is already implemented in the kernel team driver.
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>v2->v3:
> - team_change_mtu() uses rcu version of list traversal to unwind
> - set and clear of mode_ops happens per pointer, not per byte
> - port hashlist changed to be embedded into team structure
> - error branch in team_port_enter() does cleanup now
> - fixed rtln->rtnl
>
>v1->v2:
> - modes are made as modules. Makes team more modular and
> extendable.
> - several commenters' nitpicks found on v1 were fixed
> - several other bugs were fixed.
> - note I ignored Eric's comment about roundrobin port selector
> as Eric's way may be easily implemented as another mode (mode
> "random") in future.
>---
> Documentation/networking/team.txt | 2 +
> MAINTAINERS | 7 +
> drivers/net/Kconfig | 2 +
> drivers/net/Makefile | 1 +
> drivers/net/team/Kconfig | 38 +
> drivers/net/team/Makefile | 7 +
> drivers/net/team/team.c | 1574 +++++++++++++++++++++++++++++
> drivers/net/team/team_mode_activebackup.c | 152 +++
> drivers/net/team/team_mode_roundrobin.c | 107 ++
> include/linux/Kbuild | 1 +
> include/linux/if.h | 1 +
> include/linux/if_team.h | 254 +++++
> include/linux/rculist.h | 14 +
> 13 files changed, 2160 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/networking/team.txt
> create mode 100644 drivers/net/team/Kconfig
> create mode 100644 drivers/net/team/Makefile
> create mode 100644 drivers/net/team/team.c
> create mode 100644 drivers/net/team/team_mode_activebackup.c
> create mode 100644 drivers/net/team/team_mode_roundrobin.c
> create mode 100644 include/linux/if_team.h
>
>diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt
>new file mode 100644
>index 0000000..5a01368
>--- /dev/null
>+++ b/Documentation/networking/team.txt
>@@ -0,0 +1,2 @@
>+Team devices are driven from userspace via libteam library which is here:
>+ https://github.com/jpirko/libteam
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 5008b08..c33400d 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -6372,6 +6372,13 @@ W: http://tcp-lp-mod.sourceforge.net/
> S: Maintained
> F: net/ipv4/tcp_lp.c
>
>+TEAM DRIVER
>+M: Jiri Pirko <jpirko@redhat.com>
>+L: netdev@vger.kernel.org
>+S: Supported
>+F: drivers/net/team/
>+F: include/linux/if_team.h
>+
> TEGRA SUPPORT
> M: Colin Cross <ccross@android.com>
> M: Erik Gilling <konkers@android.com>
>diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>index 583f66c..b3020be 100644
>--- a/drivers/net/Kconfig
>+++ b/drivers/net/Kconfig
>@@ -125,6 +125,8 @@ config IFB
> 'ifb1' etc.
> Look at the iproute2 documentation directory for usage etc
>
>+source "drivers/net/team/Kconfig"
>+
> config MACVLAN
> tristate "MAC-VLAN support (EXPERIMENTAL)"
> depends on EXPERIMENTAL
>diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>index fa877cd..4e4ebfe 100644
>--- a/drivers/net/Makefile
>+++ b/drivers/net/Makefile
>@@ -17,6 +17,7 @@ obj-$(CONFIG_NET) += Space.o loopback.o
> obj-$(CONFIG_NETCONSOLE) += netconsole.o
> obj-$(CONFIG_PHYLIB) += phy/
> obj-$(CONFIG_RIONET) += rionet.o
>+obj-$(CONFIG_NET_TEAM) += team/
> obj-$(CONFIG_TUN) += tun.o
> obj-$(CONFIG_VETH) += veth.o
> obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
>diff --git a/drivers/net/team/Kconfig b/drivers/net/team/Kconfig
>new file mode 100644
>index 0000000..70a43a6
>--- /dev/null
>+++ b/drivers/net/team/Kconfig
>@@ -0,0 +1,38 @@
>+menuconfig NET_TEAM
>+ tristate "Ethernet team driver support (EXPERIMENTAL)"
>+ depends on EXPERIMENTAL
>+ ---help---
>+ This allows one to create virtual interfaces that team together
>+ multiple ethernet devices.
>+
>+ Team devices can be added using the "ip" command from the
>+ iproute2 package:
>+
>+ "ip link add link [ address MAC ] [ NAME ] type team"
>+
>+ To compile this driver as a module, choose M here: the module
>+ will be called team.
>+
>+if NET_TEAM
>+
>+config NET_TEAM_MODE_ROUNDROBIN
>+ tristate "Round-robin mode support"
>+ depends on NET_TEAM
>+ ---help---
>+ Basic mode where the port used for transmitting packets is selected
>+ in round-robin fashion using a packet counter.
>+
>+ To compile this team mode as a module, choose M here: the module
>+ will be called team_mode_roundrobin.
>+
>+config NET_TEAM_MODE_ACTIVEBACKUP
>+ tristate "Active-backup mode support"
>+ depends on NET_TEAM
>+ ---help---
>+ Only one port is active at a time and the rest of the ports are
>+ used for backup.
>+
>+ To compile this team mode as a module, choose M here: the module
>+ will be called team_mode_activebackup.
>+
>+endif # NET_TEAM
>diff --git a/drivers/net/team/Makefile b/drivers/net/team/Makefile
>new file mode 100644
>index 0000000..85f2028
>--- /dev/null
>+++ b/drivers/net/team/Makefile
>@@ -0,0 +1,7 @@
>+#
>+# Makefile for the network team driver
>+#
>+
>+obj-$(CONFIG_NET_TEAM) += team.o
>+obj-$(CONFIG_NET_TEAM_MODE_ROUNDROBIN) += team_mode_roundrobin.o
>+obj-$(CONFIG_NET_TEAM_MODE_ACTIVEBACKUP) += team_mode_activebackup.o
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>new file mode 100644
>index 0000000..8004916
>--- /dev/null
>+++ b/drivers/net/team/team.c
>@@ -0,0 +1,1574 @@
>+/*
>+ * drivers/net/team/team.c - Network team device driver
>+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ */
>+
>+#include <linux/kernel.h>
>+#include <linux/types.h>
>+#include <linux/module.h>
>+#include <linux/init.h>
>+#include <linux/slab.h>
>+#include <linux/rcupdate.h>
>+#include <linux/errno.h>
>+#include <linux/ctype.h>
>+#include <linux/notifier.h>
>+#include <linux/netdevice.h>
>+#include <linux/if_arp.h>
>+#include <linux/socket.h>
>+#include <linux/etherdevice.h>
>+#include <linux/rtnetlink.h>
>+#include <net/rtnetlink.h>
>+#include <net/genetlink.h>
>+#include <net/netlink.h>
>+#include <linux/if_team.h>
>+
>+#define DRV_NAME "team"
>+
>+
>+/**********
>+ * Helpers
>+ **********/
>+
>+#define team_port_exists(dev) (dev->priv_flags & IFF_TEAM_PORT)
>+
>+static struct team_port *team_port_get_rcu(const struct net_device *dev)
>+{
>+ struct team_port *port = rcu_dereference(dev->rx_handler_data);
>+
>+ return team_port_exists(dev) ? port : NULL;
>+}
>+
>+static struct team_port *team_port_get_rtnl(const struct net_device *dev)
>+{
>+ struct team_port *port = rtnl_dereference(dev->rx_handler_data);
>+
>+ return team_port_exists(dev) ? port : NULL;
>+}
>+
>+/*
>+ * Since the ability to change the mac address of an open port device is
>+ * tested in team_port_add, this function can be called without checking
>+ * the return value
>+ */
>+static int __set_port_mac(struct net_device *port_dev,
>+ const unsigned char *dev_addr)
>+{
>+ struct sockaddr addr;
>+
>+ memcpy(addr.sa_data, dev_addr, ETH_ALEN);
>+ addr.sa_family = ARPHRD_ETHER;
>+ return dev_set_mac_address(port_dev, &addr);
>+}
>+
>+int team_port_set_orig_mac(struct team_port *port)
>+{
>+ return __set_port_mac(port->dev, port->orig.dev_addr);
>+}
>+EXPORT_SYMBOL(team_port_set_orig_mac);
>+
>+int team_port_set_team_mac(struct team_port *port)
>+{
>+ return __set_port_mac(port->dev, port->team->dev->dev_addr);
>+}
>+EXPORT_SYMBOL(team_port_set_team_mac);
>+
>+
>+/*******************
>+ * Options handling
>+ *******************/
>+
>+void team_options_register(struct team *team, struct team_option *option,
>+ size_t option_count)
>+{
>+ int i;
>+
>+ for (i = 0; i < option_count; i++, option++)
>+ list_add_tail(&option->list, &team->option_list);
>+}
>+EXPORT_SYMBOL(team_options_register);
>+
>+static void __team_options_change_check(struct team *team,
>+ struct team_option *changed_option);
>+
>+static void __team_options_unregister(struct team *team,
>+ struct team_option *option,
>+ size_t option_count)
>+{
>+ int i;
>+
>+ for (i = 0; i < option_count; i++, option++)
>+ list_del(&option->list);
>+}
>+
>+void team_options_unregister(struct team *team, struct team_option *option,
>+ size_t option_count)
>+{
>+ __team_options_unregister(team, option, option_count);
>+ __team_options_change_check(team, NULL);
>+}
>+EXPORT_SYMBOL(team_options_unregister);
>+
>+static int team_option_get(struct team *team, struct team_option *option,
>+ void *arg)
>+{
>+ return option->getter(team, arg);
>+}
>+
>+static int team_option_set(struct team *team, struct team_option *option,
>+ void *arg)
>+{
>+ int err;
>+
>+ err = option->setter(team, arg);
>+ if (err)
>+ return err;
>+
>+ __team_options_change_check(team, option);
>+ return err;
>+}
>+
>+/****************
>+ * Mode handling
>+ ****************/
>+
>+static LIST_HEAD(mode_list);
>+static DEFINE_SPINLOCK(mode_list_lock);
>+
>+static struct team_mode *__find_mode(const char *kind)
>+{
>+ struct team_mode *mode;
>+
>+ list_for_each_entry(mode, &mode_list, list) {
>+ if (strcmp(mode->kind, kind) == 0)
>+ return mode;
>+ }
>+ return NULL;
>+}
>+
>+static bool is_good_mode_name(const char *name)
>+{
>+ while (*name != '\0') {
>+ if (!isalpha(*name) && !isdigit(*name) && *name != '_')
>+ return false;
>+ name++;
>+ }
>+ return true;
>+}
>+
>+int team_mode_register(struct team_mode *mode)
>+{
>+ int err = 0;
>+
>+ if (!is_good_mode_name(mode->kind) ||
>+ mode->priv_size > TEAM_MODE_PRIV_SIZE)
>+ return -EINVAL;
>+ spin_lock(&mode_list_lock);
>+ if (__find_mode(mode->kind)) {
>+ err = -EEXIST;
>+ goto unlock;
>+ }
>+ list_add_tail(&mode->list, &mode_list);
>+unlock:
>+ spin_unlock(&mode_list_lock);
>+ return err;
>+}
>+EXPORT_SYMBOL(team_mode_register);
>+
>+int team_mode_unregister(struct team_mode *mode)
>+{
>+ spin_lock(&mode_list_lock);
>+ list_del_init(&mode->list);
>+ spin_unlock(&mode_list_lock);
>+ return 0;
>+}
>+EXPORT_SYMBOL(team_mode_unregister);
>+
>+static struct team_mode *team_mode_get(const char *kind)
>+{
>+ struct team_mode *mode;
>+
>+ spin_lock(&mode_list_lock);
>+ mode = __find_mode(kind);
>+ if (!mode) {
>+ spin_unlock(&mode_list_lock);
>+ request_module("team-mode-%s", kind);
>+ spin_lock(&mode_list_lock);
>+ mode = __find_mode(kind);
>+ }
>+ if (mode)
>+ if (!try_module_get(mode->owner))
>+ mode = NULL;
>+
>+ spin_unlock(&mode_list_lock);
>+ return mode;
>+}
>+
>+static void team_mode_put(const char *kind)
>+{
>+ struct team_mode *mode;
>+
>+ spin_lock(&mode_list_lock);
>+ mode = __find_mode(kind);
>+ BUG_ON(!mode);
>+ module_put(mode->owner);
>+ spin_unlock(&mode_list_lock);
>+}
>+
>+/*
>+ * We can benefit from the fact that it's ensured no port is present
>+ * at the time of mode change.
>+ */
>+static int __team_change_mode(struct team *team,
>+ const struct team_mode *new_mode)
>+{
>+ /* Check if mode was previously set and do cleanup if so */
>+ if (team->mode_kind) {
>+ void (*exit_op)(struct team *team) = team->mode_ops.exit;
>+
>+ /* Clear ops area so no callback is called any longer */
>+ team_mode_ops_clear(&team->mode_ops);
>+
>+ synchronize_rcu();
>+
>+ if (exit_op)
>+ exit_op(team);
>+ team_mode_put(team->mode_kind);
>+ team->mode_kind = NULL;
>+ /* zero private data area */
>+ memset(&team->mode_priv, 0,
>+ sizeof(struct team) - offsetof(struct team, mode_priv));
>+ }
>+
>+ if (!new_mode)
>+ return 0;
>+
>+ if (new_mode->ops->init) {
>+ int err;
>+
>+ err = new_mode->ops->init(team);
>+ if (err)
>+ return err;
>+ }
>+
>+ team->mode_kind = new_mode->kind;
>+ team_mode_ops_copy(&team->mode_ops, new_mode->ops);
>+
>+ return 0;
>+}
>+
>+static int team_change_mode(struct team *team, const char *kind)
>+{
>+ struct team_mode *new_mode;
>+ struct net_device *dev = team->dev;
>+ int err;
>+
>+ if (!list_empty(&team->port_list)) {
>+ netdev_err(dev, "No ports can be present during mode change\n");
>+ return -EBUSY;
>+ }
>+
>+ if (team->mode_kind && strcmp(team->mode_kind, kind) == 0) {
>+ netdev_err(dev, "Unable to change to the same mode the team is in\n");
>+ return -EINVAL;
>+ }
>+
>+ new_mode = team_mode_get(kind);
>+ if (!new_mode) {
>+ netdev_err(dev, "Mode \"%s\" not found\n", kind);
>+ return -EINVAL;
>+ }
>+
>+ err = __team_change_mode(team, new_mode);
>+ if (err) {
>+ netdev_err(dev, "Failed to change to mode \"%s\"\n", kind);
>+ team_mode_put(kind);
>+ return err;
>+ }
>+
>+ netdev_info(dev, "Mode changed to \"%s\"\n", kind);
>+ return 0;
>+}
>+
>+
>+/************************
>+ * Rx path frame handler
>+ ************************/
>+
>+/* note: already called with rcu_read_lock */
>+static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
>+{
>+ struct sk_buff *skb = *pskb;
>+ struct team_port *port;
>+ struct team *team;
>+ rx_handler_result_t res = RX_HANDLER_ANOTHER;
>+
>+ skb = skb_share_check(skb, GFP_ATOMIC);
>+ if (!skb)
>+ return RX_HANDLER_CONSUMED;
>+
>+ *pskb = skb;
>+
>+ port = team_port_get_rcu(skb->dev);
>+ team = port->team;
>+
>+ if (team->mode_ops.receive)
>+ res = team->mode_ops.receive(team, port, skb);
>+
>+ if (res == RX_HANDLER_ANOTHER) {
>+ struct team_pcpu_stats *pcpu_stats;
>+
>+ pcpu_stats = this_cpu_ptr(team->pcpu_stats);
>+ u64_stats_update_begin(&pcpu_stats->syncp);
>+ pcpu_stats->rx_packets++;
>+ pcpu_stats->rx_bytes += skb->len;
>+ if (skb->pkt_type == PACKET_MULTICAST)
>+ pcpu_stats->rx_multicast++;
>+ u64_stats_update_end(&pcpu_stats->syncp);
>+
>+ skb->dev = team->dev;
>+ } else {
>+ this_cpu_inc(team->pcpu_stats->rx_dropped);
>+ }
>+
>+ return res;
>+}
>+
>+
>+/****************
>+ * Port handling
>+ ****************/
>+
>+static bool team_port_find(const struct team *team,
>+ const struct team_port *port)
>+{
>+ struct team_port *cur;
>+
>+ list_for_each_entry(cur, &team->port_list, list)
>+ if (cur == port)
>+ return true;
>+ return false;
>+}
>+
>+/*
>+ * Add/delete port to the team port list. Write guarded by rtnl_lock.
>+ * Takes care of correct port->index setup (might be racy).
>+ */
>+static void team_port_list_add_port(struct team *team,
>+ struct team_port *port)
>+{
>+ port->index = team->port_count++;
>+ hlist_add_head_rcu(&port->hlist,
>+ team_port_index_hash(team, port->index));
>+ list_add_tail_rcu(&port->list, &team->port_list);
>+}
>+
>+static void __reconstruct_port_hlist(struct team *team, int rm_index)
>+{
>+ int i;
>+ struct team_port *port;
>+
>+ for (i = rm_index + 1; i < team->port_count; i++) {
>+ port = team_get_port_by_index_rcu(team, i);
>+ hlist_del_rcu(&port->hlist);
>+ port->index--;
>+ hlist_add_head_rcu(&port->hlist,
>+ team_port_index_hash(team, port->index));
>+ }
>+}
>+
>+static void team_port_list_del_port(struct team *team,
>+ struct team_port *port)
>+{
>+ int rm_index = port->index;
>+
>+ hlist_del_rcu(&port->hlist);
>+ list_del_rcu(&port->list);
>+ __reconstruct_port_hlist(team, rm_index);
>+ team->port_count--;
>+}
>+
>+#define TEAM_VLAN_FEATURES (NETIF_F_ALL_CSUM | NETIF_F_SG | \
>+ NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
>+ NETIF_F_HIGHDMA | NETIF_F_LRO)
>+
>+static void __team_compute_features(struct team *team)
>+{
>+ struct team_port *port;
>+ u32 vlan_features = TEAM_VLAN_FEATURES;
>+ unsigned short max_hard_header_len = ETH_HLEN;
>+
>+ list_for_each_entry(port, &team->port_list, list) {
>+ vlan_features = netdev_increment_features(vlan_features,
>+ port->dev->vlan_features,
>+ TEAM_VLAN_FEATURES);
>+
>+ if (port->dev->hard_header_len > max_hard_header_len)
>+ max_hard_header_len = port->dev->hard_header_len;
>+ }
>+
>+ team->dev->vlan_features = vlan_features;
>+ team->dev->hard_header_len = max_hard_header_len;
>+
>+ netdev_change_features(team->dev);
>+}
>+
>+static void team_compute_features(struct team *team)
>+{
>+ spin_lock(&team->lock);
>+ __team_compute_features(team);
>+ spin_unlock(&team->lock);
>+}
>+
>+static int team_port_enter(struct team *team, struct team_port *port)
>+{
>+ int err = 0;
>+
>+ dev_hold(team->dev);
>+ port->dev->priv_flags |= IFF_TEAM_PORT;
>+ if (team->mode_ops.port_enter) {
>+ err = team->mode_ops.port_enter(team, port);
>+ if (err) {
>+ netdev_err(team->dev, "Device %s failed to enter team mode\n",
>+ port->dev->name);
>+ goto err_port_enter;
>+ }
>+ }
>+
>+ return 0;
>+
>+err_port_enter:
>+ port->dev->priv_flags &= ~IFF_TEAM_PORT;
>+ dev_put(team->dev);
>+
>+ return err;
>+}
>+
>+static void team_port_leave(struct team *team, struct team_port *port)
>+{
>+ if (team->mode_ops.port_leave)
>+ team->mode_ops.port_leave(team, port);
>+ port->dev->priv_flags &= ~IFF_TEAM_PORT;
>+ dev_put(team->dev);
>+}
>+
>+static void __team_port_change_check(struct team_port *port, bool linkup);
>+
>+static int team_port_add(struct team *team, struct net_device *port_dev)
>+{
>+ struct net_device *dev = team->dev;
>+ struct team_port *port;
>+ char *portname = port_dev->name;
>+ char tmp_addr[ETH_ALEN];
>+ int err;
>+
>+ if (port_dev->flags & IFF_LOOPBACK ||
>+ port_dev->type != ARPHRD_ETHER) {
>+ netdev_err(dev, "Device %s is of an unsupported type\n",
>+ portname);
>+ return -EINVAL;
>+ }
>+
>+ if (team_port_exists(port_dev)) {
>+ netdev_err(dev, "Device %s is already a port "
>+ "of a team device\n", portname);
>+ return -EBUSY;
>+ }
>+
>+ if (port_dev->flags & IFF_UP) {
>+ netdev_err(dev, "Device %s is up. Set it down before adding it as a team port\n",
>+ portname);
>+ return -EBUSY;
>+ }
>+
>+ port = kzalloc(sizeof(struct team_port), GFP_KERNEL);
>+ if (!port)
>+ return -ENOMEM;
>+
>+ port->dev = port_dev;
>+ port->team = team;
>+
>+ port->orig.mtu = port_dev->mtu;
>+ err = dev_set_mtu(port_dev, dev->mtu);
>+ if (err) {
>+ netdev_dbg(dev, "Error %d calling dev_set_mtu\n", err);
>+ goto err_set_mtu;
>+ }
>+
>+ memcpy(port->orig.dev_addr, port_dev->dev_addr, ETH_ALEN);
>+ random_ether_addr(tmp_addr);
>+ err = __set_port_mac(port_dev, tmp_addr);
>+ if (err) {
>+ netdev_dbg(dev, "Device %s mac addr set failed\n",
>+ portname);
>+ goto err_set_mac_rand;
>+ }
>+
>+ err = dev_open(port_dev);
>+ if (err) {
>+ netdev_dbg(dev, "Device %s opening failed\n",
>+ portname);
>+ goto err_dev_open;
>+ }
>+
>+ err = team_port_set_orig_mac(port);
>+ if (err) {
>+ netdev_dbg(dev, "Device %s mac addr set failed - Device does not support addr change when it's opened\n",
>+ portname);
>+ goto err_set_mac_opened;
>+ }
>+
>+ err = team_port_enter(team, port);
>+ if (err) {
>+ netdev_err(dev, "Device %s failed to enter team mode\n",
>+ portname);
>+ goto err_port_enter;
>+ }
>+
>+ err = netdev_set_master(port_dev, dev);
>+ if (err) {
>+ netdev_err(dev, "Device %s failed to set master\n", portname);
>+ goto err_set_master;
>+ }
>+
>+ err = netdev_rx_handler_register(port_dev, team_handle_frame,
>+ port);
>+ if (err) {
>+ netdev_err(dev, "Device %s failed to register rx_handler\n",
>+ portname);
>+ goto err_handler_register;
>+ }
>+
>+ team_port_list_add_port(team, port);
>+ __team_compute_features(team);
>+ __team_port_change_check(port, !!netif_carrier_ok(port_dev));
>+
>+ netdev_info(dev, "Port device %s added\n", portname);
>+
>+ return 0;
>+
>+err_handler_register:
>+ netdev_set_master(port_dev, NULL);
>+
>+err_set_master:
>+ team_port_leave(team, port);
>+
>+err_port_enter:
>+err_set_mac_opened:
>+ dev_close(port_dev);
>+
>+err_dev_open:
>+ team_port_set_orig_mac(port);
>+
>+err_set_mac_rand:
>+ dev_set_mtu(port_dev, port->orig.mtu);
>+
>+err_set_mtu:
>+ kfree(port);
>+
>+ return err;
>+}
>+
>+static int team_port_del(struct team *team, struct net_device *port_dev)
>+{
>+ struct net_device *dev = team->dev;
>+ struct team_port *port;
>+ char *portname = port_dev->name;
>+
>+ port = team_port_get_rtnl(port_dev);
>+ if (!port || !team_port_find(team, port)) {
>+ netdev_err(dev, "Device %s does not act as a port of this team\n",
>+ portname);
>+ return -ENOENT;
>+ }
>+
>+ __team_port_change_check(port, false);
>+ team_port_list_del_port(team, port);
>+ netdev_rx_handler_unregister(port_dev);
>+ netdev_set_master(port_dev, NULL);
>+ team_port_leave(team, port);
>+ dev_close(port_dev);
>+ team_port_set_orig_mac(port);
>+ dev_set_mtu(port_dev, port->orig.mtu);
>+ synchronize_rcu();
>+ kfree(port);
>+ netdev_info(dev, "Port device %s removed\n", portname);
>+ __team_compute_features(team);
>+
>+ return 0;
>+}
>+
>+
>+/*****************
>+ * Net device ops
>+ *****************/
>+
>+static const char team_no_mode_kind[] = "*NOMODE*";
>+
>+static int team_mode_option_get(struct team *team, void *arg)
>+{
>+ const char **str = arg;
>+
>+ *str = team->mode_kind ? team->mode_kind : team_no_mode_kind;
>+ return 0;
>+}
>+
>+static int team_mode_option_set(struct team *team, void *arg)
>+{
>+ const char **str = arg;
>+
>+ return team_change_mode(team, *str);
>+}
>+
>+static struct team_option team_options[] = {
>+ {
>+ .name = "mode",
>+ .type = TEAM_OPTION_TYPE_STRING,
>+ .getter = team_mode_option_get,
>+ .setter = team_mode_option_set,
>+ },
>+};
>+
>+static int team_init(struct net_device *dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ int i;
>+
>+ team->dev = dev;
>+ spin_lock_init(&team->lock);
>+
>+ team->pcpu_stats = alloc_percpu(struct team_pcpu_stats);
>+ if (!team->pcpu_stats)
>+ return -ENOMEM;
>+
>+ for (i = 0; i < TEAM_PORT_HASHENTRIES; i++)
>+ INIT_HLIST_HEAD(&team->port_hlist[i]);
>+ INIT_LIST_HEAD(&team->port_list);
>+
>+ INIT_LIST_HEAD(&team->option_list);
>+ team_options_register(team, team_options, ARRAY_SIZE(team_options));
>+ netif_carrier_off(dev);
>+
>+ return 0;
>+}
>+
>+static void team_uninit(struct net_device *dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+ struct team_port *tmp;
>+
>+ spin_lock(&team->lock);
>+ list_for_each_entry_safe(port, tmp, &team->port_list, list)
>+ team_port_del(team, port->dev);
>+
>+ __team_change_mode(team, NULL); /* cleanup */
>+ __team_options_unregister(team, team_options, ARRAY_SIZE(team_options));
>+ spin_unlock(&team->lock);
>+}
>+
>+static void team_destructor(struct net_device *dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+
>+ free_percpu(team->pcpu_stats);
>+ free_netdev(dev);
>+}
>+
>+static int team_open(struct net_device *dev)
>+{
>+ netif_carrier_on(dev);
>+ return 0;
>+}
>+
>+static int team_close(struct net_device *dev)
>+{
>+ netif_carrier_off(dev);
>+ return 0;
>+}
>+
>+/*
>+ * note: already called with rcu_read_lock
>+ */
>+static netdev_tx_t team_xmit(struct sk_buff *skb, struct net_device *dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ bool tx_success = false;
>+ unsigned int len = skb->len;
>+
>+ /*
>+ * Ensure transmit function is called only in case there is at least
>+ * one port present.
>+ */
>+ if (likely(!list_empty(&team->port_list) && team->mode_ops.transmit))
>+ tx_success = team->mode_ops.transmit(team, skb);
>+ if (tx_success) {
>+ struct team_pcpu_stats *pcpu_stats;
>+
>+ pcpu_stats = this_cpu_ptr(team->pcpu_stats);
>+ u64_stats_update_begin(&pcpu_stats->syncp);
>+ pcpu_stats->tx_packets++;
>+ pcpu_stats->tx_bytes += len;
>+ u64_stats_update_end(&pcpu_stats->syncp);
>+ } else {
>+ this_cpu_inc(team->pcpu_stats->tx_dropped);
>+ }
>+
>+ return NETDEV_TX_OK;
>+}
>+
>+static void team_change_rx_flags(struct net_device *dev, int change)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+ int inc;
>+
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ if (change & IFF_PROMISC) {
>+ inc = dev->flags & IFF_PROMISC ? 1 : -1;
>+ dev_set_promiscuity(port->dev, inc);
>+ }
>+ if (change & IFF_ALLMULTI) {
>+ inc = dev->flags & IFF_ALLMULTI ? 1 : -1;
>+ dev_set_allmulti(port->dev, inc);
>+ }
>+ }
>+ rcu_read_unlock();
>+}
>+
>+static void team_set_rx_mode(struct net_device *dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ dev_uc_sync(port->dev, dev);
>+ dev_mc_sync(port->dev, dev);
>+ }
>+ rcu_read_unlock();
>+}
>+
>+static int team_set_mac_address(struct net_device *dev, void *p)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+ struct sockaddr *addr = p;
>+
>+ memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list)
>+ if (team->mode_ops.port_change_mac)
>+ team->mode_ops.port_change_mac(team, port);
>+ rcu_read_unlock();
>+ return 0;
>+}
>+
>+static int team_change_mtu(struct net_device *dev, int new_mtu)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+ int err;
>+
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ err = dev_set_mtu(port->dev, new_mtu);
>+ if (err) {
>+ netdev_err(dev, "Device %s failed to change mtu",
>+ port->dev->name);
>+ goto unwind;
>+ }
>+ }
>+ rcu_read_unlock();
>+
>+ dev->mtu = new_mtu;
>+
>+ return 0;
>+
>+unwind:
>+ list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
>+ dev_set_mtu(port->dev, dev->mtu);
>+
>+ rcu_read_unlock();
>+ return err;
>+}
>+
>+static struct rtnl_link_stats64 *
>+team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_pcpu_stats *p;
>+ u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
>+ u32 rx_dropped = 0, tx_dropped = 0;
>+ unsigned int start;
>+ int i;
>+
>+ for_each_possible_cpu(i) {
>+ p = per_cpu_ptr(team->pcpu_stats, i);
>+ do {
>+ start = u64_stats_fetch_begin_bh(&p->syncp);
>+ rx_packets = p->rx_packets;
>+ rx_bytes = p->rx_bytes;
>+ rx_multicast = p->rx_multicast;
>+ tx_packets = p->tx_packets;
>+ tx_bytes = p->tx_bytes;
>+ } while (u64_stats_fetch_retry_bh(&p->syncp, start));
>+
>+ stats->rx_packets += rx_packets;
>+ stats->rx_bytes += rx_bytes;
>+ stats->multicast += rx_multicast;
>+ stats->tx_packets += tx_packets;
>+ stats->tx_bytes += tx_bytes;
>+ /*
>+ * rx_dropped & tx_dropped are u32, updated
>+ * without syncp protection.
>+ */
>+ rx_dropped += p->rx_dropped;
>+ tx_dropped += p->tx_dropped;
>+ }
>+ stats->rx_dropped = rx_dropped;
>+ stats->tx_dropped = tx_dropped;
>+ return stats;
>+}
>+
>+static void team_vlan_rx_add_vid(struct net_device *dev, uint16_t vid)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ const struct net_device_ops *ops = port->dev->netdev_ops;
>+
>+ ops->ndo_vlan_rx_add_vid(port->dev, vid);
>+ }
>+ rcu_read_unlock();
>+}
>+
>+static void team_vlan_rx_kill_vid(struct net_device *dev, uint16_t vid)
>+{
>+ struct team *team = netdev_priv(dev);
>+ struct team_port *port;
>+
>+ rcu_read_lock();
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ const struct net_device_ops *ops = port->dev->netdev_ops;
>+
>+ ops->ndo_vlan_rx_kill_vid(port->dev, vid);
>+ }
>+ rcu_read_unlock();
>+}
>+
>+static int team_add_slave(struct net_device *dev, struct net_device *port_dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ int err;
>+
>+ spin_lock(&team->lock);
>+ err = team_port_add(team, port_dev);
>+ spin_unlock(&team->lock);
>+ return err;
>+}
>+
>+static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
>+{
>+ struct team *team = netdev_priv(dev);
>+ int err;
>+
>+ spin_lock(&team->lock);
>+ err = team_port_del(team, port_dev);
>+ spin_unlock(&team->lock);
>+ return err;
>+}
>+
>+static const struct net_device_ops team_netdev_ops = {
>+ .ndo_init = team_init,
>+ .ndo_uninit = team_uninit,
>+ .ndo_open = team_open,
>+ .ndo_stop = team_close,
>+ .ndo_start_xmit = team_xmit,
>+ .ndo_change_rx_flags = team_change_rx_flags,
>+ .ndo_set_rx_mode = team_set_rx_mode,
>+ .ndo_set_mac_address = team_set_mac_address,
>+ .ndo_change_mtu = team_change_mtu,
>+ .ndo_get_stats64 = team_get_stats64,
>+ .ndo_vlan_rx_add_vid = team_vlan_rx_add_vid,
>+ .ndo_vlan_rx_kill_vid = team_vlan_rx_kill_vid,
>+ .ndo_add_slave = team_add_slave,
>+ .ndo_del_slave = team_del_slave,
>+};
>+
>+
>+/***********************
>+ * rt netlink interface
>+ ***********************/
>+
>+static void team_setup(struct net_device *dev)
>+{
>+ ether_setup(dev);
>+
>+ dev->netdev_ops = &team_netdev_ops;
>+ dev->destructor = team_destructor;
>+ dev->tx_queue_len = 0;
>+ dev->flags |= IFF_MULTICAST;
>+ dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
>+
>+ /*
>+ * Indicate we support unicast address filtering. That way the core won't
>+ * put us into promisc mode when a unicast addr is added.
>+ * Leave this up to the underlying drivers.
>+ */
>+ dev->priv_flags |= IFF_UNICAST_FLT;
>+
>+ dev->features |= NETIF_F_LLTX;
>+ dev->features |= NETIF_F_GRO;
>+ dev->hw_features = NETIF_F_HW_VLAN_TX |
>+ NETIF_F_HW_VLAN_RX |
>+ NETIF_F_HW_VLAN_FILTER;
>+
>+ dev->features |= dev->hw_features;
>+}
>+
>+static int team_newlink(struct net *src_net, struct net_device *dev,
>+ struct nlattr *tb[], struct nlattr *data[])
>+{
>+ int err;
>+
>+ if (tb[IFLA_ADDRESS] == NULL)
>+ random_ether_addr(dev->dev_addr);
>+
>+ err = register_netdevice(dev);
>+ if (err)
>+ return err;
>+
>+ return 0;
>+}
>+
>+static int team_validate(struct nlattr *tb[], struct nlattr *data[])
>+{
>+ if (tb[IFLA_ADDRESS]) {
>+ if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN)
>+ return -EINVAL;
>+ if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
>+ return -EADDRNOTAVAIL;
>+ }
>+ return 0;
>+}
>+
>+static struct rtnl_link_ops team_link_ops __read_mostly = {
>+ .kind = DRV_NAME,
>+ .priv_size = sizeof(struct team),
>+ .setup = team_setup,
>+ .newlink = team_newlink,
>+ .validate = team_validate,
>+};
>+
>+
>+/***********************************
>+ * Generic netlink custom interface
>+ ***********************************/
>+
>+static struct genl_family team_nl_family = {
>+ .id = GENL_ID_GENERATE,
>+ .name = TEAM_GENL_NAME,
>+ .version = TEAM_GENL_VERSION,
>+ .maxattr = TEAM_ATTR_MAX,
>+ .netnsok = true,
>+};
>+
>+static const struct nla_policy team_nl_policy[TEAM_ATTR_MAX + 1] = {
>+ [TEAM_ATTR_UNSPEC] = { .type = NLA_UNSPEC, },
>+ [TEAM_ATTR_TEAM_IFINDEX] = { .type = NLA_U32 },
>+ [TEAM_ATTR_LIST_OPTION] = { .type = NLA_NESTED },
>+ [TEAM_ATTR_LIST_PORT] = { .type = NLA_NESTED },
>+};
>+
>+static const struct nla_policy
>+team_nl_option_policy[TEAM_ATTR_OPTION_MAX + 1] = {
>+ [TEAM_ATTR_OPTION_UNSPEC] = { .type = NLA_UNSPEC, },
>+ [TEAM_ATTR_OPTION_NAME] = {
>+ .type = NLA_STRING,
>+ .len = TEAM_STRING_MAX_LEN,
>+ },
>+ [TEAM_ATTR_OPTION_CHANGED] = { .type = NLA_FLAG },
>+ [TEAM_ATTR_OPTION_TYPE] = { .type = NLA_U8 },
>+ [TEAM_ATTR_OPTION_DATA] = {
>+ .type = NLA_BINARY,
>+ .len = TEAM_STRING_MAX_LEN,
>+ },
>+};
>+
>+static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
>+{
>+ struct sk_buff *msg;
>+ void *hdr;
>+ int err;
>+
>+ msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>+ if (!msg)
>+ return -ENOMEM;
>+
>+ hdr = genlmsg_put(msg, info->snd_pid, info->snd_seq,
>+ &team_nl_family, 0, TEAM_CMD_NOOP);
>+ if (IS_ERR(hdr)) {
>+ err = PTR_ERR(hdr);
>+ goto err_msg_put;
>+ }
>+
>+ genlmsg_end(msg, hdr);
>+
>+ return genlmsg_unicast(genl_info_net(info), msg, info->snd_pid);
>+
>+err_msg_put:
>+ nlmsg_free(msg);
>+
>+ return err;
>+}
>+
>+/*
>+ * Netlink cmd functions should be locked by the following two functions.
>+ * To ensure team_uninit is not called in between, hold rcu_read_lock
>+ * the whole time.
>+ */
>+static struct team *team_nl_team_get(struct genl_info *info)
>+{
>+ struct net *net = genl_info_net(info);
>+ int ifindex;
>+ struct net_device *dev;
>+ struct team *team;
>+
>+ if (!info->attrs[TEAM_ATTR_TEAM_IFINDEX])
>+ return NULL;
>+
>+ ifindex = nla_get_u32(info->attrs[TEAM_ATTR_TEAM_IFINDEX]);
>+ rcu_read_lock();
>+ dev = dev_get_by_index_rcu(net, ifindex);
>+ if (!dev || dev->netdev_ops != &team_netdev_ops) {
>+ rcu_read_unlock();
>+ return NULL;
>+ }
>+
>+ team = netdev_priv(dev);
>+ spin_lock(&team->lock);
>+ return team;
>+}
>+
>+static void team_nl_team_put(struct team *team)
>+{
>+ spin_unlock(&team->lock);
>+ rcu_read_unlock();
>+}
>+
>+static int team_nl_send_generic(struct genl_info *info, struct team *team,
>+ int (*fill_func)(struct sk_buff *skb,
>+ struct genl_info *info,
>+ int flags, struct team *team))
>+{
>+ struct sk_buff *skb;
>+ int err;
>+
>+ skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>+ if (!skb)
>+ return -ENOMEM;
>+
>+ err = fill_func(skb, info, NLM_F_ACK, team);
>+ if (err < 0)
>+ goto err_fill;
>+
>+ err = genlmsg_unicast(genl_info_net(info), skb, info->snd_pid);
>+ return err;
>+
>+err_fill:
>+ nlmsg_free(skb);
>+ return err;
>+}
>+
>+static int team_nl_fill_options_get_changed(struct sk_buff *skb,
>+ u32 pid, u32 seq, int flags,
>+ struct team *team,
>+ struct team_option *changed_option)
>+{
>+ struct nlattr *option_list;
>+ void *hdr;
>+ struct team_option *option;
>+
>+ hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
>+ TEAM_CMD_OPTIONS_GET);
>+ if (IS_ERR(hdr))
>+ return PTR_ERR(hdr);
>+
>+ NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
>+ option_list = nla_nest_start(skb, TEAM_ATTR_LIST_OPTION);
>+ if (!option_list)
>+ return -EMSGSIZE;
>+
>+ list_for_each_entry(option, &team->option_list, list) {
>+ struct nlattr *option_item;
>+ long arg;
>+
>+ option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
>+ if (!option_item)
>+ goto nla_put_failure;
>+ NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_NAME, option->name);
>+ if (option == changed_option)
>+ NLA_PUT_FLAG(skb, TEAM_ATTR_OPTION_CHANGED);
>+ switch (option->type) {
>+ case TEAM_OPTION_TYPE_U32:
>+ NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32);
>+ team_option_get(team, option, &arg);
>+ NLA_PUT_U32(skb, TEAM_ATTR_OPTION_DATA, arg);
>+ break;
>+ case TEAM_OPTION_TYPE_STRING:
>+ NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING);
>+ team_option_get(team, option, &arg);
>+ NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_DATA,
>+ (char *) arg);
>+ break;
>+ default:
>+ BUG();
>+ }
>+ nla_nest_end(skb, option_item);
>+ }
>+
>+ nla_nest_end(skb, option_list);
>+ return genlmsg_end(skb, hdr);
>+
>+nla_put_failure:
>+ genlmsg_cancel(skb, hdr);
>+ return -EMSGSIZE;
>+}
>+
>+static int team_nl_fill_options_get(struct sk_buff *skb,
>+ struct genl_info *info, int flags,
>+ struct team *team)
>+{
>+ return team_nl_fill_options_get_changed(skb, info->snd_pid,
>+ info->snd_seq, NLM_F_ACK,
>+ team, NULL);
>+}
>+
>+static int team_nl_cmd_options_get(struct sk_buff *skb, struct genl_info *info)
>+{
>+ struct team *team;
>+ int err;
>+
>+ team = team_nl_team_get(info);
>+ if (!team)
>+ return -EINVAL;
>+
>+ err = team_nl_send_generic(info, team, team_nl_fill_options_get);
>+
>+ team_nl_team_put(team);
>+
>+ return err;
>+}
>+
>+static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
>+{
>+ struct team *team;
>+ int err = 0;
>+ int i;
>+ struct nlattr *nl_option;
>+
>+ team = team_nl_team_get(info);
>+ if (!team)
>+ return -EINVAL;
>+
>+ err = -EINVAL;
>+ if (!info->attrs[TEAM_ATTR_LIST_OPTION]) {
>+ err = -EINVAL;
>+ goto team_put;
>+ }
>+
>+ nla_for_each_nested(nl_option, info->attrs[TEAM_ATTR_LIST_OPTION], i) {
>+ struct nlattr *mode_attrs[TEAM_ATTR_OPTION_MAX + 1];
>+ enum team_option_type opt_type;
>+ struct team_option *option;
>+ char *opt_name;
>+ bool opt_found = false;
>+
>+ if (nla_type(nl_option) != TEAM_ATTR_ITEM_OPTION) {
>+ err = -EINVAL;
>+ goto team_put;
>+ }
>+ err = nla_parse_nested(mode_attrs, TEAM_ATTR_OPTION_MAX,
>+ nl_option, team_nl_option_policy);
>+ if (err)
>+ goto team_put;
>+ if (!mode_attrs[TEAM_ATTR_OPTION_NAME] ||
>+ !mode_attrs[TEAM_ATTR_OPTION_TYPE] ||
>+ !mode_attrs[TEAM_ATTR_OPTION_DATA]) {
>+ err = -EINVAL;
>+ goto team_put;
>+ }
>+ switch (nla_get_u8(mode_attrs[TEAM_ATTR_OPTION_TYPE])) {
>+ case NLA_U32:
>+ opt_type = TEAM_OPTION_TYPE_U32;
>+ break;
>+ case NLA_STRING:
>+ opt_type = TEAM_OPTION_TYPE_STRING;
>+ break;
>+ default:
>+ err = -EINVAL;
>+ goto team_put;
>+ }
>+
>+ opt_name = nla_data(mode_attrs[TEAM_ATTR_OPTION_NAME]);
>+ list_for_each_entry(option, &team->option_list, list) {
>+ long arg;
>+ struct nlattr *opt_data_attr;
>+
>+ if (option->type != opt_type ||
>+ strcmp(option->name, opt_name))
>+ continue;
>+ opt_found = true;
>+ opt_data_attr = mode_attrs[TEAM_ATTR_OPTION_DATA];
>+ switch (opt_type) {
>+ case TEAM_OPTION_TYPE_U32:
>+ arg = nla_get_u32(opt_data_attr);
>+ break;
>+ case TEAM_OPTION_TYPE_STRING:
>+ arg = (long) nla_data(opt_data_attr);
>+ break;
>+ default:
>+ BUG();
>+ }
>+ err = team_option_set(team, option, &arg);
>+ if (err)
>+ goto team_put;
>+ }
>+ if (!opt_found) {
>+ err = -ENOENT;
>+ goto team_put;
>+ }
>+ }
>+
>+team_put:
>+ team_nl_team_put(team);
>+
>+ return err;
>+}
>+
>+static int team_nl_fill_port_list_get_changed(struct sk_buff *skb,
>+ u32 pid, u32 seq, int flags,
>+ struct team *team,
>+ struct team_port *changed_port)
>+{
>+ struct nlattr *port_list;
>+ void *hdr;
>+ struct team_port *port;
>+
>+ hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
>+ TEAM_CMD_PORT_LIST_GET);
>+ if (IS_ERR(hdr))
>+ return PTR_ERR(hdr);
>+
>+ NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
>+ port_list = nla_nest_start(skb, TEAM_ATTR_LIST_PORT);
>+ if (!port_list)
>+ return -EMSGSIZE;
>+
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ struct nlattr *port_item;
>+
>+ port_item = nla_nest_start(skb, TEAM_ATTR_ITEM_PORT);
>+ if (!port_item)
>+ goto nla_put_failure;
>+ NLA_PUT_U32(skb, TEAM_ATTR_PORT_IFINDEX, port->dev->ifindex);
>+ if (port == changed_port)
>+ NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_CHANGED);
>+ if (port->linkup)
>+ NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_LINKUP);
>+ NLA_PUT_U32(skb, TEAM_ATTR_PORT_SPEED, port->speed);
>+ NLA_PUT_U8(skb, TEAM_ATTR_PORT_DUPLEX, port->duplex);
>+ nla_nest_end(skb, port_item);
>+ }
>+
>+ nla_nest_end(skb, port_list);
>+ return genlmsg_end(skb, hdr);
>+
>+nla_put_failure:
>+ genlmsg_cancel(skb, hdr);
>+ return -EMSGSIZE;
>+}
>+
>+static int team_nl_fill_port_list_get(struct sk_buff *skb,
>+ struct genl_info *info, int flags,
>+ struct team *team)
>+{
>+ return team_nl_fill_port_list_get_changed(skb, info->snd_pid,
>+ info->snd_seq, NLM_F_ACK,
>+ team, NULL);
>+}
>+
>+static int team_nl_cmd_port_list_get(struct sk_buff *skb,
>+ struct genl_info *info)
>+{
>+ struct team *team;
>+ int err;
>+
>+ team = team_nl_team_get(info);
>+ if (!team)
>+ return -EINVAL;
>+
>+ err = team_nl_send_generic(info, team, team_nl_fill_port_list_get);
>+
>+ team_nl_team_put(team);
>+
>+ return err;
>+}
>+
>+static struct genl_ops team_nl_ops[] = {
>+ {
>+ .cmd = TEAM_CMD_NOOP,
>+ .doit = team_nl_cmd_noop,
>+ .policy = team_nl_policy,
>+ },
>+ {
>+ .cmd = TEAM_CMD_OPTIONS_SET,
>+ .doit = team_nl_cmd_options_set,
>+ .policy = team_nl_policy,
>+ .flags = GENL_ADMIN_PERM,
>+ },
>+ {
>+ .cmd = TEAM_CMD_OPTIONS_GET,
>+ .doit = team_nl_cmd_options_get,
>+ .policy = team_nl_policy,
>+ .flags = GENL_ADMIN_PERM,
>+ },
>+ {
>+ .cmd = TEAM_CMD_PORT_LIST_GET,
>+ .doit = team_nl_cmd_port_list_get,
>+ .policy = team_nl_policy,
>+ .flags = GENL_ADMIN_PERM,
>+ },
>+};
>+
>+static struct genl_multicast_group team_change_event_mcgrp = {
>+ .name = TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME,
>+};
>+
>+static int team_nl_send_event_options_get(struct team *team,
>+ struct team_option *changed_option)
>+{
>+ struct sk_buff *skb;
>+ int err;
>+ struct net *net = dev_net(team->dev);
>+
>+ skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>+ if (!skb)
>+ return -ENOMEM;
>+
>+ err = team_nl_fill_options_get_changed(skb, 0, 0, 0, team,
>+ changed_option);
>+ if (err < 0)
>+ goto err_fill;
>+
>+ err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
>+ GFP_KERNEL);
>+ return err;
>+
>+err_fill:
>+ nlmsg_free(skb);
>+ return err;
>+}
>+
>+static int team_nl_send_event_port_list_get(struct team_port *port)
>+{
>+ struct sk_buff *skb;
>+ int err;
>+ struct net *net = dev_net(port->team->dev);
>+
>+ skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>+ if (!skb)
>+ return -ENOMEM;
>+
>+ err = team_nl_fill_port_list_get_changed(skb, 0, 0, 0,
>+ port->team, port);
>+ if (err < 0)
>+ goto err_fill;
>+
>+ err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
>+ GFP_KERNEL);
>+ return err;
>+
>+err_fill:
>+ nlmsg_free(skb);
>+ return err;
>+}
>+
>+static int team_nl_init(void)
>+{
>+ int err;
>+
>+ err = genl_register_family_with_ops(&team_nl_family, team_nl_ops,
>+ ARRAY_SIZE(team_nl_ops));
>+ if (err)
>+ return err;
>+
>+ err = genl_register_mc_group(&team_nl_family, &team_change_event_mcgrp);
>+ if (err)
>+ goto err_change_event_grp_reg;
>+
>+ return 0;
>+
>+err_change_event_grp_reg:
>+ genl_unregister_family(&team_nl_family);
>+
>+ return err;
>+}
>+
>+static void team_nl_fini(void)
>+{
>+ genl_unregister_family(&team_nl_family);
>+}
>+
>+
>+/******************
>+ * Change checkers
>+ ******************/
>+
>+static void __team_options_change_check(struct team *team,
>+ struct team_option *changed_option)
>+{
>+ int err;
>+
>+ err = team_nl_send_event_options_get(team, changed_option);
>+ if (err)
>+ netdev_warn(team->dev, "Failed to send options change via netlink\n");
>+}
>+
>+/* rtnl lock is held */
>+static void __team_port_change_check(struct team_port *port, bool linkup)
>+{
>+ int err;
>+
>+ if (port->linkup == linkup)
>+ return;
>+
>+ port->linkup = linkup;
>+ if (linkup) {
>+ struct ethtool_cmd ecmd;
>+
>+ err = __ethtool_get_settings(port->dev, &ecmd);
>+ if (!err) {
>+ port->speed = ethtool_cmd_speed(&ecmd);
>+ port->duplex = ecmd.duplex;
>+ goto send_event;
>+ }
>+ }
>+ port->speed = 0;
>+ port->duplex = 0;
>+
>+send_event:
>+ err = team_nl_send_event_port_list_get(port);
>+ if (err)
>+ netdev_warn(port->team->dev, "Failed to send port change of device %s via netlink\n",
>+ port->dev->name);
>+
>+}
>+
>+static void team_port_change_check(struct team_port *port, bool linkup)
>+{
>+ struct team *team = port->team;
>+
>+ spin_lock(&team->lock);
>+ __team_port_change_check(port, linkup);
>+ spin_unlock(&team->lock);
>+}
>+
>+/************************************
>+ * Net device notifier event handler
>+ ************************************/
>+
>+static int team_device_event(struct notifier_block *unused,
>+ unsigned long event, void *ptr)
>+{
>+ struct net_device *dev = (struct net_device *) ptr;
>+ struct team_port *port;
>+
>+ port = team_port_get_rtnl(dev);
>+ if (!port)
>+ return NOTIFY_DONE;
>+
>+ switch (event) {
>+ case NETDEV_UP:
>+ if (netif_carrier_ok(dev))
>+ team_port_change_check(port, true);
>+ break;
>+ case NETDEV_DOWN:
>+ team_port_change_check(port, false);
>+ break;
>+ case NETDEV_CHANGE:
>+ if (netif_running(port->dev))
>+ team_port_change_check(port,
>+ !!netif_carrier_ok(port->dev));
>+ break;
>+ case NETDEV_UNREGISTER:
>+ team_del_slave(port->team->dev, dev);
>+ break;
>+ case NETDEV_FEAT_CHANGE:
>+ team_compute_features(port->team);
>+ break;
>+ case NETDEV_CHANGEMTU:
>+ /* Forbid changing the MTU of the underlying device */
>+ return NOTIFY_BAD;
>+ case NETDEV_CHANGEADDR:
>+ /* Forbid changing the address of the underlying device */
>+ return NOTIFY_BAD;
>+ case NETDEV_PRE_TYPE_CHANGE:
>+ /* Forbid changing the type of the underlying device */
>+ return NOTIFY_BAD;
>+ }
>+ return NOTIFY_DONE;
>+}
>+
>+static struct notifier_block team_notifier_block __read_mostly = {
>+ .notifier_call = team_device_event,
>+};
>+
>+
>+/***********************
>+ * Module init and exit
>+ ***********************/
>+
>+static int __init team_module_init(void)
>+{
>+ int err;
>+
>+ register_netdevice_notifier(&team_notifier_block);
>+
>+ err = rtnl_link_register(&team_link_ops);
>+ if (err)
>+ goto err_rtnl_reg;
>+
>+ err = team_nl_init();
>+ if (err)
>+ goto err_nl_init;
>+
>+ return 0;
>+
>+err_nl_init:
>+ rtnl_link_unregister(&team_link_ops);
>+
>+err_rtnl_reg:
>+ unregister_netdevice_notifier(&team_notifier_block);
>+
>+ return err;
>+}
>+
>+static void __exit team_module_exit(void)
>+{
>+ team_nl_fini();
>+ rtnl_link_unregister(&team_link_ops);
>+ unregister_netdevice_notifier(&team_notifier_block);
>+}
>+
>+module_init(team_module_init);
>+module_exit(team_module_exit);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
>+MODULE_DESCRIPTION("Ethernet team device driver");
>+MODULE_ALIAS_RTNL_LINK(DRV_NAME);
>diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
>new file mode 100644
>index 0000000..1aa2bfb
>--- /dev/null
>+++ b/drivers/net/team/team_mode_activebackup.c
>@@ -0,0 +1,152 @@
>+/*
>+ * drivers/net/team/team_mode_activebackup.c - Active-backup mode for team
>+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ */
>+
>+#include <linux/kernel.h>
>+#include <linux/types.h>
>+#include <linux/module.h>
>+#include <linux/init.h>
>+#include <linux/errno.h>
>+#include <linux/netdevice.h>
>+#include <net/rtnetlink.h>
>+#include <linux/if_team.h>
>+
>+struct ab_priv {
>+ struct team_port __rcu *active_port;
>+};
>+
>+static struct ab_priv *ab_priv(struct team *team)
>+{
>+ return (struct ab_priv *) &team->mode_priv;
>+}
>+
>+static rx_handler_result_t ab_receive(struct team *team, struct team_port *port,
>+ struct sk_buff *skb) {
>+ struct team_port *active_port;
>+
>+ active_port = rcu_dereference(ab_priv(team)->active_port);
>+ if (active_port != port)
>+ return RX_HANDLER_EXACT;
>+ return RX_HANDLER_ANOTHER;
>+}
>+
>+static bool ab_transmit(struct team *team, struct sk_buff *skb)
>+{
>+ struct team_port *active_port;
>+
>+ active_port = rcu_dereference(ab_priv(team)->active_port);
>+ if (unlikely(!active_port))
>+ goto drop;
>+ skb->dev = active_port->dev;
>+ if (dev_queue_xmit(skb))
>+ return false;
>+ return true;
>+
>+drop:
>+ dev_kfree_skb(skb);
>+ return false;
>+}
>+
>+static void ab_port_leave(struct team *team, struct team_port *port)
>+{
>+ if (ab_priv(team)->active_port == port)
>+ rcu_assign_pointer(ab_priv(team)->active_port, NULL);
>+}
>+
>+static void ab_port_change_mac(struct team *team, struct team_port *port)
>+{
>+ if (ab_priv(team)->active_port == port)
>+ team_port_set_team_mac(port);
>+}
>+
>+static int ab_active_port_get(struct team *team, void *arg)
>+{
>+ u32 *ifindex = arg;
>+
>+ *ifindex = 0;
>+ if (ab_priv(team)->active_port)
>+ *ifindex = ab_priv(team)->active_port->dev->ifindex;
>+ return 0;
>+}
>+
>+static int ab_active_port_set(struct team *team, void *arg)
>+{
>+ u32 *ifindex = arg;
>+ struct team_port *port;
>+
>+ list_for_each_entry_rcu(port, &team->port_list, list) {
>+ if (port->dev->ifindex == *ifindex) {
>+ struct team_port *ac_port = ab_priv(team)->active_port;
>+
>+ /* rtnl_lock needs to be held when setting macs */
>+ rtnl_lock();
>+ if (ac_port)
>+ team_port_set_orig_mac(ac_port);
>+ rcu_assign_pointer(ab_priv(team)->active_port, port);
>+ team_port_set_team_mac(port);
>+ rtnl_unlock();
>+ return 0;
>+ }
>+ }
>+ return -ENOENT;
>+}
>+
>+static struct team_option ab_options[] = {
>+ {
>+ .name = "activeport",
>+ .type = TEAM_OPTION_TYPE_U32,
>+ .getter = ab_active_port_get,
>+ .setter = ab_active_port_set,
>+ },
>+};
>+
>+int ab_init(struct team *team)
>+{
>+ team_options_register(team, ab_options, ARRAY_SIZE(ab_options));
>+ return 0;
>+}
>+
>+void ab_exit(struct team *team)
>+{
>+ team_options_unregister(team, ab_options, ARRAY_SIZE(ab_options));
>+}
>+
>+static const struct team_mode_ops ab_mode_ops = {
>+ .init = ab_init,
>+ .exit = ab_exit,
>+ .receive = ab_receive,
>+ .transmit = ab_transmit,
>+ .port_leave = ab_port_leave,
>+ .port_change_mac = ab_port_change_mac,
>+};
>+
>+static struct team_mode ab_mode = {
>+ .kind = "activebackup",
>+ .owner = THIS_MODULE,
>+ .priv_size = sizeof(struct ab_priv),
>+ .ops = &ab_mode_ops,
>+};
>+
>+static int __init ab_init_module(void)
>+{
>+ return team_mode_register(&ab_mode);
>+}
>+
>+static void __exit ab_cleanup_module(void)
>+{
>+ team_mode_unregister(&ab_mode);
>+}
>+
>+module_init(ab_init_module);
>+module_exit(ab_cleanup_module);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
>+MODULE_DESCRIPTION("Active-backup mode for team");
>+MODULE_ALIAS("team-mode-activebackup");
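As I read the active-backup mode above, ab_transmit() sends only on the active port (or drops the skb if none is set). ab_receive() returns RX_HANDLER_ANOTHER only for frames arriving on the active port. Frames arriving on any other port get RX_HANDLER_EXACT, so they are delivered only to handlers bound exactly to that port device. The "activeport" option looks up the port by ifindex and fails with -ENOENT if it is not found. The userspace model below captures just that selection logic. All sketch_* names are hypothetical, -1 stands in for -ENOENT, and the MAC swapping and rtnl locking done by ab_active_port_set() are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two rx_handler_result_t values used. */
enum sketch_rx_result { SKETCH_RX_EXACT, SKETCH_RX_ANOTHER };

struct sketch_port { int ifindex; };

struct sketch_ab {
	struct sketch_port *ports;
	size_t port_count;
	struct sketch_port *active;	/* NULL until "activeport" is set */
};

/* Like ab_receive(): only the active port's frames go further up. */
static enum sketch_rx_result sketch_ab_receive(const struct sketch_ab *ab,
					       const struct sketch_port *port)
{
	return ab->active == port ? SKETCH_RX_ANOTHER : SKETCH_RX_EXACT;
}

/* Like ab_transmit(): the active port, or NULL meaning "drop". */
static struct sketch_port *sketch_ab_transmit(const struct sketch_ab *ab)
{
	return ab->active;
}

/* Like ab_active_port_set(): find the port by ifindex; -1 mirrors -ENOENT. */
static int sketch_ab_set_active(struct sketch_ab *ab, int ifindex)
{
	size_t i;

	assert(ab != NULL);
	for (i = 0; i < ab->port_count; i++) {
		if (ab->ports[i].ifindex == ifindex) {
			ab->active = &ab->ports[i];
			return 0;
		}
	}
	return -1;
}
```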
>diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
>new file mode 100644
>index 0000000..0374052
>--- /dev/null
>+++ b/drivers/net/team/team_mode_roundrobin.c
>@@ -0,0 +1,107 @@
>+/*
>+ * drivers/net/team/team_mode_roundrobin.c - Round-robin mode for team
>+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ */
>+
>+#include <linux/kernel.h>
>+#include <linux/types.h>
>+#include <linux/module.h>
>+#include <linux/init.h>
>+#include <linux/errno.h>
>+#include <linux/netdevice.h>
>+#include <linux/if_team.h>
>+
>+struct rr_priv {
>+ unsigned int sent_packets;
>+};
>+
>+static struct rr_priv *rr_priv(struct team *team)
>+{
>+ return (struct rr_priv *) &team->mode_priv;
>+}
>+
>+static struct team_port *__get_first_port_up(struct team *team,
>+ struct team_port *port)
>+{
>+ struct team_port *cur;
>+
>+ if (port->linkup)
>+ return port;
>+ cur = port;
>+ list_for_each_entry_continue_rcu(cur, &team->port_list, list)
>+ if (cur->linkup)
>+ return cur;
>+ list_for_each_entry_rcu(cur, &team->port_list, list) {
>+ if (cur == port)
>+ break;
>+ if (cur->linkup)
>+ return cur;
>+ }
>+ return NULL;
>+}
>+
>+static bool rr_transmit(struct team *team, struct sk_buff *skb)
>+{
>+ struct team_port *port;
>+ int port_index;
>+
>+ port_index = rr_priv(team)->sent_packets++ % team->port_count;
>+ port = team_get_port_by_index_rcu(team, port_index);
>+ port = __get_first_port_up(team, port);
>+ if (unlikely(!port))
>+ goto drop;
>+ skb->dev = port->dev;
>+ if (dev_queue_xmit(skb))
>+ return false;
>+ return true;
>+
>+drop:
>+ dev_kfree_skb(skb);
>+ return false;
>+}
>+
>+static int rr_port_enter(struct team *team, struct team_port *port)
>+{
>+ return team_port_set_team_mac(port);
>+}
>+
>+static void rr_port_change_mac(struct team *team, struct team_port *port)
>+{
>+ team_port_set_team_mac(port);
>+}
>+
>+static const struct team_mode_ops rr_mode_ops = {
>+ .transmit = rr_transmit,
>+ .port_enter = rr_port_enter,
>+ .port_change_mac = rr_port_change_mac,
>+};
>+
>+static struct team_mode rr_mode = {
>+ .kind = "roundrobin",
>+ .owner = THIS_MODULE,
>+ .priv_size = sizeof(struct rr_priv),
>+ .ops = &rr_mode_ops,
>+};
>+
>+static int __init rr_init_module(void)
>+{
>+ return team_mode_register(&rr_mode);
>+}
>+
>+static void __exit rr_cleanup_module(void)
>+{
>+ team_mode_unregister(&rr_mode);
>+}
>+
>+module_init(rr_init_module);
>+module_exit(rr_cleanup_module);
>+
>+MODULE_LICENSE("GPL v2");
>+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
>+MODULE_DESCRIPTION("Round-robin mode for team");
>+MODULE_ALIAS("team-mode-roundrobin");
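rr_transmit() picks port index sent_packets++ % port_count, and __get_first_port_up() then walks forward from that port, wrapping around once, to the first port with linkup set. Below is a userspace sketch of the same selection over an array instead of the RCU port list. The sketch_* names are hypothetical. Unlike the quoted code, the sketch asserts count > 0, because a modulo by a zero port count would be undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_rr_port { bool linkup; };

/* Mirrors __get_first_port_up(): start at @start, wrap around once,
 * return the index of the first port with link up, or -1 if all are down. */
static int sketch_first_port_up(const struct sketch_rr_port *ports,
				unsigned int count, unsigned int start)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		unsigned int idx = (start + i) % count;

		if (ports[idx].linkup)
			return (int)idx;
	}
	return -1;
}

/* Mirrors rr_transmit(): bump the counter for every packet and start
 * the search at (counter % count), skipping ports whose link is down. */
static int sketch_rr_pick(const struct sketch_rr_port *ports,
			  unsigned int count, unsigned int *sent_packets)
{
	assert(count > 0);
	return sketch_first_port_up(ports, count, (*sent_packets)++ % count);
}
```

One consequence of this design is that a down port's turn goes to the next up port. With ports {up, down, up}, four packets go to ports 0, 2, 2, 0, so traffic is not spread evenly while a port is down.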
>diff --git a/include/linux/Kbuild b/include/linux/Kbuild
>index 619b565..0b091b3 100644
>--- a/include/linux/Kbuild
>+++ b/include/linux/Kbuild
>@@ -185,6 +185,7 @@ header-y += if_pppol2tp.h
> header-y += if_pppox.h
> header-y += if_slip.h
> header-y += if_strip.h
>+header-y += if_team.h
> header-y += if_tr.h
> header-y += if_tun.h
> header-y += if_tunnel.h
>diff --git a/include/linux/if.h b/include/linux/if.h
>index db20bd4..06b6ef6 100644
>--- a/include/linux/if.h
>+++ b/include/linux/if.h
>@@ -79,6 +79,7 @@
> #define IFF_TX_SKB_SHARING 0x10000 /* The interface supports sharing
> * skbs on transmit */
> #define IFF_UNICAST_FLT 0x20000 /* Supports unicast filtering */
>+#define IFF_TEAM_PORT 0x40000 /* device used as team port */
>
> #define IF_GET_IFACE 0x0001 /* for querying only */
> #define IF_GET_PROTO 0x0002
>diff --git a/include/linux/if_team.h b/include/linux/if_team.h
>new file mode 100644
>index 0000000..de395fc
>--- /dev/null
>+++ b/include/linux/if_team.h
>@@ -0,0 +1,254 @@
>+/*
>+ * include/linux/if_team.h - Network team device driver header
>+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ */
>+
>+#ifndef _LINUX_IF_TEAM_H_
>+#define _LINUX_IF_TEAM_H_
>+
>+#ifdef __KERNEL__
>+
>+struct team_pcpu_stats {
>+ u64 rx_packets;
>+ u64 rx_bytes;
>+ u64 rx_multicast;
>+ u64 tx_packets;
>+ u64 tx_bytes;
>+ struct u64_stats_sync syncp;
>+ u32 rx_dropped;
>+ u32 tx_dropped;
>+};
>+
>+struct team;
>+
>+struct team_port {
>+ struct net_device *dev;
>+ struct hlist_node hlist; /* node in hash list */
>+ struct list_head list; /* node in ordinary list */
>+ struct team *team;
>+ int index;
>+
>+ /*
>+ * A place for storing original values of the device before it
>+ * became a port.
>+ */
>+ struct {
>+ unsigned char dev_addr[MAX_ADDR_LEN];
>+ unsigned int mtu;
>+ } orig;
>+
>+ bool linkup;
>+ u32 speed;
>+ u8 duplex;
>+
>+ struct rcu_head rcu;
>+};
>+
>+struct team_mode_ops {
>+ int (*init)(struct team *team);
>+ void (*exit)(struct team *team);
>+ rx_handler_result_t (*receive)(struct team *team,
>+ struct team_port *port,
>+ struct sk_buff *skb);
>+ bool (*transmit)(struct team *team, struct sk_buff *skb);
>+ int (*port_enter)(struct team *team, struct team_port *port);
>+ void (*port_leave)(struct team *team, struct team_port *port);
>+ void (*port_change_mac)(struct team *team, struct team_port *port);
>+};
>+
>+static inline void team_mode_ops_copy(struct team_mode_ops *dst,
>+ const struct team_mode_ops *src)
>+{
>+ dst->init = src->init;
>+ dst->exit = src->exit;
>+ dst->receive = src->receive;
>+ dst->transmit = src->transmit;
>+ dst->port_enter = src->port_enter;
>+ dst->port_leave = src->port_leave;
>+ dst->port_change_mac = src->port_change_mac;
>+}
>+
>+static inline void team_mode_ops_clear(struct team_mode_ops *dst)
>+{
>+ dst->init = NULL;
>+ dst->exit = NULL;
>+ dst->receive = NULL;
>+ dst->transmit = NULL;
>+ dst->port_enter = NULL;
>+ dst->port_leave = NULL;
>+ dst->port_change_mac = NULL;
>+}
>+
>+enum team_option_type {
>+ TEAM_OPTION_TYPE_U32,
>+ TEAM_OPTION_TYPE_STRING,
>+};
>+
>+struct team_option {
>+ struct list_head list;
>+ const char *name;
>+ enum team_option_type type;
>+ int (*getter)(struct team *team, void *arg);
>+ int (*setter)(struct team *team, void *arg);
>+};
>+
>+struct team_mode {
>+ struct list_head list;
>+ const char *kind;
>+ struct module *owner;
>+ size_t priv_size;
>+ const struct team_mode_ops *ops;
>+};
>+
>+#define TEAM_PORT_HASHBITS 4
>+#define TEAM_PORT_HASHENTRIES (1 << TEAM_PORT_HASHBITS)
>+
>+#define TEAM_MODE_PRIV_LONGS 4
>+#define TEAM_MODE_PRIV_SIZE (sizeof(long) * TEAM_MODE_PRIV_LONGS)
>+
>+struct team {
>+ struct net_device *dev; /* associated netdevice */
>+ struct team_pcpu_stats __percpu *pcpu_stats;
>+
>+ spinlock_t lock; /* used for overall locking, e.g. port lists write */
>+
>+ /*
>+ * port lists with port count
>+ */
>+ int port_count;
>+ struct hlist_head port_hlist[TEAM_PORT_HASHENTRIES];
>+ struct list_head port_list;
>+
>+ struct list_head option_list;
>+
>+ const char *mode_kind;
>+ struct team_mode_ops mode_ops;
>+ long mode_priv[TEAM_MODE_PRIV_LONGS];
>+};
>+
>+static inline struct hlist_head *team_port_index_hash(struct team *team,
>+ int port_index)
>+{
>+ return &team->port_hlist[port_index & (TEAM_PORT_HASHENTRIES - 1)];
>+}
>+
>+static inline struct team_port *team_get_port_by_index_rcu(struct team *team,
>+ int port_index)
>+{
>+ struct hlist_node *p;
>+ struct team_port *port;
>+ struct hlist_head *head = team_port_index_hash(team, port_index);
>+
>+ hlist_for_each_entry_rcu(port, p, head, hlist)
>+ if (port->index == port_index)
>+ return port;
>+ return NULL;
>+}
>+
>+extern int team_port_set_orig_mac(struct team_port *port);
>+extern int team_port_set_team_mac(struct team_port *port);
>+extern void team_options_register(struct team *team,
>+ struct team_option *option,
>+ size_t option_count);
>+extern void team_options_unregister(struct team *team,
>+ struct team_option *option,
>+ size_t option_count);
>+extern int team_mode_register(struct team_mode *mode);
>+extern int team_mode_unregister(struct team_mode *mode);
>+
>+#endif /* __KERNEL__ */
>+
>+#define TEAM_STRING_MAX_LEN 32
>+
>+/**********************************
>+ * NETLINK_GENERIC netlink family.
>+ **********************************/
>+
>+enum {
>+ TEAM_CMD_NOOP,
>+ TEAM_CMD_OPTIONS_SET,
>+ TEAM_CMD_OPTIONS_GET,
>+ TEAM_CMD_PORT_LIST_GET,
>+
>+ __TEAM_CMD_MAX,
>+ TEAM_CMD_MAX = (__TEAM_CMD_MAX - 1),
>+};
>+
>+enum {
>+ TEAM_ATTR_UNSPEC,
>+ TEAM_ATTR_TEAM_IFINDEX, /* u32 */
>+ TEAM_ATTR_LIST_OPTION, /* nest */
>+ TEAM_ATTR_LIST_PORT, /* nest */
>+
>+ __TEAM_ATTR_MAX,
>+ TEAM_ATTR_MAX = __TEAM_ATTR_MAX - 1,
>+};
>+
>+/* Nested layout of get/set msg:
>+ *
>+ * [TEAM_ATTR_LIST_OPTION]
>+ * [TEAM_ATTR_ITEM_OPTION]
>+ * [TEAM_ATTR_OPTION_*], ...
>+ * [TEAM_ATTR_ITEM_OPTION]
>+ * [TEAM_ATTR_OPTION_*], ...
>+ * ...
>+ * [TEAM_ATTR_LIST_PORT]
>+ * [TEAM_ATTR_ITEM_PORT]
>+ * [TEAM_ATTR_PORT_*], ...
>+ * [TEAM_ATTR_ITEM_PORT]
>+ * [TEAM_ATTR_PORT_*], ...
>+ * ...
>+ */
>+
>+enum {
>+ TEAM_ATTR_ITEM_OPTION_UNSPEC,
>+ TEAM_ATTR_ITEM_OPTION, /* nest */
>+
>+ __TEAM_ATTR_ITEM_OPTION_MAX,
>+ TEAM_ATTR_ITEM_OPTION_MAX = __TEAM_ATTR_ITEM_OPTION_MAX - 1,
>+};
>+
>+enum {
>+ TEAM_ATTR_OPTION_UNSPEC,
>+ TEAM_ATTR_OPTION_NAME, /* string */
>+ TEAM_ATTR_OPTION_CHANGED, /* flag */
>+ TEAM_ATTR_OPTION_TYPE, /* u8 */
>+ TEAM_ATTR_OPTION_DATA, /* dynamic */
>+
>+ __TEAM_ATTR_OPTION_MAX,
>+ TEAM_ATTR_OPTION_MAX = __TEAM_ATTR_OPTION_MAX - 1,
>+};
>+
>+enum {
>+ TEAM_ATTR_ITEM_PORT_UNSPEC,
>+ TEAM_ATTR_ITEM_PORT, /* nest */
>+
>+ __TEAM_ATTR_ITEM_PORT_MAX,
>+ TEAM_ATTR_ITEM_PORT_MAX = __TEAM_ATTR_ITEM_PORT_MAX - 1,
>+};
>+
>+enum {
>+ TEAM_ATTR_PORT_UNSPEC,
>+ TEAM_ATTR_PORT_IFINDEX, /* u32 */
>+ TEAM_ATTR_PORT_CHANGED, /* flag */
>+ TEAM_ATTR_PORT_LINKUP, /* flag */
>+ TEAM_ATTR_PORT_SPEED, /* u32 */
>+ TEAM_ATTR_PORT_DUPLEX, /* u8 */
>+
>+ __TEAM_ATTR_PORT_MAX,
>+ TEAM_ATTR_PORT_MAX = __TEAM_ATTR_PORT_MAX - 1,
>+};
>+
>+/*
>+ * NETLINK_GENERIC related info
>+ */
>+#define TEAM_GENL_NAME "team"
>+#define TEAM_GENL_VERSION 0x1
>+#define TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME "change_event"
>+
>+#endif /* _LINUX_IF_TEAM_H_ */
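The nested layout described in if_team.h (LIST_OPTION > ITEM_OPTION > OPTION_*) is built in the kernel with nla_nest_start()/nla_nest_end() and the NLA_PUT_* macros. For illustration only, here is a self-contained userspace encoder producing the same attribute nesting for one "activeport" option. Each attribute is a 4-byte header (16-bit length including the header, 16-bit type) followed by its payload padded to 4 bytes. A nest is opened with an empty header, and its length is patched once its contents are appended. The attribute numbers are taken from the enums above. The type byte value 3 for NLA_U32 is an assumption about the kernel's internal enum, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same wire layout as struct nlattr. */
struct sketch_nlattr {
	uint16_t nla_len;	/* header + payload, without trailing padding */
	uint16_t nla_type;
};

#define SKETCH_NLA_ALIGN(len) (((len) + 3U) & ~3U)
#define SKETCH_NLA_HDRLEN ((unsigned int)SKETCH_NLA_ALIGN(sizeof(struct sketch_nlattr)))
#define SKETCH_NLA_U32 3	/* assumed value of the kernel's NLA_U32 */

struct sketch_buf {
	unsigned char data[256];
	unsigned int len;
};

/* Append one attribute (payload zero-padded); returns its offset. */
static unsigned int sketch_put(struct sketch_buf *b, uint16_t type,
			       const void *payload, unsigned int plen)
{
	unsigned int off = b->len;
	unsigned int raw = SKETCH_NLA_HDRLEN + plen;
	struct sketch_nlattr hdr = { (uint16_t)raw, type };

	assert(off + SKETCH_NLA_ALIGN(raw) <= sizeof(b->data));
	memcpy(b->data + off, &hdr, sizeof(hdr));
	if (plen)
		memcpy(b->data + off + SKETCH_NLA_HDRLEN, payload, plen);
	memset(b->data + off + raw, 0, SKETCH_NLA_ALIGN(raw) - raw);
	b->len = off + SKETCH_NLA_ALIGN(raw);
	return off;
}

/* Like nla_nest_start(): an attribute header with no payload yet. */
static unsigned int sketch_nest_start(struct sketch_buf *b, uint16_t type)
{
	return sketch_put(b, type, NULL, 0);
}

/* Like nla_nest_end(): make nla_len cover everything appended since. */
static void sketch_nest_end(struct sketch_buf *b, unsigned int start)
{
	uint16_t len = (uint16_t)(b->len - start);

	memcpy(b->data + start, &len, sizeof(len));
}

/* LIST_OPTION(2) > ITEM_OPTION(1) > NAME(1), TYPE(3), DATA(4). */
static unsigned int sketch_build_option_set(struct sketch_buf *b,
					    const char *name, uint32_t value)
{
	uint8_t type = SKETCH_NLA_U32;
	unsigned int list, item;

	b->len = 0;
	list = sketch_nest_start(b, 2);
	item = sketch_nest_start(b, 1);
	sketch_put(b, 1, name, (unsigned int)strlen(name) + 1);
	sketch_put(b, 3, &type, sizeof(type));
	sketch_put(b, 4, &value, sizeof(value));
	sketch_nest_end(b, item);
	sketch_nest_end(b, list);
	return b->len;
}
```

For "activeport" the name attribute takes 16 bytes once padded, the type byte 8, and the u32 data 8. The item nest is therefore 36 bytes and the outer list 40.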
>diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>index d079290..7586b2c 100644
>--- a/include/linux/rculist.h
>+++ b/include/linux/rculist.h
>@@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
>+ * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
>+ * @pos: the type * to use as a loop cursor.
>+ * @head: the head for your list.
>+ * @member: the name of the list_struct within the struct.
>+ *
>+ * Start to iterate over list of given type backwards, continuing after
>+ * the current position.
>+ */
>+#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
>+ for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
>+ &pos->member != (head); \
>+ pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
>+
>+/**
> * hlist_del_rcu - deletes entry from hash list without re-initialization
> * @n: the element to delete from the hash list.
> *
>--
>1.7.6
>
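The new list_for_each_entry_continue_reverse_rcu() starts at the entry before @pos and walks backwards until it reaches the list head, so @pos itself is not visited. The userspace sketch below performs the same walk on a plain circular doubly linked list. It does no RCU dereferencing and uses an explicit struct type in place of typeof(), and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_list_head { struct sketch_list_head *next, *prev; };

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_item {
	int value;
	struct sketch_list_head list;
};

static void sketch_list_init(struct sketch_list_head *head)
{
	head->next = head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list_head *n,
				 struct sketch_list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Same walk as the macro: begin at pos's predecessor, stop at @head.
 * Collects visited values into @out and returns how many were seen. */
static int sketch_collect_before(struct sketch_item *pos,
				 struct sketch_list_head *head,
				 int *out, int max)
{
	int n = 0;

	for (pos = sketch_container_of(pos->list.prev, struct sketch_item, list);
	     &pos->list != head;
	     pos = sketch_container_of(pos->list.prev, struct sketch_item, list)) {
		assert(n < max);
		out[n++] = pos->value;
	}
	return n;
}
```

Starting from the third of four entries (values 1..4) visits 2 and then 1. Starting from the first entry visits nothing.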
^ permalink raw reply
* lvs-users mailing list and archive dead?
From: Tomasz Chmielewski @ 2011-10-23 17:19 UTC (permalink / raw)
To: lvs-devel
Hi,
does anyone know what happened to the lvs-users mailing list?
It used to be hosted on
http://lists.graemef.net/mailman/listinfo/lvs-users (as pointed out on
http://www.linuxvirtualserver.org/mailing.html).
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply
* Re: tag process's future sockets for iptables rules?
From: p. awa @ 2011-10-23 17:18 UTC (permalink / raw)
To: netfilter
In-Reply-To: <alpine.LNX.2.01.1110222242100.17728@frira.zrqbmnf.qr>
> >| netfilter_add_tag("public-addresses-proxied-via-tor");
> >| netfilter_add_tag("internal-addresses-directly");
> >| netfilter_remove_tag("proxy-dns");
> >| execlp("wget", ...);
>
> A socket option, SO_MARK, for use with setsockopt/getsockopt.
but setsockopt is per socket. i'm looking for something that is
per process (and inherited by children - in the example, wget).
this is to replace what i do at the moment, namely
| setgid(123);
| execlp("wget", ...);
and
# iptables ... -m owner --gid-owner 123 ...
^ permalink raw reply
* [PATCH staging 6/6] et131x: uncloak PCIe capabilities.
From: Francois Romieu @ 2011-10-23 17:12 UTC (permalink / raw)
To: Mark Einon; +Cc: Greg KH, devel, linux-kernel
In-Reply-To: <20111023094231.GA3409@msilap.einon>
FIXME: it should be possible to get rid of ET1310_PCI_L0L1LATENCY as well.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
drivers/staging/et131x/et131x.c | 53 ++++++++++++++++++++++----------------
1 files changed, 31 insertions(+), 22 deletions(-)
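The patch below reads Max_Payload_Size Supported from the low three bits of PCI_EXP_DEVCAP. It also stores 0x04 into the Max_Read_Request_Size field, bits 14:12 of PCI_EXP_DEVCTL. Both fields encode a size as 128 << code, so 0x04 means 2048 bytes, matching the "max read size to 2k" comment. The removed code did the same thing through the high byte of DEVCTL at offset 0x51 (mask 0x8f, set 0x40), which suggests the PCIe capability sits at 0x48 on this chip. The sketch below checks that equivalence. The mask values 0x0007 and 0x7000 are assumed to match PCI_EXP_DEVCAP_PAYLOAD and PCI_EXP_DEVCTL_READRQ from <linux/pci_regs.h>, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match PCI_EXP_DEVCAP_PAYLOAD / PCI_EXP_DEVCTL_READRQ. */
#define SKETCH_DEVCAP_PAYLOAD 0x0007
#define SKETCH_DEVCTL_READRQ  0x7000

/* Max_Payload_Size Supported: bytes = 128 << code (codes 0..5 defined). */
static unsigned int sketch_devcap_max_payload(uint32_t devcap)
{
	return 128U << (devcap & SKETCH_DEVCAP_PAYLOAD);
}

/* Max_Read_Request_Size in DEVCTL bits 14:12, same encoding. */
static unsigned int sketch_devctl_readrq(uint16_t devctl)
{
	return 128U << ((devctl & SKETCH_DEVCTL_READRQ) >> 12);
}

/* New code: clear the field, then store code 4 (2048 bytes). */
static uint16_t sketch_set_readrq_2k(uint16_t devctl)
{
	return (uint16_t)((devctl & ~SKETCH_DEVCTL_READRQ) | (0x04 << 12));
}

/* Removed code: the same update applied to DEVCTL's high byte only. */
static uint8_t sketch_old_set_readrq_2k(uint8_t devctl_hi)
{
	return (uint8_t)((devctl_hi & 0x8f) | 0x40);
}
```

For a DEVCTL value of 0x2810 (512-byte read requests), both versions yield a high byte of 0x48, and the low byte is left unchanged.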
diff --git a/drivers/staging/et131x/et131x.c b/drivers/staging/et131x/et131x.c
index 79ca1d3..2a0b794 100644
--- a/drivers/staging/et131x/et131x.c
+++ b/drivers/staging/et131x/et131x.c
@@ -155,7 +155,6 @@ MODULE_DESCRIPTION("10/100/1000 Base-T Ethernet Driver "
#define fMP_ADAPTER_FAIL_SEND_MASK 0x3ff00000
/* Some offsets in PCI config space that are actually used. */
-#define ET1310_PCI_MAX_PYLD 0x4C
#define ET1310_PCI_MAC_ADDRESS 0xA4
#define ET1310_PCI_EEPROM_STATUS 0xB2
#define ET1310_PCI_ACK_NACK 0xC0
@@ -4024,24 +4023,31 @@ static void et131x_hwaddr_init(struct et131x_adapter *adapter)
static int et131x_pci_init(struct et131x_adapter *adapter,
struct pci_dev *pdev)
{
- int i;
- u8 max_payload;
- u8 read_size_reg;
+ int cap = pci_pcie_cap(pdev);
+ u16 max_payload;
+ u16 ctl;
+ int i, rc;
- if (et131x_init_eeprom(adapter) < 0)
- return -EIO;
+ rc = et131x_init_eeprom(adapter);
+ if (rc < 0)
+ goto out;
+ if (!cap) {
+ dev_err(&pdev->dev, "Missing PCIe capabilities\n");
+ goto err_out;
+ }
+
/* Let's set up the PORT LOGIC Register. First we need to know what
* the max_payload_size is
*/
- if (pci_read_config_byte(pdev, ET1310_PCI_MAX_PYLD, &max_payload)) {
+ if (pci_read_config_word(pdev, cap + PCI_EXP_DEVCAP, &max_payload)) {
dev_err(&pdev->dev,
"Could not read PCI config space for Max Payload Size\n");
- return -EIO;
+ goto err_out;
}
/* Program the Ack/Nak latency and replay timers */
- max_payload &= 0x07; /* Only the lower 3 bits are valid */
+ max_payload &= 0x07;
if (max_payload < 2) {
static const u16 acknak[2] = { 0x76, 0xD0 };
@@ -4051,13 +4057,13 @@ static int et131x_pci_init(struct et131x_adapter *adapter,
acknak[max_payload])) {
dev_err(&pdev->dev,
"Could not write PCI config space for ACK/NAK\n");
- return -EIO;
+ goto err_out;
}
if (pci_write_config_word(pdev, ET1310_PCI_REPLAY,
replay[max_payload])) {
dev_err(&pdev->dev,
"Could not write PCI config space for Replay Timer\n");
- return -EIO;
+ goto err_out;
}
}
@@ -4067,23 +4073,22 @@ static int et131x_pci_init(struct et131x_adapter *adapter,
if (pci_write_config_byte(pdev, ET1310_PCI_L0L1LATENCY, 0x11)) {
dev_err(&pdev->dev,
"Could not write PCI config space for Latency Timers\n");
- return -EIO;
+ goto err_out;
}
/* Change the max read size to 2k */
- if (pci_read_config_byte(pdev, 0x51, &read_size_reg)) {
+ if (pci_read_config_word(pdev, cap + PCI_EXP_DEVCTL, &ctl)) {
dev_err(&pdev->dev,
"Could not read PCI config space for Max read size\n");
- return -EIO;
+ goto err_out;
}
- read_size_reg &= 0x8f;
- read_size_reg |= 0x40;
+ ctl = (ctl & ~PCI_EXP_DEVCTL_READRQ) | (0x04 << 12);
- if (pci_write_config_byte(pdev, 0x51, read_size_reg)) {
+ if (pci_write_config_word(pdev, cap + PCI_EXP_DEVCTL, ctl)) {
dev_err(&pdev->dev,
"Could not write PCI config space for Max read size\n");
- return -EIO;
+ goto err_out;
}
/* Get MAC address from config space if an eeprom exists, otherwise
@@ -4098,11 +4103,15 @@ static int et131x_pci_init(struct et131x_adapter *adapter,
if (pci_read_config_byte(pdev, ET1310_PCI_MAC_ADDRESS + i,
adapter->rom_addr + i)) {
dev_err(&pdev->dev, "Could not read PCI config space for MAC address\n");
- return -EIO;
+ goto err_out;
}
}
memcpy(adapter->addr, adapter->rom_addr, ETH_ALEN);
- return 0;
+out:
+ return rc;
+err_out:
+ rc = -EIO;
+ goto out;
}
/**
--
1.7.6.4
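The patch swaps the fixed config-space offsets 0x4C and 0x51 for fields of the standard PCIe capability. As a standalone illustration (helper names are hypothetical; the mask values are meant to mirror PCI_EXP_DEVCAP_PAYLOAD and PCI_EXP_DEVCTL_READRQ from linux/pci_regs.h), the sketch below decodes Max_Payload_Size_Supported (Device Capabilities bits 2:0) and rewrites Max_Read_Request_Size (Device Control bits 14:12). Both fields encode 128 << n bytes, so the value 4 written by the patch means 2048 bytes. It also compares the old byte-wide edit of offset 0x51 with the new word-wide edit, assuming the capability sits at 0x48, which the old offsets suggest (0x4C = cap + 4, 0x51 = upper byte of cap + 8).

```c
#include <assert.h>
#include <stdint.h>

/* PCIe register fields, meant to mirror linux/pci_regs.h. */
#define EXP_DEVCAP_PAYLOAD 0x0007u /* Device Capabilities bits 2:0 */
#define EXP_DEVCTL_READRQ  0x7000u /* Device Control bits 14:12 */

/* Both fields encode 128 << n bytes; n = 0..5 are defined (128..4096). */
static unsigned int pcie_size_bytes(unsigned int n)
{
	assert(n <= 5);
	return 128u << n;
}

/* Max payload size supported, in bytes, from a Device Capabilities value. */
static unsigned int devcap_max_payload(uint32_t devcap)
{
	return pcie_size_bytes(devcap & EXP_DEVCAP_PAYLOAD);
}

/* New code path: replace Max_Read_Request_Size in a 16-bit Device Control value. */
static uint16_t devctl_set_readrq(uint16_t ctl, unsigned int n)
{
	return (uint16_t)((ctl & ~EXP_DEVCTL_READRQ) | ((n & 0x7u) << 12));
}

/* Old code path: the same edit applied to the upper byte only (offset 0x51). */
static uint8_t devctl_hi_set_readrq_old(uint8_t hi)
{
	return (uint8_t)((hi & 0x8f) | 0x40);
}
```

For example, devctl_set_readrq(0x2810, 4) yields 0x4810, whose upper byte 0x48 is exactly what the old (hi & 0x8f) | 0x40 produced from 0x28, while the low byte is left untouched.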
* [U-Boot] [PATCH v4 2/2] NS16550: buffer reads
From: Wolfgang Denk @ 2011-10-23 17:15 UTC (permalink / raw)
To: u-boot
In-Reply-To: <CALButCLEg6c30En3N4LjPv1woJjFxwkEkHvQYMoJy8+MgSzqJw@mail.gmail.com>
Dear Graeme Russ,
In message <CALButCLEg6c30En3N4LjPv1woJjFxwkEkHvQYMoJy8+MgSzqJw@mail.gmail.com> you wrote:
>
> > It should be sufficient to send XOFF after receiving a newline
> > character.
>
> And, ergo, we send an XON when entering the readline function
This is probably not sufficient, as some commands take direct input.
I think both getc() and tstc() should check the XON/XOFF state and
send an XON if an XOFF was sent before.
> Hmm, should we move readline() into console.c
Makes sense.
> > This should not be necessary. Actually the implementation should not
> > need to know about such special cases.
>
> So how does kermit/ymodem send the XON after the user has entered the
> receive command and we have sent the XOFF after the newline?
Upon the first getc() that follows?
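A minimal sketch of the flow control discussed above, under stated assumptions: a single xoff_sent flag, an XOFF (DC3, 0x13) sent once a newline or carriage return has been read, and an XON (DC1, 0x11) sent by the next getc() or tstc() before it looks for input. Every name here (console_getc(), console_tstc(), uart_putc(), the mock receive buffer) is hypothetical and not U-Boot's console API; the mock only records what would go out on the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XON  0x11 /* DC1: ask the host to resume sending */
#define XOFF 0x13 /* DC3: ask the host to pause */

/* Mock UART (hypothetical): records transmitted bytes, replays input. */
static char tx_log[64];
static size_t tx_len;
static const char *rx_data = "";

static void uart_putc(char c)
{
	assert(tx_len < sizeof(tx_log));
	tx_log[tx_len++] = c;
}

/* True while the host has been told to pause. */
static bool xoff_sent;

/* Send XON first if an earlier XOFF is still in effect. */
static void flow_resume(void)
{
	if (xoff_sent) {
		uart_putc(XON);
		xoff_sent = false;
	}
}

/* tstc(): resume the host, then report whether input is waiting. */
static int console_tstc(void)
{
	flow_resume();
	return *rx_data != '\0';
}

/* getc(): resume the host, read one byte, pause the host after a line end. */
static int console_getc(void)
{
	int c;

	flow_resume();
	assert(*rx_data != '\0'); /* a real getc() would block here */
	c = (unsigned char)*rx_data++;
	if (c == '\n' || c == '\r') {
		uart_putc(XOFF);
		xoff_sent = true;
	}
	return c;
}
```

Because both entry points release the sender, a command that reads input directly (such as a kermit or ymodem receive) would resume the host on its first getc() or tstc() without any special casing.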
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Marriage is the sole cause of divorce.