* [PATCH 42/45] include/uapi/linux/rds.h: include linux/socket.h and linux/types.h always
From: Mikko Rapeli @ 2015-02-16 23:05 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Mikko Rapeli, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1424127948-22484-1-git-send-email-mikko.rapeli-X3B1VOXEql0@public.gmane.org>
Fixes compilation errors in userspace like:
error: unknown type name ‘__be32’
Signed-off-by: Mikko Rapeli <mikko.rapeli-X3B1VOXEql0@public.gmane.org>
---
include/uapi/linux/rds.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 7ff0c70..47b1ede 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -34,9 +34,9 @@
#ifndef _LINUX_RDS_H
#define _LINUX_RDS_H
-#ifdef __KERNEL__
#include <linux/types.h>
-#else
+#include <linux/socket.h>
+#ifndef __KERNEL__
#include <stdint.h>
#endif
--
2.1.4
^ permalink raw reply related
* [PATCH 43/45] include/uapi/linux/netfilter_bridge.h: include if.h
From: Mikko Rapeli @ 2015-02-16 23:05 UTC (permalink / raw)
To: linux-kernel
Cc: Mikko Rapeli, Pablo Neira Ayuso, Patrick McHardy,
Jozsef Kadlecsik, netfilter-devel, coreteam, linux-api
In-Reply-To: <1424127948-22484-1-git-send-email-mikko.rapeli@iki.fi>
Fixes userspace compilation errors like:
error: field ‘in’ has incomplete type
struct in_addr in;
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
---
include/uapi/linux/netfilter_bridge.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter_bridge.h b/include/uapi/linux/netfilter_bridge.h
index a5eda6d..5bb0528 100644
--- a/include/uapi/linux/netfilter_bridge.h
+++ b/include/uapi/linux/netfilter_bridge.h
@@ -4,6 +4,7 @@
/* bridge-specific defines for netfilter.
*/
+#include <linux/if.h>
#include <linux/netfilter.h>
#include <linux/if_ether.h>
#include <linux/if_vlan.h>
--
2.1.4
^ permalink raw reply related
* [PATCH 44/45] nf_conntrack_tuple_common.h: include linux/types.h and linux/netfilter.h
From: Mikko Rapeli @ 2015-02-16 23:05 UTC (permalink / raw)
To: linux-kernel
Cc: Mikko Rapeli, Pablo Neira Ayuso, Patrick McHardy,
Jozsef Kadlecsik, netfilter-devel, coreteam, linux-api
In-Reply-To: <1424127948-22484-1-git-send-email-mikko.rapeli@iki.fi>
Fixes userspace compilation errors like:
error: unknown type name ‘__be16’
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
---
include/uapi/linux/netfilter/nf_conntrack_tuple_common.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/uapi/linux/netfilter/nf_conntrack_tuple_common.h b/include/uapi/linux/netfilter/nf_conntrack_tuple_common.h
index 2f6bbc5..a9c3834 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tuple_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tuple_common.h
@@ -1,6 +1,9 @@
#ifndef _NF_CONNTRACK_TUPLE_COMMON_H
#define _NF_CONNTRACK_TUPLE_COMMON_H
+#include <linux/types.h>
+#include <linux/netfilter.h>
+
enum ip_conntrack_dir {
IP_CT_DIR_ORIGINAL,
IP_CT_DIR_REPLY,
--
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 45/45] include/uapi/asm-generic/ucontext.h: include signal.h and sigcontext.h
From: Mikko Rapeli @ 2015-02-16 23:05 UTC (permalink / raw)
To: linux-kernel; +Cc: Mikko Rapeli, Arnd Bergmann, linux-arch, linux-api
In-Reply-To: <1424127948-22484-1-git-send-email-mikko.rapeli@iki.fi>
Fixes userspace compiler errors:
error: unknown type name ‘stack_t’
error: field ‘uc_mcontext’ has incomplete type
struct sigcontext uc_mcontext;
error: unknown type name ‘sigset_t’
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
---
include/uapi/asm-generic/ucontext.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/uapi/asm-generic/ucontext.h b/include/uapi/asm-generic/ucontext.h
index ad77343..31ece341 100644
--- a/include/uapi/asm-generic/ucontext.h
+++ b/include/uapi/asm-generic/ucontext.h
@@ -1,6 +1,9 @@
#ifndef __ASM_GENERIC_UCONTEXT_H
#define __ASM_GENERIC_UCONTEXT_H
+#include <asm-generic/signal.h>
+#include <asm/sigcontext.h>
+
struct ucontext {
unsigned long uc_flags;
struct ucontext *uc_link;
--
2.1.4
^ permalink raw reply related
* Re: [PATCH 15/45] dm-log-userspace.h: include stdint.h in userspace
From: Mike Snitzer @ 2015-02-16 23:32 UTC (permalink / raw)
To: Mikko Rapeli; +Cc: linux-kernel, Alasdair Kergon, dm-devel, linux-api
In-Reply-To: <1424127948-22484-16-git-send-email-mikko.rapeli@iki.fi>
On Mon, Feb 16 2015 at 6:05pm -0500,
Mikko Rapeli <mikko.rapeli@iki.fi> wrote:
> Fixes compilation error:
>
> linux/dm-log-userspace.h:416:2: error: unknown type name ‘uint64_t’
What userspace code are you compiling? Do you have a feel for when this
stopped working?
^ permalink raw reply
* Re: [PATCH 15/45] dm-log-userspace.h: include stdint.h in userspace
From: Mikko Rapeli @ 2015-02-16 23:48 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alasdair Kergon,
dm-devel-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150216233254.GA23033-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Mon, Feb 16, 2015 at 06:32:54PM -0500, Mike Snitzer wrote:
> On Mon, Feb 16 2015 at 6:05pm -0500,
> Mikko Rapeli <mikko.rapeli-X3B1VOXEql0@public.gmane.org> wrote:
>
> > Fixes compilation error:
> >
> > linux/dm-log-userspace.h:416:2: error: unknown type name ‘uint64_t’
>
> What userspace code are you compiling? Do you have a feel for when this
> stopped working?
See Message-Id: <1424127948-22484-1-git-send-email-mikko.rapeli-X3B1VOXEql0@public.gmane.org>
or https://lkml.org/lkml/2015/2/16/521
The failure comes from a test which tries to compile each exported header file
in userspace one at a time.
-Mikko
^ permalink raw reply
* Re: [PATCH RESEND 1/12] fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
From: Dave Chinner @ 2015-02-16 23:53 UTC (permalink / raw)
To: Namjae Jeon
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api, Namjae Jeon
In-Reply-To: <1424101680-3301-2-git-send-email-linkinjeon@gmail.com>
On Tue, Feb 17, 2015 at 12:47:48AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
>
> FALLOC_FL_INSERT_RANGE command is the opposite command of
> FALLOC_FL_COLLAPSE_RANGE that is needed for advertisers or someone who want to
> add some data in the middle of file. FALLOC_FL_INSERT_RANGE will create space
> for writing new data within a file after shifting extents to right as given
> length. and this command also has same limitation as FALLOC_FL_COLLAPSE_RANGE,
> that is block boundary and use ftruncate(2) for crosses EOF.
>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> Cc: Brian Foster<bfoster@redhat.com>
> ---
> fs/open.c | 8 +++++++-
> include/uapi/linux/falloc.h | 17 +++++++++++++++++
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 813be03..762fb45 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -232,7 +232,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>
> /* Return error if mode is not supported */
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> + FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;
Can we create a FALLOC_FL_SUPPORTED_MASK define in falloc.h
so that we only need to add new flags to the mask rather than
change this code every time we add a new flag?
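A minimal sketch of what such a mask could look like. The name FALLOC_FL_SUPPORTED_MASK is taken from the suggestion above; the flag values are repeated from include/uapi/linux/falloc.h only so the sketch compiles on its own (0x20 for FALLOC_FL_INSERT_RANGE is an assumption here), and falloc_check_mode() is a hypothetical helper standing in for the check in vfs_fallocate():

```c
#include <errno.h>

/*
 * Flag values as in include/uapi/linux/falloc.h, repeated so this sketch is
 * standalone. 0x20 for FALLOC_FL_INSERT_RANGE is an assumption.
 */
#define FALLOC_FL_KEEP_SIZE		0x01
#define FALLOC_FL_PUNCH_HOLE		0x02
#define FALLOC_FL_COLLAPSE_RANGE	0x08
#define FALLOC_FL_ZERO_RANGE		0x10
#define FALLOC_FL_INSERT_RANGE		0x20

/* Single place to extend when a new fallocate mode flag is added. */
#define FALLOC_FL_SUPPORTED_MASK \
	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
	 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
	 FALLOC_FL_INSERT_RANGE)

/* Hypothetical helper: reject any mode bit outside the supported mask. */
static int falloc_check_mode(int mode)
{
	if (mode & ~FALLOC_FL_SUPPORTED_MASK)
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape, adding a future flag would only touch the mask definition and not the caller.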
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply
* Re: [PATCH 43/45] include/uapi/linux/netfilter_bridge.h: include if.h
From: Jan Engelhardt @ 2015-02-17 0:02 UTC (permalink / raw)
To: Mikko Rapeli
Cc: Linux Kernel Mailing List, Netfilter Developer Mailing List,
linux-api
In-Reply-To: <1424127948-22484-44-git-send-email-mikko.rapeli@iki.fi>
On Tuesday 2015-02-17 00:05, Mikko Rapeli wrote:
>Fixes userspace compilation errors like:
>
>error: field ‘in’ has incomplete type
>struct in_addr in;
>
>+#include <linux/if.h>
Patch 36/45 included linux/in.h instead of linux/if.h for addressing "in has incomplete
type". Should this be used here too?
^ permalink raw reply
* Re: [PATCH RESEND 2/12] xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
From: Dave Chinner @ 2015-02-17 0:54 UTC (permalink / raw)
To: Namjae Jeon
Cc: tytso-3s7WtUTddSA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ,
a.sangwan-Sze3O3UU22JBDgjK7y7TUQ, bfoster-H+wXaHxf7aLQT0dZR+AlfA,
mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, Namjae Jeon
In-Reply-To: <1424101680-3301-3-git-send-email-linkinjeon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Tue, Feb 17, 2015 at 12:47:49AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
>
> This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.
>
> 1) Make sure that both offset and len are block size aligned.
> 2) Update the i_size of inode by len bytes.
> 3) Compute the file's logical block number against offset. If the computed
> block number is not the starting block of the extent, split the extent
> such that the block number is the starting block of the extent.
> 4) Shift all the extents which are lying between [offset, last allocated extent]
> towards right by len bytes. This step will make a hole of len bytes
> at offset.
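Steps 3 and 4 above can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_extent, toy_split_at() and toy_shift_right() are hypothetical names, not XFS code; extents are kept in a plain sorted array; and offsets are counted in blocks rather than bytes. Steps 1 and 2 (alignment and i_size update) are not modelled.

```c
#include <stddef.h>

/* Toy extent covering file blocks [start, start + count). Not an XFS type. */
struct toy_extent {
	unsigned long start;
	unsigned long count;
};

/*
 * Step 3: if split lies strictly inside an extent, split that extent so a
 * new extent begins exactly at split. ext[] is sorted by start and has room
 * for cap entries. Returns the new extent count, or -1 if there is no room.
 */
static int toy_split_at(struct toy_extent *ext, int n, int cap,
			unsigned long split)
{
	for (int i = 0; i < n; i++) {
		unsigned long end = ext[i].start + ext[i].count;

		if (ext[i].start < split && split < end) {
			if (n == cap)
				return -1;
			/* open a slot right after ext[i] */
			for (int j = n; j > i + 1; j--)
				ext[j] = ext[j - 1];
			ext[i + 1].start = split;
			ext[i + 1].count = end - split;
			ext[i].count = split - ext[i].start;
			return n + 1;
		}
	}
	return n;	/* split is in a hole or already on an extent boundary */
}

/*
 * Step 4: walk from the last extent back towards offset, moving every extent
 * that starts at or after offset to the right by len blocks. This leaves a
 * hole of len blocks at offset.
 */
static void toy_shift_right(struct toy_extent *ext, int n,
			    unsigned long offset, unsigned long len)
{
	for (int i = n - 1; i >= 0 && ext[i].start >= offset; i--)
		ext[i].start += len;
}
```

For example, with extents [0,4) and [4,12), inserting 3 blocks at block 6 first splits the second extent into [4,6) and [6,12), then moves the latter to [9,15).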
>
> Signed-off-by: Namjae Jeon <namjae.jeon-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: Ashish Sangwan <a.sangwan-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Brian Foster <bfoster-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 358 ++++++++++++++++++++++++++++++++++++++++------
> fs/xfs/libxfs/xfs_bmap.h | 13 +-
> fs/xfs/xfs_bmap_util.c | 126 +++++++++++-----
> fs/xfs/xfs_bmap_util.h | 2 +
> fs/xfs/xfs_file.c | 38 ++++-
> fs/xfs/xfs_trace.h | 1 +
> 6 files changed, 455 insertions(+), 83 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 61ec015..6699e53 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5518,50 +5518,86 @@ xfs_bmse_shift_one(
> int *current_ext,
> struct xfs_bmbt_rec_host *gotp,
> struct xfs_btree_cur *cur,
> - int *logflags)
> + int *logflags,
> + enum SHIFT_DIRECTION SHIFT)
Please don't shout. ;)
Lower case for types and variables, upper case for the enum values.
I also think the "shift" variable should be named "direction",
too, so the code reads "if (direction == SHIFT_LEFT)" and so is
clearly self documenting...
(only commenting once on this, please change it in other places)
as well ;)
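As a rough illustration of the naming being asked for (a sketch only: the lower-case type name shift_direction and the helper shifted_startoff() are assumptions, while SHIFT_LEFT and SHIFT_RIGHT come from the patch):

```c
/* Lower-case type name, upper-case enumerators. */
enum shift_direction {
	SHIFT_LEFT = 0,
	SHIFT_RIGHT,
};

/*
 * Hypothetical helper: compute the new start offset of an extent that is
 * moved by shift_fsb blocks. The parameter is called "direction" so the
 * condition documents itself.
 */
static unsigned long long
shifted_startoff(unsigned long long startoff, unsigned long long shift_fsb,
		 enum shift_direction direction)
{
	if (direction == SHIFT_LEFT)
		return startoff - shift_fsb;
	return startoff + shift_fsb;
}
```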
> {
> struct xfs_ifork *ifp;
> xfs_fileoff_t startoff;
> - struct xfs_bmbt_rec_host *leftp;
> + struct xfs_bmbt_rec_host *contp;
> struct xfs_bmbt_irec got;
> - struct xfs_bmbt_irec left;
> + struct xfs_bmbt_irec cont;
Not sure what "cont" is short for. It's used as the "adjacent
extent" record, so that would be a better name IMO.
> int error;
> int i;
> + int total_extents;
>
> ifp = XFS_IFORK_PTR(ip, whichfork);
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
>
> xfs_bmbt_get_all(gotp, &got);
> - startoff = got.br_startoff - offset_shift_fsb;
>
> /* delalloc extents should be prevented by caller */
> XFS_WANT_CORRUPTED_RETURN(!isnullstartblock(got.br_startblock));
>
> - /*
> - * Check for merge if we've got an extent to the left, otherwise make
> - * sure there's enough room at the start of the file for the shift.
> - */
> - if (*current_ext) {
> - /* grab the left extent and check for a large enough hole */
> - leftp = xfs_iext_get_ext(ifp, *current_ext - 1);
> - xfs_bmbt_get_all(leftp, &left);
> + if (SHIFT == SHIFT_LEFT) {
> + startoff = got.br_startoff - offset_shift_fsb;
>
> - if (startoff < left.br_startoff + left.br_blockcount)
> + /*
> + * Check for merge if we've got an extent to the left,
> + * otherwise make sure there's enough room at the start
> + * of the file for the shift.
> + */
> + if (*current_ext) {
> + /*
> + * grab the left extent and check for a large
> + * enough hole.
> + */
> + contp = xfs_iext_get_ext(ifp, *current_ext - 1);
> + xfs_bmbt_get_all(contp, &cont);
> +
> + if (startoff < cont.br_startoff + cont.br_blockcount)
> + return -EINVAL;
> +
> + /* check whether to merge the extent or shift it down */
> + if (xfs_bmse_can_merge(&cont, &got, offset_shift_fsb)) {
> + return xfs_bmse_merge(ip, whichfork,
> + offset_shift_fsb,
> + *current_ext, gotp, contp,
> + cur, logflags);
> + }
> + } else if (got.br_startoff < offset_shift_fsb)
> return -EINVAL;
This would be better written:
if (!*current_ext) {
if (got.br_startoff < offset_shift_fsb)
return -EINVAL;
goto update_current_ext;
}
and then the rest of the code in the shift left branch can drop a
level of indent and hence become less congested and easier to read.
> + } else {
> + startoff = got.br_startoff + offset_shift_fsb;
> + /*
> + * If this is not the last extent in the file, make sure there's
> + * enough room between current extent and next extent for
> + * accommodating the shift.
> + */
> + if (*current_ext < (total_extents - 1)) {
> + contp = xfs_iext_get_ext(ifp, *current_ext + 1);
> + xfs_bmbt_get_all(contp, &cont);
> + if (startoff + got.br_blockcount > cont.br_startoff)
> + return -EINVAL;
>
> - /* check whether to merge the extent or shift it down */
> - if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) {
> - return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
> - *current_ext, gotp, leftp, cur,
> - logflags);
> + /*
> + * Unlike a left shift (which involves a hole punch),
> + * a right shift does not modify extent neighbors
> + * in any way. We should never find mergeable extents
> + * in this scenario. Check anyways and warn if we
> + * encounter two extents that could be one.
> + */
> + if (xfs_bmse_can_merge(&got, &cont, offset_shift_fsb))
> + WARN_ON_ONCE(1);
> }
Similarly:
/* nothing to move if this is the last extent */
if (*current_ext >= total_extents)
goto update_current_ext;
> - } else if (got.br_startoff < offset_shift_fsb)
> - return -EINVAL;
> -
> + }
> /*
> * Increment the extent index for the next iteration, update the start
> * offset of the in-core extent and update the btree if applicable.
> */
> - (*current_ext)++;
update_current_ext:
> + if (SHIFT == SHIFT_LEFT)
> + (*current_ext)++;
> + else
> + (*current_ext)--;
> xfs_bmbt_set_startoff(gotp, startoff);
> *logflags |= XFS_ILOG_CORE;
> if (!cur) {
> @@ -5581,10 +5617,10 @@ xfs_bmse_shift_one(
> }
>
> /*
> - * Shift extent records to the left to cover a hole.
> + * Shift extent records to the left/right to cover/create a hole.
> *
> * The maximum number of extents to be shifted in a single operation is
> - * @num_exts. @start_fsb specifies the file offset to start the shift and the
> + * @num_exts. @stop_fsb specifies the file offset at which to stop shift and the
> * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
> * is the length by which each extent is shifted. If there is no hole to shift
> * the extents into, this will be considered invalid operation and we abort
> @@ -5594,12 +5630,13 @@ int
> xfs_bmap_shift_extents(
> struct xfs_trans *tp,
> struct xfs_inode *ip,
> - xfs_fileoff_t start_fsb,
> + xfs_fileoff_t *next_fsb,
> xfs_fileoff_t offset_shift_fsb,
> int *done,
> - xfs_fileoff_t *next_fsb,
> + xfs_fileoff_t stop_fsb,
> xfs_fsblock_t *firstblock,
> struct xfs_bmap_free *flist,
> + enum SHIFT_DIRECTION SHIFT,
> int num_exts)
> {
> struct xfs_btree_cur *cur = NULL;
> @@ -5609,10 +5646,11 @@ xfs_bmap_shift_extents(
> struct xfs_ifork *ifp;
> xfs_extnum_t nexts = 0;
> xfs_extnum_t current_ext;
> + xfs_extnum_t total_extents;
> + xfs_extnum_t stop_extent;
> int error = 0;
> int whichfork = XFS_DATA_FORK;
> int logflags = 0;
> - int total_extents;
>
> if (unlikely(XFS_TEST_ERROR(
> (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> @@ -5628,6 +5666,7 @@ xfs_bmap_shift_extents(
>
> ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> + ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);
>
> ifp = XFS_IFORK_PTR(ip, whichfork);
> if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> @@ -5645,43 +5684,85 @@ xfs_bmap_shift_extents(
> }
>
> /*
> + * There may be delalloc extents in the data fork before the range we
> + * are collapsing out, so we cannot use the count of real extents here.
> + * Instead we have to calculate it from the incore fork.
> + */
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> + if (total_extents == 0) {
> + *done = 1;
> + goto del_cursor;
> + }
> +
> + /*
> + * In case of first right shift, we need to initialize next_fsb
> + */
> + if (*next_fsb == NULLFSBLOCK) {
> + ASSERT(SHIFT == SHIFT_RIGHT);
This should be at the top of the function. i.e.
ASSERT(*next_fsb != NULLFSBLOCK || direction == SHIFT_RIGHT)
> + gotp = xfs_iext_get_ext(ifp, total_extents - 1);
> + xfs_bmbt_get_all(gotp, &got);
> + *next_fsb = got.br_startoff;
> + if (stop_fsb > *next_fsb) {
> + *done = 1;
> + goto del_cursor;
> + }
> + }
> +
> + /* Lookup the extent index at which we have to stop */
> + if (SHIFT == SHIFT_RIGHT) {
> + gotp = xfs_iext_bno_to_ext(ifp, stop_fsb, &stop_extent);
> + /* Make stop_extent exclusive of shift range */
> + stop_extent--;
> + } else
> + stop_extent = total_extents;
> +
> + /*
> * Look up the extent index for the fsb where we start shifting. We can
> * henceforth iterate with current_ext as extent list changes are locked
> * out via ilock.
> *
> * gotp can be null in 2 cases: 1) if there are no extents or 2)
> - * start_fsb lies in a hole beyond which there are no extents. Either
> + * *next_fsb lies in a hole beyond which there are no extents. Either
> * way, we are done.
> */
> - gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
> + gotp = xfs_iext_bno_to_ext(ifp, *next_fsb, &current_ext);
> if (!gotp) {
> *done = 1;
> goto del_cursor;
> }
>
> - /*
> - * There may be delalloc extents in the data fork before the range we
> - * are collapsing out, so we cannot use the count of real extents here.
> - * Instead we have to calculate it from the incore fork.
> - */
> - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> - while (nexts++ < num_exts && current_ext < total_extents) {
> + /* some sanity checking before we finally start shifting extents */
> + if ((SHIFT == SHIFT_LEFT && current_ext >= stop_extent) ||
> + (SHIFT == SHIFT_RIGHT && current_ext <= stop_extent)) {
> + error = EIO;
error = -EIO;
> + goto del_cursor;
> + }
> +
> + while (nexts++ < num_exts) {
> error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
> - &current_ext, gotp, cur, &logflags);
> + &current_ext, gotp, cur, &logflags,
> + SHIFT);
> if (error)
> goto del_cursor;
> + /*
> + * In case there was an extent merge after shifting extent,
> + * extent numbers would change.
> + * Update total extent count and grab the next record.
> + */
/*
* If there was an extent merge during the shift, the extent
* count can change. Update the total and grab the next record.
*/
> + if (SHIFT == SHIFT_LEFT) {
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> + stop_extent = total_extents;
> + }
>
> - /* update total extent count and grab the next record */
> - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> - if (current_ext >= total_extents)
> + if (current_ext == stop_extent) {
> + *done = 1;
> + *next_fsb = NULLFSBLOCK;
> break;
> + }
> gotp = xfs_iext_get_ext(ifp, current_ext);
> }
>
> - /* Check if we are done */
> - if (current_ext == total_extents) {
> - *done = 1;
> - } else if (next_fsb) {
> + if (!*done) {
> xfs_bmbt_get_all(gotp, &got);
> *next_fsb = got.br_startoff;
> }
> @@ -5696,3 +5777,192 @@ del_cursor:
>
> return error;
> }
> +
> +/*
> + * Splits an extent into two extents at split_fsb block that it is
> + * the first block of the current_ext. @current_ext is a target extent
> + * to be split. @split_fsb is a block where the extents is split.
> + * If split_fsb lies in a hole or the first block of extents, just return 0.
> + */
> +STATIC int
> +xfs_bmap_split_extent_at(
> + struct xfs_trans *tp,
> + struct xfs_inode *ip,
> + xfs_fileoff_t split_fsb,
> + xfs_fsblock_t *firstfsb,
> + struct xfs_bmap_free *free_list)
> +{
> + int whichfork = XFS_DATA_FORK;
> + struct xfs_btree_cur *cur = NULL;
> + struct xfs_bmbt_rec_host *gotp;
> + struct xfs_bmbt_irec got;
> + struct xfs_bmbt_irec new; /* split extent */
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_ifork *ifp;
> + xfs_fsblock_t gotblkcnt; /* new block count for got */
> + xfs_extnum_t current_ext;
> + int error = 0;
> + int logflags = 0;
> + int i = 0;
> +
> + if (unlikely(XFS_TEST_ERROR(
> + (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> + XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> + mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
> + XFS_ERROR_REPORT("xfs_bmap_split_extent_at",
> + XFS_ERRLEVEL_LOW, mp);
> + return -EFSCORRUPTED;
> + }
> +
> + if (XFS_FORCED_SHUTDOWN(mp))
> + return -EIO;
> +
> + ifp = XFS_IFORK_PTR(ip, whichfork);
> + if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> + /* Read in all the extents */
> + error = xfs_iread_extents(tp, ip, whichfork);
> + if (error)
> + return error;
> + }
> +
> + gotp = xfs_iext_bno_to_ext(ifp, split_fsb, &current_ext);
> + /*
> + * gotp can be null in 2 cases: 1) if there are no extents
> + * or 2) split_fsb lies in a hole beyond which there are
> + * no extents. Either way, we are done.
> + */
> + if (!gotp)
> + return 0;
Comment can go before the call to xfs_iext_bno_to_ext().
> +
> + xfs_bmbt_get_all(gotp, &got);
> +
> + /*
> + * Check split_fsb lies in a hole or the start boundary offset
> + * of the extent.
> + */
> + if (got.br_startoff >= split_fsb)
> + return 0;
> +
> + gotblkcnt = split_fsb - got.br_startoff;
> + new.br_startoff = split_fsb;
> + new.br_startblock = got.br_startblock + gotblkcnt;
> + new.br_blockcount = got.br_blockcount - gotblkcnt;
> + new.br_state = got.br_state;
> +
> + if (ifp->if_flags & XFS_IFBROOT) {
> + cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
> + cur->bc_private.b.firstblock = *firstfsb;
> + cur->bc_private.b.flist = free_list;
> + cur->bc_private.b.flags = 0;
> + }
> +
> + if (cur) {
No need to close the XFS_IFBROOT branch and then check for cur;
we just allocated it inside the XFS_IFBROOT branch!
> + error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> + got.br_startblock,
> + got.br_blockcount,
> + &i);
> + if (error)
> + goto del_cursor;
> + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
> + }
....
> @@ -1427,20 +1429,23 @@ xfs_collapse_file_space(
>
> /*
> * Writeback and invalidate cache for the remainder of the file as we're
> - * about to shift down every extent from the collapse range to EOF. The
> - * free of the collapse range above might have already done some of
> - * this, but we shouldn't rely on it to do anything outside of the range
> - * that was freed.
> + * about to shift down every extent from offset to EOF.
> */
> error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
> - offset + len, -1);
> + offset, -1);
> if (error)
> return error;
> error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> - (offset + len) >> PAGE_CACHE_SHIFT, -1);
> + offset >> PAGE_CACHE_SHIFT, -1);
> if (error)
> return error;
>
> + if (SHIFT == SHIFT_RIGHT) {
> + error = xfs_bmap_split_extent(ip, stop_fsb);
> + if (error)
> + return error;
> + }
This needs a comment explaining why we are splitting an extent here.
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1cdba95..222a91a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -823,11 +823,13 @@ xfs_file_fallocate(
> long error;
> enum xfs_prealloc_flags flags = 0;
> loff_t new_size = 0;
> + int do_file_insert = 0;
bool rather than int.
>
> if (!S_ISREG(inode->i_mode))
> return -EINVAL;
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> + FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;
This should use a local define before the function such as:
#define XFS_FALLOC_FL_SUPPORTED \
(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
FALLOC_FL_INSERT_RANGE)
This is similar to how we define supported checks for FIEMAP
operations in xfs_vn_fiemap().
>
> xfs_ilock(ip, XFS_IOLOCK_EXCL);
> @@ -857,6 +859,28 @@ xfs_file_fallocate(
> error = xfs_collapse_file_space(ip, offset, len);
> if (error)
> goto out_unlock;
> + } else if (mode & FALLOC_FL_INSERT_RANGE) {
> + unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
> +
> + if (offset & blksize_mask || len & blksize_mask) {
> + error = -EINVAL;
> + goto out_unlock;
> + }
> +
> + /* Check for wrap through zero */
> + if (inode->i_size + len > inode->i_sb->s_maxbytes) {
> + error = -EFBIG;
> + goto out_unlock;
> + }
At first I thought that was a duplicate check of what is in
vfs_fallocate() (i.e. off + len > s_maxbytes). Can you change the
comment to read something like:
/* check the new inode size does not wrap through zero */
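The three checks in this hunk can be sketched as a standalone function. This is an illustration under assumptions, not the XFS code: insert_range_check() and its parameters are hypothetical, all inputs are assumed non-negative with i_size <= maxbytes, and the size check is written as a subtraction so the comparison itself cannot overflow (the patch compares i_size + len against s_maxbytes instead).

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the insert-range argument checks.
 * Assumes i_size, offset, len and maxbytes are non-negative and
 * i_size <= maxbytes.
 */
static int insert_range_check(int64_t i_size, int64_t offset, int64_t len,
			      unsigned int blkbits, int64_t maxbytes)
{
	int64_t blksize_mask = ((int64_t)1 << blkbits) - 1;

	/* offset and len must both be block size aligned */
	if ((offset | len) & blksize_mask)
		return -EINVAL;

	/*
	 * Check the new inode size does not exceed the limit. Comparing
	 * against maxbytes - i_size avoids computing i_size + len, which
	 * could wrap.
	 */
	if (len > maxbytes - i_size)
		return -EFBIG;

	/* offset should be less than i_size */
	if (offset >= i_size)
		return -EINVAL;

	return 0;
}
```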
> +
> + /* Offset should be less than i_size */
> + if (offset >= i_size_read(inode)) {
> + error = -EINVAL;
> + goto out_unlock;
> + }
> +
> + new_size = i_size_read(inode) + len;
> + do_file_insert = 1;
Why do you use inode->i_size on the wrap check, yet i_size_read()
twice here?
> } else {
> flags |= XFS_PREALLOC_SET;
>
> @@ -891,8 +915,20 @@ xfs_file_fallocate(
> iattr.ia_valid = ATTR_SIZE;
> iattr.ia_size = new_size;
> error = xfs_setattr_size(ip, &iattr);
> + if (error)
> + goto out_unlock;
> }
>
> + /*
> + * Some operations are performed after the inode size is updated. For
> + * example, insert range expands the address space of the file, shifts
> + * all subsequent extents to create a hole inside the file. Updating
> + * the size first ensures that shifted extents aren't left hanging
> + * past EOF in the event of a crash or failure.
> + */
/*
* Perform hole insertion now that the file size has been
* updated so that if we crash during the operation we don't
* leave shifted extents past EOF and hence lose access to
* the data that is contained within them.
*/
> + if (do_file_insert)
> + error = xfs_insert_file_space(ip, offset, len);
> +
> out_unlock:
> xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> return error;
Cheers,
Dave.
--
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
^ permalink raw reply
* Re: [PATCH RESEND 4/12] xfsprog: xfsio: update xfs_io manpage for FALLOC_FL_INSERT_RANGE
From: Dave Chinner @ 2015-02-17 0:56 UTC (permalink / raw)
To: Namjae Jeon
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api, Namjae Jeon
In-Reply-To: <1424101680-3301-5-git-send-email-linkinjeon@gmail.com>
On Tue, Feb 17, 2015 at 12:47:51AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
>
> Update xfs_io manpage for FALLOC_FL_INSERT_RANGE.
>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Looks good. That'll fix up the complaining fstest ;)
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-Dave
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply
* Re: [PATCH RESEND 11/12] xfstests: fsx: Add fallocate insert range operation
From: Dave Chinner @ 2015-02-17 1:00 UTC (permalink / raw)
To: Namjae Jeon
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api, Namjae Jeon
In-Reply-To: <1424101680-3301-12-git-send-email-linkinjeon@gmail.com>
On Tue, Feb 17, 2015 at 12:47:58AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
>
> This commit adds fallocate FALLOC_FL_INSERT_RANGE support for fsx.
>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
> ltp/fsx.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 114 insertions(+), 10 deletions(-)
.....
> @@ -339,6 +341,14 @@ logdump(void)
> lp->args[0] + lp->args[1])
> prt("\t******CCCC");
> break;
> + case OP_INSERT_RANGE:
> + prt("INSERT 0x%x thru 0x%x\t(0x%x bytes)",
> + lp->args[0], lp->args[0] + lp->args[1] - 1,
> + lp->args[1]);
> + if (badoff >= lp->args[0] && badoff <
> + lp->args[0] + lp->args[1])
> + prt("\t******CCCC");
Probably should output "*****IIII" so we can distinguish it from
collapse operations easily.
> @@ -1307,6 +1403,9 @@ usage(void)
> #ifdef FALLOC_FL_COLLAPSE_RANGE
> " -C: Do not use collapse range calls\n"
> #endif
> +#ifdef FALLOC_FL_INSERT_RANGE
> +" -i: Do not use insert range calls\n"
> +#endif
I'd make that "-I" rather than "-i" so it matches with the "-C" of
collapse range.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply
* Re: [PATCH RESEND 3/12] ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate
From: Dave Chinner @ 2015-02-17 1:02 UTC (permalink / raw)
To: Namjae Jeon
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api, Namjae Jeon
In-Reply-To: <1424101680-3301-4-git-send-email-linkinjeon@gmail.com>
On Tue, Feb 17, 2015 at 12:47:50AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
>
> This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.
>
> 1) Make sure that both offset and len are block size aligned.
> 2) Update the i_size of inode by len bytes.
> 3) Compute the file's logical block number against offset. If the computed
> block number is not the starting block of the extent, split the extent
> such that the block number is the starting block of the extent.
> 4) Shift all the extents which are lying between [offset, last allocated extent]
> towards right by len bytes. This step will make a hole of len bytes
> at offset.
>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
I'll leave this for the ext4 folk to review. If I don't get a review
by the time we're ready to merge the VFS and XFS code, then I'll
leave it out and let Ted merge it in his own time.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply
* RE: [PATCH RESEND 11/12] xfstests: fsx: Add fallocate insert range operation
From: Namjae Jeon @ 2015-02-17 1:43 UTC (permalink / raw)
To: 'Dave Chinner', 'Namjae Jeon'
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api
In-Reply-To: <20150217010033.GG4251@dastard>
>
> On Tue, Feb 17, 2015 at 12:47:58AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <namjae.jeon@samsung.com>
> >
> > This commit adds fallocate FALLOC_FL_INSERT_RANGE support for fsx.
> >
> > Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> > Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > ---
> > ltp/fsx.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 114 insertions(+), 10 deletions(-)
> .....
> > @@ -339,6 +341,14 @@ logdump(void)
> > lp->args[0] + lp->args[1])
> > prt("\t******CCCC");
> > break;
> > + case OP_INSERT_RANGE:
> > + prt("INSERT 0x%x thru 0x%x\t(0x%x bytes)",
> > + lp->args[0], lp->args[0] + lp->args[1] - 1,
> > + lp->args[1]);
> > + if (badoff >= lp->args[0] && badoff <
> > + lp->args[0] + lp->args[1])
> > + prt("\t******CCCC");
>
Hi Dave,
> Probably should output "*****IIII" so we can distinguish it from
> collapse operations easily.
Right. I will change it.
>
> > @@ -1307,6 +1403,9 @@ usage(void)
> > #ifdef FALLOC_FL_COLLAPSE_RANGE
> > " -C: Do not use collapse range calls\n"
> > #endif
> > +#ifdef FALLOC_FL_INSERT_RANGE
> > +" -i: Do not use insert range calls\n"
> > +#endif
>
> I'd make that "-I" rather than "-i" so it matches with the "-C" of
> collapse range.
Okay.
Thanks for your review!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply
* RE: [PATCH RESEND 2/12] xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
From: Namjae Jeon @ 2015-02-17 1:47 UTC (permalink / raw)
To: 'Dave Chinner'
Cc: tytso-3s7WtUTddSA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ,
a.sangwan-Sze3O3UU22JBDgjK7y7TUQ, bfoster-H+wXaHxf7aLQT0dZR+AlfA,
mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, 'Namjae Jeon'
In-Reply-To: <20150217005408.GE4251@dastard>
Hi Dave,
I have gone through all of your review points.
I will share the updated patch soon.
Thanks for your review!
> On Tue, Feb 17, 2015 at 12:47:49AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <namjae.jeon-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> >
> > This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.
> >
> > 1) Make sure that both offset and len are block size aligned.
> > 2) Update the i_size of inode by len bytes.
> > 3) Compute the file's logical block number against offset. If the computed
> > block number is not the starting block of the extent, split the extent
> > such that the block number is the starting block of the extent.
> > 4) Shift all the extents which are lying between [offset, last allocated extent]
> > towards right by len bytes. This step will make a hole of len bytes
> > at offset.
> >
> > Signed-off-by: Namjae Jeon <namjae.jeon-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> > Signed-off-by: Ashish Sangwan <a.sangwan-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> > Reviewed-by: Brian Foster <bfoster-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> > fs/xfs/libxfs/xfs_bmap.c | 358 ++++++++++++++++++++++++++++++++++++++++------
> > fs/xfs/libxfs/xfs_bmap.h | 13 +-
> > fs/xfs/xfs_bmap_util.c | 126 +++++++++++-----
> > fs/xfs/xfs_bmap_util.h | 2 +
> > fs/xfs/xfs_file.c | 38 ++++-
> > fs/xfs/xfs_trace.h | 1 +
> > 6 files changed, 455 insertions(+), 83 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 61ec015..6699e53 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -5518,50 +5518,86 @@ xfs_bmse_shift_one(
> > int *current_ext,
> > struct xfs_bmbt_rec_host *gotp,
> > struct xfs_btree_cur *cur,
> > - int *logflags)
> > + int *logflags,
> > + enum SHIFT_DIRECTION SHIFT)
>
> Please don't shout. ;)
>
> Lower case for types and variables, upper case for the enum values.
> I also think the "shift" variable should be named "direction",
> too, so the code reads "if (direction == SHIFT_LEFT)" and so is
> clearly self documenting...
>
> (only commenting once on this, please change it in other places)
> as well ;)
>
> > {
> > struct xfs_ifork *ifp;
> > xfs_fileoff_t startoff;
> > - struct xfs_bmbt_rec_host *leftp;
> > + struct xfs_bmbt_rec_host *contp;
> > struct xfs_bmbt_irec got;
> > - struct xfs_bmbt_irec left;
> > + struct xfs_bmbt_irec cont;
>
> Not sure what "cont" is short for. It's used as the "adjacent
> extent" record, so that would be a better name IMO.
>
> > int error;
> > int i;
> > + int total_extents;
> >
> > ifp = XFS_IFORK_PTR(ip, whichfork);
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> >
> > xfs_bmbt_get_all(gotp, &got);
> > - startoff = got.br_startoff - offset_shift_fsb;
> >
> > /* delalloc extents should be prevented by caller */
> > XFS_WANT_CORRUPTED_RETURN(!isnullstartblock(got.br_startblock));
> >
> > - /*
> > - * Check for merge if we've got an extent to the left, otherwise make
> > - * sure there's enough room at the start of the file for the shift.
> > - */
> > - if (*current_ext) {
> > - /* grab the left extent and check for a large enough hole */
> > - leftp = xfs_iext_get_ext(ifp, *current_ext - 1);
> > - xfs_bmbt_get_all(leftp, &left);
> > + if (SHIFT == SHIFT_LEFT) {
> > + startoff = got.br_startoff - offset_shift_fsb;
> >
> > - if (startoff < left.br_startoff + left.br_blockcount)
> > + /*
> > + * Check for merge if we've got an extent to the left,
> > + * otherwise make sure there's enough room at the start
> > + * of the file for the shift.
> > + */
> > + if (*current_ext) {
> > + /*
> > + * grab the left extent and check for a large
> > + * enough hole.
> > + */
> > + contp = xfs_iext_get_ext(ifp, *current_ext - 1);
> > + xfs_bmbt_get_all(contp, &cont);
> > +
> > + if (startoff < cont.br_startoff + cont.br_blockcount)
> > + return -EINVAL;
> > +
> > + /* check whether to merge the extent or shift it down */
> > + if (xfs_bmse_can_merge(&cont, &got, offset_shift_fsb)) {
> > + return xfs_bmse_merge(ip, whichfork,
> > + offset_shift_fsb,
> > + *current_ext, gotp, contp,
> > + cur, logflags);
> > + }
> > + } else if (got.br_startoff < offset_shift_fsb)
> > return -EINVAL;
>
> This would be better written:
>
> if (!*current_ext) {
> if (got.br_startoff < offset_shift_fsb)
> return -EINVAL;
> goto update_current_ext;
> }
>
> and then the rest of the code in the shift left branch can drop a
> level of indent and hence become less congested and easier to read.
>
>
> > + } else {
> > + startoff = got.br_startoff + offset_shift_fsb;
> > + /*
> > + * If this is not the last extent in the file, make sure there's
> > + * enough room between current extent and next extent for
> > + * accommodating the shift.
> > + */
> > + if (*current_ext < (total_extents - 1)) {
> > + contp = xfs_iext_get_ext(ifp, *current_ext + 1);
> > + xfs_bmbt_get_all(contp, &cont);
> > + if (startoff + got.br_blockcount > cont.br_startoff)
> > + return -EINVAL;
> >
> > - /* check whether to merge the extent or shift it down */
> > - if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) {
> > - return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
> > - *current_ext, gotp, leftp, cur,
> > - logflags);
> > + /*
> > + * Unlike a left shift (which involves a hole punch),
> > + * a right shift does not modify extent neighbors
> > + * in any way. We should never find mergeable extents
> > + * in this scenario. Check anyways and warn if we
> > + * encounter two extents that could be one.
> > + */
> > + if (xfs_bmse_can_merge(&got, &cont, offset_shift_fsb))
> > + WARN_ON_ONCE(1);
> > }
>
> Similarly:
> /* nothing to move if this is the last extent */
> if (*current_ext >= total_extents)
> goto update_current_ext;
>
> > - } else if (got.br_startoff < offset_shift_fsb)
> > - return -EINVAL;
> > -
> > + }
> > /*
> > * Increment the extent index for the next iteration, update the start
> > * offset of the in-core extent and update the btree if applicable.
> > */
> > - (*current_ext)++;
>
> update_current_ext:
> > + if (SHIFT == SHIFT_LEFT)
> > + (*current_ext)++;
> > + else
> > + (*current_ext)--;
> > xfs_bmbt_set_startoff(gotp, startoff);
> > *logflags |= XFS_ILOG_CORE;
> > if (!cur) {
> > @@ -5581,10 +5617,10 @@ xfs_bmse_shift_one(
> > }
> >
> > /*
> > - * Shift extent records to the left to cover a hole.
> > + * Shift extent records to the left/right to cover/create a hole.
> > *
> > * The maximum number of extents to be shifted in a single operation is
> > - * @num_exts. @start_fsb specifies the file offset to start the shift and the
> > + * @num_exts. @stop_fsb specifies the file offset at which to stop shift and the
> > * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
> > * is the length by which each extent is shifted. If there is no hole to shift
> > * the extents into, this will be considered invalid operation and we abort
> > @@ -5594,12 +5630,13 @@ int
> > xfs_bmap_shift_extents(
> > struct xfs_trans *tp,
> > struct xfs_inode *ip,
> > - xfs_fileoff_t start_fsb,
> > + xfs_fileoff_t *next_fsb,
> > xfs_fileoff_t offset_shift_fsb,
> > int *done,
> > - xfs_fileoff_t *next_fsb,
> > + xfs_fileoff_t stop_fsb,
> > xfs_fsblock_t *firstblock,
> > struct xfs_bmap_free *flist,
> > + enum SHIFT_DIRECTION SHIFT,
> > int num_exts)
> > {
> > struct xfs_btree_cur *cur = NULL;
> > @@ -5609,10 +5646,11 @@ xfs_bmap_shift_extents(
> > struct xfs_ifork *ifp;
> > xfs_extnum_t nexts = 0;
> > xfs_extnum_t current_ext;
> > + xfs_extnum_t total_extents;
> > + xfs_extnum_t stop_extent;
> > int error = 0;
> > int whichfork = XFS_DATA_FORK;
> > int logflags = 0;
> > - int total_extents;
> >
> > if (unlikely(XFS_TEST_ERROR(
> > (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> > @@ -5628,6 +5666,7 @@ xfs_bmap_shift_extents(
> >
> > ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > + ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);
> >
> > ifp = XFS_IFORK_PTR(ip, whichfork);
> > if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > @@ -5645,43 +5684,85 @@ xfs_bmap_shift_extents(
> > }
> >
> > /*
> > + * There may be delalloc extents in the data fork before the range we
> > + * are collapsing out, so we cannot use the count of real extents here.
> > + * Instead we have to calculate it from the incore fork.
> > + */
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > + if (total_extents == 0) {
> > + *done = 1;
> > + goto del_cursor;
> > + }
> > +
> > + /*
> > + * In case of first right shift, we need to initialize next_fsb
> > + */
> > + if (*next_fsb == NULLFSBLOCK) {
> > + ASSERT(SHIFT == SHIFT_RIGHT);
>
> This should be at the top of the function. i.e.
>
> ASSERT(*next_fsb != NULLFSBLOCK || direction == SHIFT_RIGHT)
>
> > + gotp = xfs_iext_get_ext(ifp, total_extents - 1);
> > + xfs_bmbt_get_all(gotp, &got);
> > + *next_fsb = got.br_startoff;
> > + if (stop_fsb > *next_fsb) {
> > + *done = 1;
> > + goto del_cursor;
> > + }
> > + }
> > +
> > + /* Lookup the extent index at which we have to stop */
> > + if (SHIFT == SHIFT_RIGHT) {
> > + gotp = xfs_iext_bno_to_ext(ifp, stop_fsb, &stop_extent);
> > + /* Make stop_extent exclusive of shift range */
> > + stop_extent--;
> > + } else
> > + stop_extent = total_extents;
> > +
> > + /*
> > * Look up the extent index for the fsb where we start shifting. We can
> > * henceforth iterate with current_ext as extent list changes are locked
> > * out via ilock.
> > *
> > * gotp can be null in 2 cases: 1) if there are no extents or 2)
> > - * start_fsb lies in a hole beyond which there are no extents. Either
> > + * *next_fsb lies in a hole beyond which there are no extents. Either
> > * way, we are done.
> > */
> > - gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
> > + gotp = xfs_iext_bno_to_ext(ifp, *next_fsb, &current_ext);
> > if (!gotp) {
> > *done = 1;
> > goto del_cursor;
> > }
> >
> > - /*
> > - * There may be delalloc extents in the data fork before the range we
> > - * are collapsing out, so we cannot use the count of real extents here.
> > - * Instead we have to calculate it from the incore fork.
> > - */
> > - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > - while (nexts++ < num_exts && current_ext < total_extents) {
> > + /* some sanity checking before we finally start shifting extents */
> > + if ((SHIFT == SHIFT_LEFT && current_ext >= stop_extent) ||
> > + (SHIFT == SHIFT_RIGHT && current_ext <= stop_extent)) {
> > + error = EIO;
>
> error = -EIO;
>
> > + goto del_cursor;
> > + }
> > +
> > + while (nexts++ < num_exts) {
> > error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
> > - &current_ext, gotp, cur, &logflags);
> > + &current_ext, gotp, cur, &logflags,
> > + SHIFT);
> > if (error)
> > goto del_cursor;
> > + /*
> > + * In case there was an extent merge after shifting extent,
> > + * extent numbers would change.
> > + * Update total extent count and grab the next record.
> > + */
>
> /*
> * If there was an extent merge during the shift, the extent
> * count can change. Update the total and grab the next record.
> */
>
> > + if (SHIFT == SHIFT_LEFT) {
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > + stop_extent = total_extents;
> > + }
> >
> > - /* update total extent count and grab the next record */
> > - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > - if (current_ext >= total_extents)
> > + if (current_ext == stop_extent) {
> > + *done = 1;
> > + *next_fsb = NULLFSBLOCK;
> > break;
> > + }
> > gotp = xfs_iext_get_ext(ifp, current_ext);
> > }
> >
> > - /* Check if we are done */
> > - if (current_ext == total_extents) {
> > - *done = 1;
> > - } else if (next_fsb) {
> > + if (!*done) {
> > xfs_bmbt_get_all(gotp, &got);
> > *next_fsb = got.br_startoff;
> > }
> > @@ -5696,3 +5777,192 @@ del_cursor:
> >
> > return error;
> > }
> > +
> > +/*
> > + * Splits an extent into two extents at split_fsb block that it is
> > + * the first block of the current_ext. @current_ext is a target extent
> > + * to be split. @split_fsb is a block where the extents is split.
> > + * If split_fsb lies in a hole or the first block of extents, just return 0.
> > + */
> > +STATIC int
> > +xfs_bmap_split_extent_at(
> > + struct xfs_trans *tp,
> > + struct xfs_inode *ip,
> > + xfs_fileoff_t split_fsb,
> > + xfs_fsblock_t *firstfsb,
> > + struct xfs_bmap_free *free_list)
> > +{
> > + int whichfork = XFS_DATA_FORK;
> > + struct xfs_btree_cur *cur = NULL;
> > + struct xfs_bmbt_rec_host *gotp;
> > + struct xfs_bmbt_irec got;
> > + struct xfs_bmbt_irec new; /* split extent */
> > + struct xfs_mount *mp = ip->i_mount;
> > + struct xfs_ifork *ifp;
> > + xfs_fsblock_t gotblkcnt; /* new block count for got */
> > + xfs_extnum_t current_ext;
> > + int error = 0;
> > + int logflags = 0;
> > + int i = 0;
> > +
> > + if (unlikely(XFS_TEST_ERROR(
> > + (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> > + XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> > + mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
> > + XFS_ERROR_REPORT("xfs_bmap_split_extent_at",
> > + XFS_ERRLEVEL_LOW, mp);
> > + return -EFSCORRUPTED;
> > + }
> > +
> > + if (XFS_FORCED_SHUTDOWN(mp))
> > + return -EIO;
> > +
> > + ifp = XFS_IFORK_PTR(ip, whichfork);
> > + if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > + /* Read in all the extents */
> > + error = xfs_iread_extents(tp, ip, whichfork);
> > + if (error)
> > + return error;
> > + }
> > +
> > + gotp = xfs_iext_bno_to_ext(ifp, split_fsb, &current_ext);
> > + /*
> > + * gotp can be null in 2 cases: 1) if there are no extents
> > + * or 2) split_fsb lies in a hole beyond which there are
> > + * no extents. Either way, we are done.
> > + */
> > + if (!gotp)
> > + return 0;
>
> Comment can go before the call to xfs_iext_bno_to_ext().
>
> > +
> > + xfs_bmbt_get_all(gotp, &got);
> > +
> > + /*
> > + * Check split_fsb lies in a hole or the start boundary offset
> > + * of the extent.
> > + */
> > + if (got.br_startoff >= split_fsb)
> > + return 0;
> > +
> > + gotblkcnt = split_fsb - got.br_startoff;
> > + new.br_startoff = split_fsb;
> > + new.br_startblock = got.br_startblock + gotblkcnt;
> > + new.br_blockcount = got.br_blockcount - gotblkcnt;
> > + new.br_state = got.br_state;
> > +
> > + if (ifp->if_flags & XFS_IFBROOT) {
> > + cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
> > + cur->bc_private.b.firstblock = *firstfsb;
> > + cur->bc_private.b.flist = free_list;
> > + cur->bc_private.b.flags = 0;
> > + }
> > +
> > + if (cur) {
>
> No need to close the XFS_IFBROOT branch and then check for cur;
> we just allocated it inside the XFS_IFBROOT branch!
>
> > + error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> > + got.br_startblock,
> > + got.br_blockcount,
> > + &i);
> > + if (error)
> > + goto del_cursor;
> > + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
> > + }
>
> ....
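For illustration only (not part of the posted patch): the split arithmetic in the hunk above can be sketched in userspace. `struct extent` and `split_extent_at()` are hypothetical stand-ins for `struct xfs_bmbt_irec` and `xfs_bmap_split_extent_at()`; the sketch mirrors only the offset and length calculation and assumes no btree update, logging, or locking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_bmbt_irec; not the kernel type. */
struct extent {
	uint64_t startoff;	/* first file block covered by the extent */
	uint64_t startblock;	/* first disk block backing it */
	uint64_t blockcount;	/* length in blocks */
};

/*
 * Split *got at split_fsb. Returns 1 and fills *new_ext with the right-hand
 * piece when a split happened, or 0 when split_fsb is at or before the start
 * of the extent (hole or extent boundary, so there is nothing to do).
 */
static int split_extent_at(struct extent *got, uint64_t split_fsb,
			   struct extent *new_ext)
{
	uint64_t gotblkcnt;

	if (got->startoff >= split_fsb)
		return 0;

	/* Assumption: the caller looked up the extent containing split_fsb. */
	assert(split_fsb < got->startoff + got->blockcount);

	gotblkcnt = split_fsb - got->startoff;	/* new length of the left piece */
	new_ext->startoff = split_fsb;
	new_ext->startblock = got->startblock + gotblkcnt;
	new_ext->blockcount = got->blockcount - gotblkcnt;
	got->blockcount = gotblkcnt;
	return 1;
}
```

Splitting an extent of 8 blocks starting at file block 10 at `split_fsb` 13 leaves a 3-block left piece and a 5-block right piece whose disk start is advanced by 3.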
>
> > @@ -1427,20 +1429,23 @@ xfs_collapse_file_space(
> >
> > /*
> > * Writeback and invalidate cache for the remainder of the file as we're
> > - * about to shift down every extent from the collapse range to EOF. The
> > - * free of the collapse range above might have already done some of
> > - * this, but we shouldn't rely on it to do anything outside of the range
> > - * that was freed.
> > + * about to shift down every extent from offset to EOF.
> > */
> > error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
> > - offset + len, -1);
> > + offset, -1);
> > if (error)
> > return error;
> > error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> > - (offset + len) >> PAGE_CACHE_SHIFT, -1);
> > + offset >> PAGE_CACHE_SHIFT, -1);
> > if (error)
> > return error;
> >
> > + if (SHIFT == SHIFT_RIGHT) {
> > + error = xfs_bmap_split_extent(ip, stop_fsb);
> > + if (error)
> > + return error;
> > + }
>
> This needs a comment explaining why we are splitting an extent here.
>
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 1cdba95..222a91a 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -823,11 +823,13 @@ xfs_file_fallocate(
> > long error;
> > enum xfs_prealloc_flags flags = 0;
> > loff_t new_size = 0;
> > + int do_file_insert = 0;
>
> bool rather than int.
>
> >
> > if (!S_ISREG(inode->i_mode))
> > return -EINVAL;
> > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> > + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> > + FALLOC_FL_INSERT_RANGE))
> > return -EOPNOTSUPP;
>
> This should use a local define before the function such as:
>
> #define XFS_FALLOC_FL_SUPPORTED \
> (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
> FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
> FALLOC_FL_INSERT_RANGE)
>
> This is similar to how we define supported checks for FIEMAP
> operations in xfs_vn_fiemap().
>
> >
> > xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > @@ -857,6 +859,28 @@ xfs_file_fallocate(
> > error = xfs_collapse_file_space(ip, offset, len);
> > if (error)
> > goto out_unlock;
> > + } else if (mode & FALLOC_FL_INSERT_RANGE) {
> > + unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
> > +
> > + if (offset & blksize_mask || len & blksize_mask) {
> > + error = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + /* Check for wrap through zero */
> > + if (inode->i_size + len > inode->i_sb->s_maxbytes) {
> > + error = -EFBIG;
> > + goto out_unlock;
> > + }
>
> At first I thought that was a duplicate check of what is in
> vfs_fallocate() (i.e. off + len > s_maxbytes). Can you change the
> comment to read something like:
>
> /* check the new inode size does not wrap through zero */
>
> > +
> > + /* Offset should be less than i_size */
> > + if (offset >= i_size_read(inode)) {
> > + error = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + new_size = i_size_read(inode) + len;
> > + do_file_insert = 1;
>
> Why do you use inode->i_size on the wrap check, yet i_size_read()
> twice here?
>
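For illustration only (not part of the posted patch): the three insert-range checks quoted above can be sketched as one userspace helper. `insert_range_check()` and its parameters are hypothetical names; the size is read once and passed in, and the size check is written as `len > maxbytes - isize` so the sketch itself cannot overflow, which differs from the expression in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the checks in the quoted hunk.
 * Returns 0 when the request is acceptable, otherwise a negative errno.
 */
static int insert_range_check(int64_t offset, int64_t len, int64_t isize,
			      unsigned int blkbits, int64_t maxbytes)
{
	int64_t blksize_mask;

	/* Assumptions of this sketch, not checks made by the patch. */
	assert(offset >= 0 && len > 0);
	assert(isize >= 0 && isize <= maxbytes);
	assert(blkbits < 32);

	blksize_mask = ((int64_t)1 << blkbits) - 1;

	/* offset and len must both be block size aligned */
	if ((offset & blksize_mask) || (len & blksize_mask))
		return -EINVAL;

	/* check the new inode size does not exceed the maximum file size */
	if (len > maxbytes - isize)
		return -EFBIG;

	/* offset should be less than the current file size */
	if (offset >= isize)
		return -EINVAL;

	return 0;
}
```

With 4096-byte blocks (`blkbits` of 12), inserting 8192 bytes at offset 4096 into a 16384-byte file passes, while an unaligned offset or an offset at EOF is rejected with -EINVAL.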
> > } else {
> > flags |= XFS_PREALLOC_SET;
> >
> > @@ -891,8 +915,20 @@ xfs_file_fallocate(
> > iattr.ia_valid = ATTR_SIZE;
> > iattr.ia_size = new_size;
> > error = xfs_setattr_size(ip, &iattr);
> > + if (error)
> > + goto out_unlock;
> > }
> >
> > + /*
> > + * Some operations are performed after the inode size is updated. For
> > + * example, insert range expands the address space of the file, shifts
> > + * all subsequent extents to create a hole inside the file. Updating
> > + * the size first ensures that shifted extents aren't left hanging
> > + * past EOF in the event of a crash or failure.
> > + */
>
> /*
> * Perform hole insertion now that the file size has been
> * updated so that if we crash during the operation we don't
> * leave shifted extents past EOF and hence losing access to
> * the data that is contained within them.
> */
> > + if (do_file_insert)
> > + error = xfs_insert_file_space(ip, offset, len);
> > +
> > out_unlock:
> > xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > return error;
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
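For illustration only (not part of the thread's patch): a userspace sketch of the left/right room check discussed in the review of xfs_bmse_shift_one(), using the lower-case type and the "direction" variable name suggested above. `struct extent` and `shift_one()` are hypothetical; extent merging, the btree update, and logging are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum shift_direction { SHIFT_LEFT, SHIFT_RIGHT };

/* Hypothetical, simplified extent record: file offset and length in blocks. */
struct extent {
	uint64_t startoff;
	uint64_t blockcount;
};

/*
 * Shift exts[idx] by 'shift' blocks in the given direction, provided the hole
 * next to it is large enough. Returns 0 on success and -EINVAL when the
 * shifted extent would overlap its neighbour or fall before offset zero.
 */
static int shift_one(struct extent *exts, size_t nexts, size_t idx,
		     uint64_t shift, enum shift_direction direction)
{
	struct extent *got;
	uint64_t startoff;

	assert(idx < nexts);
	got = &exts[idx];

	if (direction == SHIFT_LEFT) {
		/* there must be room at the start of the file */
		if (got->startoff < shift)
			return -EINVAL;
		startoff = got->startoff - shift;
		/* and the adjacent extent on the left must not be overlapped */
		if (idx > 0 &&
		    startoff < exts[idx - 1].startoff + exts[idx - 1].blockcount)
			return -EINVAL;
	} else {
		startoff = got->startoff + shift;
		/* nothing to check against if this is the last extent */
		if (idx + 1 < nexts &&
		    startoff + got->blockcount > exts[idx + 1].startoff)
			return -EINVAL;
	}

	got->startoff = startoff;
	return 0;
}
```

A right shift would walk the extent list from the last extent towards the insertion point, so each extent moves into space its right-hand neighbour has already vacated.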
^ permalink raw reply
* RE: [PATCH RESEND 1/12] fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
From: Namjae Jeon @ 2015-02-17 1:49 UTC (permalink / raw)
To: 'Dave Chinner'
Cc: tytso, linux-fsdevel, linux-kernel, linux-ext4, xfs, a.sangwan,
bfoster, mtk.manpages, linux-man, linux-api,
'Namjae Jeon'
In-Reply-To: <20150216235346.GD4251@dastard>
> On Tue, Feb 17, 2015 at 12:47:48AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <namjae.jeon@samsung.com>
> >
> > FALLOC_FL_INSERT_RANGE is the opposite of FALLOC_FL_COLLAPSE_RANGE and is
> > needed by advertisers or anyone who wants to add data in the middle of a
> > file. FALLOC_FL_INSERT_RANGE creates space for writing new data within a
> > file by shifting extents to the right by the given length. This command has
> > the same limitations as FALLOC_FL_COLLAPSE_RANGE: offset and len must be
> > block aligned, and ftruncate(2) should be used for ranges that cross EOF.
> >
> > Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> > Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> > Cc: Brian Foster<bfoster@redhat.com>
> > ---
> > fs/open.c | 8 +++++++-
> > include/uapi/linux/falloc.h | 17 +++++++++++++++++
> > 2 files changed, 24 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index 813be03..762fb45 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -232,7 +232,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >
> > /* Return error if mode is not supported */
> > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> > + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> > + FALLOC_FL_INSERT_RANGE))
> > return -EOPNOTSUPP;
>
> Can we create a FALLOC_FL_SUPPORTED_MASK define in falloc.h
> so that we only need to add new flags to the mask in rather than
> change this code every time we add a new flag?
Sure, I will do it, and I will share the patch soon together with the other review points you raised.
Thanks for review!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
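For illustration only (not part of the posted patch): a minimal sketch of the single supported-flags mask suggested in the review. The `SK_` names and the `check_mode()` helper are hypothetical, and the bit values are placeholders chosen for the sketch; the real definitions belong in include/uapi/linux/falloc.h.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder flag bits for this sketch only. */
#define SK_FL_KEEP_SIZE		0x01
#define SK_FL_PUNCH_HOLE	0x02
#define SK_FL_COLLAPSE_RANGE	0x08
#define SK_FL_ZERO_RANGE	0x10
#define SK_FL_INSERT_RANGE	0x20

/* One mask to extend whenever a new flag is added. */
#define SK_FL_SUPPORTED_MASK					\
	(SK_FL_KEEP_SIZE | SK_FL_PUNCH_HOLE |			\
	 SK_FL_COLLAPSE_RANGE | SK_FL_ZERO_RANGE |		\
	 SK_FL_INSERT_RANGE)

/* Return an error if any bit outside the supported mask is set. */
static int check_mode(int mode)
{
	if (mode & ~SK_FL_SUPPORTED_MASK)
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape, supporting a further flag means touching only the mask definition, and the check in the caller stays unchanged.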
^ permalink raw reply
* Re: [PATCH 0/2] ASoC: pcm512x: Add knobs to allow and control overclocking
From: Mark Brown @ 2015-02-17 1:53 UTC (permalink / raw)
To: Peter Rosin
Cc: alsa-devel, Peter Rosin, Liam Girdwood, Jaroslav Kysela,
Takashi Iwai, Lars-Peter Clausen, linux-api, linux-kernel
In-Reply-To: <1424120568-24648-2-git-send-email-peda@lysator.liu.se>
On Mon, Feb 16, 2015 at 10:02:46PM +0100, Peter Rosin wrote:
> I wasn't sure if I should add Documentation/* for these sysfs knobs
> or not? A lot of knobs do not have docs... And I'm not sure how I
> should name the doc-file since the pcm512x driver handles devices
> connected with both i2c and spi. So, this isn't perfect, suggestions
> welcome.
It's supposed to be mandatory to have documentation but not well
enforced.
^ permalink raw reply
* Re: [PATCH 41/45] include/uapi/sound/emu10k1.h: hide gpr_valid, tram_valid and code_valid in userspace
From: Takashi Iwai @ 2015-02-17 6:27 UTC (permalink / raw)
To: Mikko Rapeli
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jaroslav Kysela,
alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw,
linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1424127948-22484-42-git-send-email-mikko.rapeli-X3B1VOXEql0@public.gmane.org>
At Tue, 17 Feb 2015 00:05:44 +0100,
Mikko Rapeli wrote:
>
> The DECLARE_BITMAP macro is not available in userspace headers.
> Fixes userspace compile error:
> error: expected specifier-qualifier-list before ‘DECLARE_BITMAP’
It's nonsense. This results in an incompatible structure, thus ABI
would be broken completely (actually this will break the compile of
ld10k1).
Takashi
>
> Signed-off-by: Mikko Rapeli <mikko.rapeli-X3B1VOXEql0@public.gmane.org>
> ---
> include/uapi/sound/emu10k1.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/sound/emu10k1.h b/include/uapi/sound/emu10k1.h
> index ec1535b..f2fd870 100644
> --- a/include/uapi/sound/emu10k1.h
> +++ b/include/uapi/sound/emu10k1.h
> @@ -300,7 +300,9 @@ struct snd_emu10k1_fx8010_control_old_gpr {
> struct snd_emu10k1_fx8010_code {
> char name[128];
>
> +#ifdef __KERNEL__
> DECLARE_BITMAP(gpr_valid, 0x200); /* bitmask of valid initializers */
> +#endif
> __u32 __user *gpr_map; /* initializers */
>
> unsigned int gpr_add_control_count; /* count of GPR controls to add/replace */
> @@ -313,11 +315,15 @@ struct snd_emu10k1_fx8010_code {
> unsigned int gpr_list_control_total; /* total count of GPR controls */
> struct snd_emu10k1_fx8010_control_gpr __user *gpr_list_controls; /* listed GPR controls */
>
> +#ifdef __KERNEL__
> DECLARE_BITMAP(tram_valid, 0x100); /* bitmask of valid initializers */
> +#endif
> __u32 __user *tram_data_map; /* data initializers */
> __u32 __user *tram_addr_map; /* map initializers */
>
> +#ifdef __KERNEL__
> DECLARE_BITMAP(code_valid, 1024); /* bitmask of valid instructions */
> +#endif
> __u32 __user *code; /* one instruction - 64 bits */
> };
>
> --
> 2.1.4
>
^ permalink raw reply
* Re: [PATCH 24/45] hdspm.h: include stdint.h in userspace
From: Takashi Iwai @ 2015-02-17 6:46 UTC (permalink / raw)
To: Mikko Rapeli; +Cc: linux-api, alsa-devel, linux-kernel
In-Reply-To: <1424127948-22484-25-git-send-email-mikko.rapeli@iki.fi>
At Tue, 17 Feb 2015 00:05:27 +0100,
Mikko Rapeli wrote:
>
> Fixes compilation error:
>
> sound/hdspm.h:43:2: error: unknown type name ‘uint32_t’
>
> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Applied for 3.21, thanks.
Takashi
> ---
> include/uapi/sound/hdspm.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/sound/hdspm.h b/include/uapi/sound/hdspm.h
> index d956c35..f799828 100644
> --- a/include/uapi/sound/hdspm.h
> +++ b/include/uapi/sound/hdspm.h
> @@ -20,6 +20,12 @@
> * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> */
>
> +#ifdef __KERNEL__
> +#include <linux/types.h>
> +#else
> +#include <stdint.h>
> +#endif
> +
> /* Maximum channels is 64 even on 56Mode you have 64playbacks to matrix */
> #define HDSPM_MAX_CHANNELS 64
>
> --
> 2.1.4
>
^ permalink raw reply
* Re: [PATCH 35/45] include/uapi/sound/asound.h: include stdlib.h in userspace
From: Takashi Iwai @ 2015-02-17 6:46 UTC (permalink / raw)
To: Mikko Rapeli; +Cc: linux-kernel, Jaroslav Kysela, alsa-devel, linux-api
In-Reply-To: <1424127948-22484-36-git-send-email-mikko.rapeli@iki.fi>
At Tue, 17 Feb 2015 00:05:38 +0100,
Mikko Rapeli wrote:
>
> Fixes compiler errors like:
> error: field ‘trigger_tstamp’ has incomplete type
> error: invalid application of ‘sizeof’ to incomplete type ‘struct timespec’
>
> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Applied for 3.21, thanks.
Takashi
> ---
> include/uapi/sound/asound.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
> index 941d32f..af156b0 100644
> --- a/include/uapi/sound/asound.h
> +++ b/include/uapi/sound/asound.h
> @@ -25,6 +25,9 @@
>
> #include <linux/types.h>
>
> +#ifndef __KERNEL__
> +#include <stdlib.h>
> +#endif
>
> /*
> * protocol version
> --
> 2.1.4
>
^ permalink raw reply
* Re: [PATCH 39/45] include/uapi/sound/asequencer.h: include sound/asound.h
From: Takashi Iwai @ 2015-02-17 6:46 UTC (permalink / raw)
To: Mikko Rapeli; +Cc: linux-kernel, Jaroslav Kysela, alsa-devel, linux-api
In-Reply-To: <1424127948-22484-40-git-send-email-mikko.rapeli@iki.fi>
At Tue, 17 Feb 2015 00:05:42 +0100,
Mikko Rapeli wrote:
>
> Fixes userspace compilation error:
> error: unknown type name ‘snd_seq_client_type_t’
> snd_seq_client_type_t type; /* client type */
>
> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Applied for 3.21, thanks.
Takashi
> ---
> include/uapi/sound/asequencer.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/sound/asequencer.h b/include/uapi/sound/asequencer.h
> index 09c8a00..5a5fa49 100644
> --- a/include/uapi/sound/asequencer.h
> +++ b/include/uapi/sound/asequencer.h
> @@ -22,6 +22,7 @@
> #ifndef _UAPI__SOUND_ASEQUENCER_H
> #define _UAPI__SOUND_ASEQUENCER_H
>
> +#include <sound/asound.h>
>
> /** version of the sequencer */
> #define SNDRV_SEQ_VERSION SNDRV_PROTOCOL_VERSION (1, 0, 1)
> --
> 2.1.4
>
^ permalink raw reply
* Re: [PATCH 40/45] include/uapi/sound/emu10k1.h: include sound/asound.h
From: Takashi Iwai @ 2015-02-17 6:46 UTC (permalink / raw)
To: Mikko Rapeli; +Cc: linux-kernel, Jaroslav Kysela, alsa-devel, linux-api
In-Reply-To: <1424127948-22484-41-git-send-email-mikko.rapeli@iki.fi>
At Tue, 17 Feb 2015 00:05:43 +0100,
Mikko Rapeli wrote:
>
> Fixes userspace compilation errors like:
> error: field ‘id’ has incomplete type
> struct snd_ctl_elem_id id; /* full control ID definition */
>
> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Applied for 3.21, thanks.
Takashi
> ---
> include/uapi/sound/emu10k1.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/uapi/sound/emu10k1.h b/include/uapi/sound/emu10k1.h
> index d1bbaf7..ec1535b 100644
> --- a/include/uapi/sound/emu10k1.h
> +++ b/include/uapi/sound/emu10k1.h
> @@ -23,8 +23,7 @@
> #define _UAPI__SOUND_EMU10K1_H
>
> #include <linux/types.h>
> -
> -
> +#include <sound/asound.h>
>
> /*
> * ---- FX8010 ----
> --
> 2.1.4
>
^ permalink raw reply
* [PATCH 0/7] [RFC] kernel: add a netlink interface to get information about processes
From: Andrey Vagin @ 2015-02-17 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
Pavel Emelyanov, Roger Luethi, Andrey Vagin
Here is a preview version. It provides a restricted set of functionality.
I would like to collect feedback about this idea.
Currently we use the proc file system, where all information is
presented in text files, which is convenient for humans. But if we need
to get information about processes from code (e.g. in C), procfs
is much less convenient.
From code we would prefer to get information in binary format and to be
able to specify which information is required and for which tasks. Here
is a new interface with all these features, called task_diag.
In addition, it is much faster than procfs.
task_diag is based on netlink sockets and looks like socket-diag, which
is used to get information about sockets.
A request is described by the task_diag_pid structure:
struct task_diag_pid {
__u64 show_flags; /* specify which information are required */
__u64 dump_stratagy; /* specify a group of processes */
__u32 pid;
};
A response is a set of netlink messages. Each message describes one task.
All task properties are divided into groups. A message contains the
TASK_DIAG_MSG group and other groups if they have been requested in
show_flags. For example, if show_flags contains TASK_DIAG_SHOW_CRED, a
response will contain the TASK_DIAG_CRED group which is described by the
task_diag_creds structure.
struct task_diag_msg {
__u32 tgid;
__u32 pid;
__u32 ppid;
__u32 tpid;
__u32 sid;
__u32 pgid;
__u8 state;
char comm[TASK_DIAG_COMM_LEN];
};
Another good feature of task_diag is the ability to request information
for several processes at once. Currently there are two strategies:
TASK_DIAG_DUMP_ALL - get information for all tasks
TASK_DIAG_DUMP_CHILDREN - get information for children of a specified
task
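As a rough illustration of how a request might be filled in: the structure layout below is copied from this letter (with fixed-width stdint types in place of __u64/__u32), but the `SKETCH_` constants and the `fill_children_request()` helper are hypothetical. The numeric values are placeholders, since the real ones are defined in the series' taskdiag.h, and the netlink send itself is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Request layout as quoted in this letter (field spelling kept as posted). */
struct task_diag_pid {
	uint64_t show_flags;	/* which groups of information are wanted */
	uint64_t dump_stratagy;	/* which group of processes to dump */
	uint32_t pid;
};

/* Placeholder values for this sketch only; not taken from the patches. */
#define SKETCH_SHOW_CRED	(1ULL << 0)
#define SKETCH_DUMP_CHILDREN	1ULL

/* Prepare a request for the children of parent_pid, optionally with creds. */
static void fill_children_request(struct task_diag_pid *req,
				  uint32_t parent_pid, int want_cred)
{
	memset(req, 0, sizeof(*req));	/* zero every byte, padding included */
	req->show_flags = want_cred ? SKETCH_SHOW_CRED : 0;
	req->dump_stratagy = SKETCH_DUMP_CHILDREN;
	req->pid = parent_pid;
}
```

One filled request of this kind would then be answered by a series of messages, one per child task.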
task_diag is much faster than the proc file system. We don't need to
create a new file descriptor for each task. We only need to send a request
and get a response. This makes it possible to get information for several
tasks in one request-response iteration.
I have compared performance of procfs and task-diag for the
"ps ax -o pid,ppid" command.
A test stand contains 10348 processes.
$ ps ax -o pid,ppid | wc -l
10348
$ time ps ax -o pid,ppid > /dev/null
real 0m1.073s
user 0m0.086s
sys 0m0.903s
$ time ./task_diag_all > /dev/null
real 0m0.037s
user 0m0.004s
sys 0m0.020s
And here are statistics about syscalls which were called by each
command.
$ perf stat -e syscalls:sys_exit* -- ps ax -o pid,ppid 2>&1 | grep syscalls | sort -n -r | head -n 5
20,713 syscalls:sys_exit_open
20,710 syscalls:sys_exit_close
20,708 syscalls:sys_exit_read
10,348 syscalls:sys_exit_newstat
31 syscalls:sys_exit_write
$ perf stat -e syscalls:sys_exit* -- ./task_diag_all 2>&1 | grep syscalls | sort -n -r | head -n 5
114 syscalls:sys_exit_recvfrom
49 syscalls:sys_exit_write
8 syscalls:sys_exit_mmap
4 syscalls:sys_exit_mprotect
3 syscalls:sys_exit_newfstat
You can find the test program from this experiment in the last patch.
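For comparison, the per-task procfs path behind the ps numbers above (roughly two open/read/close cycles per process, judging by the counts) can be sketched as follows. This is a minimal sketch and not the code ps actually uses; the helper name procfs_ppid is made up for illustration, and it assumes /proc is mounted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the parent PID of `pid` by parsing
 * /proc/<pid>/stat, or -1 on error. Each call costs one open(), one
 * read() and one close(), which is what makes a full scan expensive.
 */
static pid_t procfs_ppid(pid_t pid)
{
	char path[64], buf[512];
	const char *p;
	ssize_t n;
	int fd, ppid;
	char state;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;
	buf[n] = '\0';

	/* comm may contain spaces or ')', so parse after the last ')'. */
	p = strrchr(buf, ')');
	if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
		return -1;
	return (pid_t)ppid;
}
```

With task_diag the same field comes back as msg->ppid in a binary reply, and one recvfrom() can carry many tasks.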
The idea of this functionality was suggested by Pavel Emelyanov
(xemul@), when he found that operations with /proc form a significant
part of checkpointing time.
Ten years ago there was an attempt to add a netlink interface for accessing
/proc information:
http://lwn.net/Articles/99600/
Signed-off-by: Andrey Vagin <avagin@openvz.org>
git repo: https://github.com/avagin/linux-task-diag
Andrey Vagin (7):
[RFC] kernel: add a netlink interface to get information about tasks
kernel: move next_tgid from fs/proc
task-diag: add ability to get information about all tasks
task-diag: add a new group to get process credentials
kernel: add ability to iterate children of a specified task
task_diag: add ability to dump children
selftest: check the task_diag functinonality
fs/proc/array.c | 58 +---
fs/proc/base.c | 43 ---
include/linux/proc_fs.h | 13 +
include/uapi/linux/taskdiag.h | 89 ++++++
init/Kconfig | 12 +
kernel/Makefile | 1 +
kernel/pid.c | 94 ++++++
kernel/taskdiag.c | 343 +++++++++++++++++++++
tools/testing/selftests/task_diag/Makefile | 16 +
tools/testing/selftests/task_diag/task_diag.c | 59 ++++
tools/testing/selftests/task_diag/task_diag_all.c | 82 +++++
tools/testing/selftests/task_diag/task_diag_comm.c | 195 ++++++++++++
tools/testing/selftests/task_diag/task_diag_comm.h | 47 +++
tools/testing/selftests/task_diag/taskdiag.h | 1 +
14 files changed, 967 insertions(+), 86 deletions(-)
create mode 100644 include/uapi/linux/taskdiag.h
create mode 100644 kernel/taskdiag.c
create mode 100644 tools/testing/selftests/task_diag/Makefile
create mode 100644 tools/testing/selftests/task_diag/task_diag.c
create mode 100644 tools/testing/selftests/task_diag/task_diag_all.c
create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.c
create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.h
create mode 120000 tools/testing/selftests/task_diag/taskdiag.h
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Roger Luethi <rl@hellgate.ch>
--
2.1.0
* [PATCH 1/7] kernel: add a netlink interface to get information about tasks
From: Andrey Vagin @ 2015-02-17 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, Oleg Nesterov, Andrew Morton,
Cyrill Gorcunov, Pavel Emelyanov, Roger Luethi, Andrey Vagin
In-Reply-To: <1424161226-15176-1-git-send-email-avagin@openvz.org>
task_diag is based on netlink sockets and looks like socket-diag, which
is used to get information about sockets.
task_diag is a new interface which is going to replace the proc file
system in cases when we need to get information in a binary format.
A request message is described by the task_diag_pid structure:
struct task_diag_pid {
__u64 show_flags;
__u64 dump_stratagy;
__u32 pid;
};
A response is a set of netlink messages. Each message describes one task.
All task properties are divided into groups. A message contains the
TASK_DIAG_MSG group, and other groups if they have been requested in
show_flags. For example, if show_flags contains TASK_DIAG_SHOW_CRED, a
response will contain the TASK_DIAG_CRED group which is described by the
task_diag_creds structure.
struct task_diag_msg {
__u32 tgid;
__u32 pid;
__u32 ppid;
__u32 tpid;
__u32 sid;
__u32 pgid;
__u8 state;
char comm[TASK_DIAG_COMM_LEN];
};
The dump_stratagy field will be used in the following patches to request
information for a group of processes.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
include/uapi/linux/taskdiag.h | 64 +++++++++++++++
init/Kconfig | 12 +++
kernel/Makefile | 1 +
kernel/taskdiag.c | 179 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 256 insertions(+)
create mode 100644 include/uapi/linux/taskdiag.h
create mode 100644 kernel/taskdiag.c
diff --git a/include/uapi/linux/taskdiag.h b/include/uapi/linux/taskdiag.h
new file mode 100644
index 0000000..e1feb35
--- /dev/null
+++ b/include/uapi/linux/taskdiag.h
@@ -0,0 +1,64 @@
+#ifndef _LINUX_TASKDIAG_H
+#define _LINUX_TASKDIAG_H
+
+#include <linux/types.h>
+#include <linux/capability.h>
+
+#define TASKDIAG_GENL_NAME "TASKDIAG"
+#define TASKDIAG_GENL_VERSION 0x1
+
+enum {
+ /* optional attributes which can be specified in show_flags */
+
+ /* other attributes */
+ TASK_DIAG_MSG = 64,
+};
+
+enum {
+ TASK_DIAG_RUNNING,
+ TASK_DIAG_INTERRUPTIBLE,
+ TASK_DIAG_UNINTERRUPTIBLE,
+ TASK_DIAG_STOPPED,
+ TASK_DIAG_TRACE_STOP,
+ TASK_DIAG_DEAD,
+ TASK_DIAG_ZOMBIE,
+};
+
+#define TASK_DIAG_COMM_LEN 16
+
+struct task_diag_msg {
+ __u32 tgid;
+ __u32 pid;
+ __u32 ppid;
+ __u32 tpid;
+ __u32 sid;
+ __u32 pgid;
+ __u8 state;
+ char comm[TASK_DIAG_COMM_LEN];
+};
+
+enum {
+ TASKDIAG_CMD_UNSPEC = 0, /* Reserved */
+ TASKDIAG_CMD_GET,
+ __TASKDIAG_CMD_MAX,
+};
+#define TASKDIAG_CMD_MAX (__TASKDIAG_CMD_MAX - 1)
+
+#define TASK_DIAG_DUMP_ALL 0
+
+struct task_diag_pid {
+ __u64 show_flags;
+ __u64 dump_stratagy;
+
+ __u32 pid;
+};
+
+enum {
+ TASKDIAG_CMD_ATTR_UNSPEC = 0,
+ TASKDIAG_CMD_ATTR_GET,
+ __TASKDIAG_CMD_ATTR_MAX,
+};
+
+#define TASKDIAG_CMD_ATTR_MAX (__TASKDIAG_CMD_ATTR_MAX - 1)
+
+#endif /* _LINUX_TASKDIAG_H */
diff --git a/init/Kconfig b/init/Kconfig
index 9afb971..e959ae3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -430,6 +430,18 @@ config TASKSTATS
Say N if unsure.
+config TASK_DIAG
+ bool "Export task/process properties through netlink"
+ depends on NET
+ default n
+ help
+ Export selected properties for tasks/processes through the
+ generic netlink interface. Unlike the proc file system, task_diag
+ returns information in a binary format and allows specifying which
+ information is required.
+
+ Say N if unsure.
+
config TASK_DELAY_ACCT
bool "Enable per-task delay accounting"
depends on TASKSTATS
diff --git a/kernel/Makefile b/kernel/Makefile
index a59481a..2d4fc71 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
obj-$(CONFIG_TORTURE_TEST) += torture.o
+obj-$(CONFIG_TASK_DIAG) += taskdiag.o
$(obj)/configs.o: $(obj)/config_data.h
diff --git a/kernel/taskdiag.c b/kernel/taskdiag.c
new file mode 100644
index 0000000..5faf3f0
--- /dev/null
+++ b/kernel/taskdiag.c
@@ -0,0 +1,179 @@
+#include <uapi/linux/taskdiag.h>
+#include <net/genetlink.h>
+#include <linux/pid_namespace.h>
+#include <linux/ptrace.h>
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+
+static struct genl_family family = {
+ .id = GENL_ID_GENERATE,
+ .name = TASKDIAG_GENL_NAME,
+ .version = TASKDIAG_GENL_VERSION,
+ .maxattr = TASKDIAG_CMD_ATTR_MAX,
+ .netnsok = true,
+};
+
+static size_t taskdiag_packet_size(u64 show_flags)
+{
+ return nla_total_size(sizeof(struct task_diag_msg));
+}
+
+/*
+ * The task state array is a strange "bitmap" of
+ * reasons to sleep. Thus "running" is zero, and
+ * you can test for combinations of others with
+ * simple bit tests.
+ */
+static const __u8 task_state_array[] = {
+ TASK_DIAG_RUNNING,
+ TASK_DIAG_INTERRUPTIBLE,
+ TASK_DIAG_UNINTERRUPTIBLE,
+ TASK_DIAG_STOPPED,
+ TASK_DIAG_TRACE_STOP,
+ TASK_DIAG_DEAD,
+ TASK_DIAG_ZOMBIE,
+};
+
+static inline const __u8 get_task_state(struct task_struct *tsk)
+{
+ unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
+
+ BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
+
+ return task_state_array[fls(state)];
+}
+
+static int fill_task_msg(struct task_struct *p, struct sk_buff *skb)
+{
+ struct pid_namespace *ns = task_active_pid_ns(current);
+ struct task_diag_msg *msg;
+ struct nlattr *attr;
+ char tcomm[sizeof(p->comm)];
+ struct task_struct *tracer;
+
+ attr = nla_reserve(skb, TASK_DIAG_MSG, sizeof(struct task_diag_msg));
+ if (!attr)
+ return -EMSGSIZE;
+
+ msg = nla_data(attr);
+
+ rcu_read_lock();
+ msg->ppid = pid_alive(p) ?
+ task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
+
+ msg->tpid = 0;
+ tracer = ptrace_parent(p);
+ if (tracer)
+ msg->tpid = task_pid_nr_ns(tracer, ns);
+
+ msg->tgid = task_tgid_nr_ns(p, ns);
+ msg->pid = task_pid_nr_ns(p, ns);
+ msg->sid = task_session_nr_ns(p, ns);
+ msg->pgid = task_pgrp_nr_ns(p, ns);
+
+ rcu_read_unlock();
+
+ get_task_comm(tcomm, p);
+ memset(msg->comm, 0, TASK_DIAG_COMM_LEN);
+ strncpy(msg->comm, tcomm, TASK_DIAG_COMM_LEN);
+
+ msg->state = get_task_state(p);
+
+ return 0;
+}
+
+static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
+ u64 show_flags, u32 portid, u32 seq)
+{
+ void *reply;
+ int err;
+
+ reply = genlmsg_put(skb, portid, seq, &family, 0, TASKDIAG_CMD_GET);
+ if (reply == NULL)
+ return -EMSGSIZE;
+
+ err = fill_task_msg(tsk, skb);
+ if (err)
+ goto err;
+
+ return genlmsg_end(skb, reply);
+err:
+ genlmsg_cancel(skb, reply);
+ return err;
+}
+
+static int taskdiag_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct task_struct *tsk = NULL;
+ struct task_diag_pid *req;
+ struct sk_buff *msg;
+ size_t size;
+ int rc;
+
+ if (info->attrs[TASKDIAG_CMD_ATTR_GET] == NULL)
+ return -EINVAL;
+
+ if (nla_len(info->attrs[TASKDIAG_CMD_ATTR_GET]) < sizeof(*req))
+ return -EINVAL;
+ req = nla_data(info->attrs[TASKDIAG_CMD_ATTR_GET]);
+
+ size = taskdiag_packet_size(req->show_flags);
+ msg = genlmsg_new(size, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ rcu_read_lock();
+ tsk = find_task_by_vpid(req->pid);
+ if (tsk)
+ get_task_struct(tsk);
+ rcu_read_unlock();
+ if (!tsk) {
+ rc = -ESRCH;
+ goto err;
+ }
+
+ if (!ptrace_may_access(tsk, PTRACE_MODE_READ)) {
+ put_task_struct(tsk);
+ rc = -EPERM;
+ goto err;
+ }
+
+ rc = task_diag_fill(tsk, msg, req->show_flags,
+ info->snd_portid, info->snd_seq);
+ put_task_struct(tsk);
+ if (rc < 0)
+ goto err;
+
+ return genlmsg_reply(msg, info);
+err:
+ nlmsg_free(msg);
+ return rc;
+}
+
+static const struct nla_policy
+ taskstats_cmd_get_policy[TASKDIAG_CMD_ATTR_MAX+1] = {
+ [TASKDIAG_CMD_ATTR_GET] = { .type = NLA_UNSPEC,
+ .len = sizeof(struct task_diag_pid)
+ },
+};
+
+static const struct genl_ops taskdiag_ops[] = {
+ {
+ .cmd = TASKDIAG_CMD_GET,
+ .doit = taskdiag_doit,
+ .policy = taskstats_cmd_get_policy,
+ },
+};
+
+static int __init taskdiag_init(void)
+{
+ int rc;
+
+ rc = genl_register_family_with_ops(&family, taskdiag_ops);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+late_initcall(taskdiag_init);
--
2.1.0
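The state lookup in get_task_state() above indexes task_state_array with fls() of the masked state bits, so the highest set bit decides the reported state. The userspace sketch below illustrates that mapping. It is only a sketch: the bit values are assumed from include/linux/sched.h of that era, fls_u32 is a stand-in for the kernel's fls() built on the GCC/Clang builtin __builtin_clz, and unsigned int is assumed to be 32 bits wide.

```c
#include <assert.h>

/* Assumed bit values from include/linux/sched.h of that era. */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define __TASK_STOPPED		4
#define __TASK_TRACED		8
#define EXIT_DEAD		16
#define EXIT_ZOMBIE		32
#define TASK_REPORT	(TASK_RUNNING | TASK_INTERRUPTIBLE | \
			 TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
			 __TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD)

/* Same order as the enum in the proposed taskdiag.h. */
enum {
	TASK_DIAG_RUNNING,
	TASK_DIAG_INTERRUPTIBLE,
	TASK_DIAG_UNINTERRUPTIBLE,
	TASK_DIAG_STOPPED,
	TASK_DIAG_TRACE_STOP,
	TASK_DIAG_DEAD,
	TASK_DIAG_ZOMBIE,
};

/* Stand-in for fls(): 1-based index of the highest set bit, 0 if none. */
static int fls_u32(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Map (state | exit_state) to a TASK_DIAG_* value, as the patch does. */
static unsigned char diag_state(unsigned int state, unsigned int exit_state)
{
	static const unsigned char map[] = {
		TASK_DIAG_RUNNING, TASK_DIAG_INTERRUPTIBLE,
		TASK_DIAG_UNINTERRUPTIBLE, TASK_DIAG_STOPPED,
		TASK_DIAG_TRACE_STOP, TASK_DIAG_DEAD, TASK_DIAG_ZOMBIE,
	};
	int idx = fls_u32((state | exit_state) & TASK_REPORT);

	assert(idx < (int)(sizeof(map) / sizeof(map[0])));
	return map[idx];
}
```

Because TASK_REPORT covers six bits, fls() returns at most 6, which is why the array needs exactly seven entries.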
* [PATCH 2/7] kernel: move next_tgid from fs/proc
From: Andrey Vagin @ 2015-02-17 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
Pavel Emelyanov, Roger Luethi, Andrey Vagin
In-Reply-To: <1424161226-15176-1-git-send-email-avagin@openvz.org>
This function will be used in task_diag.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
fs/proc/base.c | 43 -------------------------------------------
include/linux/proc_fs.h | 7 +++++++
kernel/pid.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 46 insertions(+), 43 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3f3d7ae..24ed43d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2795,49 +2795,6 @@ out:
return ERR_PTR(result);
}
-/*
- * Find the first task with tgid >= tgid
- *
- */
-struct tgid_iter {
- unsigned int tgid;
- struct task_struct *task;
-};
-static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
-{
- struct pid *pid;
-
- if (iter.task)
- put_task_struct(iter.task);
- rcu_read_lock();
-retry:
- iter.task = NULL;
- pid = find_ge_pid(iter.tgid, ns);
- if (pid) {
- iter.tgid = pid_nr_ns(pid, ns);
- iter.task = pid_task(pid, PIDTYPE_PID);
- /* What we to know is if the pid we have find is the
- * pid of a thread_group_leader. Testing for task
- * being a thread_group_leader is the obvious thing
- * todo but there is a window when it fails, due to
- * the pid transfer logic in de_thread.
- *
- * So we perform the straight forward test of seeing
- * if the pid we have found is the pid of a thread
- * group leader, and don't worry if the task we have
- * found doesn't happen to be a thread group leader.
- * As we don't care in the case of readdir.
- */
- if (!iter.task || !has_group_leader_pid(iter.task)) {
- iter.tgid += 1;
- goto retry;
- }
- get_task_struct(iter.task);
- }
- rcu_read_unlock();
- return iter;
-}
-
#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 2)
/* for the /proc/ directory itself, after non-process stuff has been done */
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index b97bf2e..136b6ed 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -82,4 +82,11 @@ static inline struct proc_dir_entry *proc_net_mkdir(
return proc_mkdir_data(name, 0, parent, net);
}
+struct tgid_iter {
+ unsigned int tgid;
+ struct task_struct *task;
+};
+
+struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter);
+
#endif /* _LINUX_PROC_FS_H */
diff --git a/kernel/pid.c b/kernel/pid.c
index cd36a5e..082307a 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -568,6 +568,45 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
}
/*
+ * Find the first task with tgid >= tgid
+ *
+ */
+struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
+{
+ struct pid *pid;
+
+ if (iter.task)
+ put_task_struct(iter.task);
+ rcu_read_lock();
+retry:
+ iter.task = NULL;
+ pid = find_ge_pid(iter.tgid, ns);
+ if (pid) {
+ iter.tgid = pid_nr_ns(pid, ns);
+ iter.task = pid_task(pid, PIDTYPE_PID);
+ /* What we to know is if the pid we have find is the
+ * pid of a thread_group_leader. Testing for task
+ * being a thread_group_leader is the obvious thing
+ * todo but there is a window when it fails, due to
+ * the pid transfer logic in de_thread.
+ *
+ * So we perform the straight forward test of seeing
+ * if the pid we have found is the pid of a thread
+ * group leader, and don't worry if the task we have
+ * found doesn't happen to be a thread group leader.
+ * As we don't care in the case of readdir.
+ */
+ if (!iter.task || !has_group_leader_pid(iter.task)) {
+ iter.tgid += 1;
+ goto retry;
+ }
+ get_task_struct(iter.task);
+ }
+ rcu_read_unlock();
+ return iter;
+}
+
+/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
* more.
--
2.1.0
* [PATCH 3/7] task-diag: add ability to get information about all tasks
From: Andrey Vagin @ 2015-02-17 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: linux-api, Oleg Nesterov, Andrew Morton,
Cyrill Gorcunov, Pavel Emelyanov, Roger Luethi, Andrey Vagin
In-Reply-To: <1424161226-15176-1-git-send-email-avagin@openvz.org>
To request this, set NLM_F_DUMP in the request. Currently there are no
filters; any suggestions are welcome.
I think we could add requests for children, threads, or session or group
members.
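The dump callback in the diff below is resumable: when a reply buffer fills up, it stores the tgid that did not fit in cb->args[0] and continues from there on the next call. The pattern can be sketched in plain userspace C as shown here. This is an illustration only, not kernel code: the sorted array stands in for the pid namespace, and the names dump_cursor and dump_chunk are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct dump_cursor {
	unsigned int next_id;	/* plays the role of cb->args[0] */
};

/*
 * Copy ids >= cur->next_id from the sorted array `ids` into `out`, at
 * most `cap` per call, and remember where to resume. Returns the
 * number of entries written; 0 means the dump is complete.
 */
static size_t dump_chunk(const unsigned int *ids, size_t n_ids,
			 struct dump_cursor *cur,
			 unsigned int *out, size_t cap)
{
	size_t i, written = 0;

	assert(ids != NULL && cur != NULL && out != NULL && cap > 0);
	for (i = 0; i < n_ids; i++) {
		if (ids[i] < cur->next_id)
			continue;		/* already reported */
		if (written == cap) {
			cur->next_id = ids[i];	/* retry this one next call */
			return written;
		}
		out[written++] = ids[i];
	}
	if (n_ids)
		cur->next_id = ids[n_ids - 1] + 1; /* nothing left to report */
	return written;
}
```

Resuming by id value rather than by array index means entries that appear or disappear between calls do not shift the cursor, which is the same property the kernel gets from find_ge_pid().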
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
kernel/taskdiag.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/kernel/taskdiag.c b/kernel/taskdiag.c
index 5faf3f0..da4a51b 100644
--- a/kernel/taskdiag.c
+++ b/kernel/taskdiag.c
@@ -102,6 +102,46 @@ err:
return err;
}
+static int taskdiag_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct pid_namespace *ns = task_active_pid_ns(current);
+ struct tgid_iter iter;
+ struct nlattr *na;
+ struct task_diag_pid *req;
+ int rc;
+
+ if (nlmsg_len(cb->nlh) < GENL_HDRLEN + sizeof(*req))
+ return -EINVAL;
+
+ na = nlmsg_data(cb->nlh) + GENL_HDRLEN;
+ if (na->nla_type < 0)
+ return -EINVAL;
+
+ req = (struct task_diag_pid *) nla_data(na);
+
+ iter.tgid = cb->args[0];
+ iter.task = NULL;
+ for (iter = next_tgid(ns, iter);
+ iter.task;
+ iter.tgid += 1, iter = next_tgid(ns, iter)) {
+ if (!ptrace_may_access(iter.task, PTRACE_MODE_READ))
+ continue;
+
+ rc = task_diag_fill(iter.task, skb, req->show_flags,
+ NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq);
+ if (rc < 0) {
+ put_task_struct(iter.task);
+ if (rc != -EMSGSIZE)
+ return rc;
+ break;
+ }
+ }
+
+ cb->args[0] = iter.tgid;
+
+ return skb->len;
+}
+
static int taskdiag_doit(struct sk_buff *skb, struct genl_info *info)
{
struct task_struct *tsk = NULL;
@@ -161,6 +201,7 @@ static const struct genl_ops taskdiag_ops[] = {
{
.cmd = TASKDIAG_CMD_GET,
.doit = taskdiag_doit,
+ .dumpit = taskdiag_dumpid,
.policy = taskstats_cmd_get_policy,
},
};
--
2.1.0