* Re: [PATCH v2 04/18] fpga: dfl: fme: support 512bit data width PR
From: Wu Hao @ 2019-05-17 3:50 UTC (permalink / raw)
To: Alan Tull
Cc: Scott Wood, Moritz Fischer, linux-fpga, linux-kernel, linux-api,
Ananda Ravuri, Xu Yilun
In-Reply-To: <CANk1AXSuacSP79gtxiOUdf3pKgq0-eWXoeoQdTYX4QV4=NedDA@mail.gmail.com>
On Thu, May 16, 2019 at 12:35:27PM -0500, Alan Tull wrote:
> On Mon, Apr 29, 2019 at 4:12 AM Wu Hao <hao.wu@intel.com> wrote:
>
> It looks like this addressed the review comments. Adding my Ack. Is
> there anything else on this patch?
Nothing else, just addressed the review comments. : )
Thanks for the review and ack.
Hao
>
> Alan
>
> >
> > In early partial reconfiguration private feature, it only
> > supports 32bit data width when writing data to hardware for
> > PR. 512bit data width PR support is an important optimization
> > for some specific solutions (e.g. XEON with FPGA integrated),
> > it allows driver to use AVX512 instruction to improve the
> > performance of partial reconfiguration. e.g. programming one
> > 100MB bitstream image via this 512bit data width PR hardware
> > only takes ~300ms, but 32bit revision requires ~3s per test
> > result.
> >
> > Please note now this optimization is only done on revision 2
> > of this PR private feature which is only used in integrated
> > solution that AVX512 is always supported. This revision 2
> > hardware doesn't support 32bit PR.
> >
> > Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
>
> Acked-by: Alan Tull <atull@kernel.org>
>
>
> > ---
> > v2: check AVX512 support using cpu_feature_enabled()
> > fix other comments from Scott Wood <swood@redhat.com>
> > ---
> > drivers/fpga/dfl-fme-main.c | 3 ++
> > drivers/fpga/dfl-fme-mgr.c | 113 +++++++++++++++++++++++++++++++++++++-------
> > drivers/fpga/dfl-fme-pr.c | 43 +++++++++++------
> > drivers/fpga/dfl-fme.h | 2 +
> > drivers/fpga/dfl.h | 5 ++
> > 5 files changed, 135 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 086ad24..076d74f 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -21,6 +21,8 @@
> > #include "dfl.h"
> > #include "dfl-fme.h"
> >
> > +#define DRV_VERSION "0.8"
> > +
> > static ssize_t ports_num_show(struct device *dev,
> > struct device_attribute *attr, char *buf)
> > {
> > @@ -277,3 +279,4 @@ static int fme_remove(struct platform_device *pdev)
> > MODULE_AUTHOR("Intel Corporation");
> > MODULE_LICENSE("GPL v2");
> > MODULE_ALIAS("platform:dfl-fme");
> > +MODULE_VERSION(DRV_VERSION);
> > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> > index b3f7eee..d1a4ba5 100644
> > --- a/drivers/fpga/dfl-fme-mgr.c
> > +++ b/drivers/fpga/dfl-fme-mgr.c
> > @@ -22,14 +22,18 @@
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > #include <linux/fpga/fpga-mgr.h>
> >
> > +#include "dfl.h"
> > #include "dfl-fme-pr.h"
> >
> > +#define DRV_VERSION "0.8"
> > +
> > /* FME Partial Reconfiguration Sub Feature Register Set */
> > #define FME_PR_DFH 0x0
> > #define FME_PR_CTRL 0x8
> > #define FME_PR_STS 0x10
> > #define FME_PR_DATA 0x18
> > #define FME_PR_ERR 0x20
> > +#define FME_PR_512_DATA 0x40 /* Data Register for 512bit datawidth PR */
> > #define FME_PR_INTFC_ID_L 0xA8
> > #define FME_PR_INTFC_ID_H 0xB0
> >
> > @@ -67,8 +71,43 @@
> > #define PR_WAIT_TIMEOUT 8000000
> > #define PR_HOST_STATUS_IDLE 0
> >
> > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > +
> > +#include <linux/cpufeature.h>
> > +#include <asm/fpu/api.h>
> > +
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > + return cpu_feature_enabled(X86_FEATURE_AVX512F);
> > +}
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > + kernel_fpu_begin();
> > +
> > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > + "vmovntdq %%zmm0, (%1);"
> > + :
> > + : "r"(src), "r"(dst)
> > + : "memory");
> > +
> > + kernel_fpu_end();
> > +}
> > +#else
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > + WARN_ON_ONCE(1);
> > +}
> > +#endif
> > +
> > struct fme_mgr_priv {
> > void __iomem *ioaddr;
> > + unsigned int pr_datawidth;
> > u64 pr_error;
> > };
> >
> > @@ -169,7 +208,7 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> > struct fme_mgr_priv *priv = mgr->priv;
> > void __iomem *fme_pr = priv->ioaddr;
> > u64 pr_ctrl, pr_status, pr_data;
> > - int delay = 0, pr_credit, i = 0;
> > + int ret = 0, delay = 0, pr_credit;
> >
> > dev_dbg(dev, "start request\n");
> >
> > @@ -181,9 +220,9 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> >
> > /*
> > * driver can push data to PR hardware using PR_DATA register once HW
> > - * has enough pr_credit (> 1), pr_credit reduces one for every 32bit
> > - * pr data write to PR_DATA register. If pr_credit <= 1, driver needs
> > - * to wait for enough pr_credit from hardware by polling.
> > + * has enough pr_credit (> 1), pr_credit reduces one for every pr data
> > + * width write to PR_DATA register. If pr_credit <= 1, driver needs to
> > + * wait for enough pr_credit from hardware by polling.
> > */
> > pr_status = readq(fme_pr + FME_PR_STS);
> > pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
> > @@ -192,7 +231,8 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> > while (pr_credit <= 1) {
> > if (delay++ > PR_WAIT_TIMEOUT) {
> > dev_err(dev, "PR_CREDIT timeout\n");
> > - return -ETIMEDOUT;
> > + ret = -ETIMEDOUT;
> > + goto done;
> > }
> > udelay(1);
> >
> > @@ -200,21 +240,27 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> > pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
> > }
> >
> > - if (count < 4) {
> > - dev_err(dev, "Invalid PR bitstream size\n");
> > - return -EINVAL;
> > + WARN_ON(count < priv->pr_datawidth);
> > +
> > + switch (priv->pr_datawidth) {
> > + case 4:
> > + pr_data = FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > + *(u32 *)buf);
> > + writeq(pr_data, fme_pr + FME_PR_DATA);
> > + break;
> > + case 64:
> > + copy512(buf, fme_pr + FME_PR_512_DATA);
> > + break;
> > + default:
> > + WARN_ON_ONCE(1);
> > }
> > -
> > - pr_data = 0;
> > - pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > - *(((u32 *)buf) + i));
> > - writeq(pr_data, fme_pr + FME_PR_DATA);
> > - count -= 4;
> > + buf += priv->pr_datawidth;
> > + count -= priv->pr_datawidth;
> > pr_credit--;
> > - i++;
> > }
> >
> > - return 0;
> > +done:
> > + return ret;
> > }
> >
> > static int fme_mgr_write_complete(struct fpga_manager *mgr,
> > @@ -279,6 +325,36 @@ static void fme_mgr_get_compat_id(void __iomem *fme_pr,
> > id->id_h = readq(fme_pr + FME_PR_INTFC_ID_H);
> > }
> >
> > +static u8 fme_mgr_get_pr_datawidth(struct device *dev, void __iomem *fme_pr)
> > +{
> > + u8 revision = dfl_feature_revision(fme_pr);
> > +
> > + if (revision < 2) {
> > + /*
> > + * revision 0 and 1 only support 32bit data width partial
> > + * reconfiguration, so pr_datawidth is 4 (Byte).
> > + */
> > + return 4;
> > + } else if (revision == 2) {
> > + /*
> > + * revision 2 hardware has optimization to support 512bit data
> > + * width partial reconfiguration with AVX512 instructions. So
> > + * pr_datawidth is 64 (Byte). As revision 2 hardware is only
> > + * used in integrated solution, CPU supports AVX512 instructions
> > + * for sure, but it still needs to check here as AVX512 could be
> > + * disabled in kernel (e.g. using clearcpuid boot option).
> > + */
> > + if (is_cpu_avx512_enabled())
> > + return 64;
> > +
> > + dev_err(dev, "revision 2: AVX512 is disabled\n");
> > + return 0;
> > + }
> > +
> > + dev_err(dev, "revision %d is not supported yet\n", revision);
> > + return 0;
> > +}
> > +
> > static int fme_mgr_probe(struct platform_device *pdev)
> > {
> > struct dfl_fme_mgr_pdata *pdata = dev_get_platdata(&pdev->dev);
> > @@ -302,6 +378,10 @@ static int fme_mgr_probe(struct platform_device *pdev)
> > return PTR_ERR(priv->ioaddr);
> > }
> >
> > + priv->pr_datawidth = fme_mgr_get_pr_datawidth(dev, priv->ioaddr);
> > + if (!priv->pr_datawidth)
> > + return -ENODEV;
> > +
> > compat_id = devm_kzalloc(dev, sizeof(*compat_id), GFP_KERNEL);
> > if (!compat_id)
> > return -ENOMEM;
> > @@ -342,3 +422,4 @@ static int fme_mgr_remove(struct platform_device *pdev)
> > MODULE_AUTHOR("Intel Corporation");
> > MODULE_LICENSE("GPL v2");
> > MODULE_ALIAS("platform:dfl-fme-mgr");
> > +MODULE_VERSION(DRV_VERSION);
> > diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> > index 3c71dc3..cd94ba8 100644
> > --- a/drivers/fpga/dfl-fme-pr.c
> > +++ b/drivers/fpga/dfl-fme-pr.c
> > @@ -83,7 +83,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > if (copy_from_user(&port_pr, argp, minsz))
> > return -EFAULT;
> >
> > - if (port_pr.argsz < minsz || port_pr.flags)
> > + if (port_pr.argsz < minsz || port_pr.flags || !port_pr.buffer_size)
> > return -EINVAL;
> >
> > /* get fme header region */
> > @@ -101,15 +101,25 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > port_pr.buffer_size))
> > return -EFAULT;
> >
> > + mutex_lock(&pdata->lock);
> > + fme = dfl_fpga_pdata_get_private(pdata);
> > + /* fme device has been unregistered. */
> > + if (!fme) {
> > + ret = -EINVAL;
> > + goto unlock_exit;
> > + }
> > +
> > /*
> > * align PR buffer per PR bandwidth, as HW ignores the extra padding
> > * data automatically.
> > */
> > - length = ALIGN(port_pr.buffer_size, 4);
> > + length = ALIGN(port_pr.buffer_size, fme->pr_datawidth);
> >
> > buf = vmalloc(length);
> > - if (!buf)
> > - return -ENOMEM;
> > + if (!buf) {
> > + ret = -ENOMEM;
> > + goto unlock_exit;
> > + }
> >
> > if (copy_from_user(buf,
> > (void __user *)(unsigned long)port_pr.buffer_address,
> > @@ -127,18 +137,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> >
> > info->flags |= FPGA_MGR_PARTIAL_RECONFIG;
> >
> > - mutex_lock(&pdata->lock);
> > - fme = dfl_fpga_pdata_get_private(pdata);
> > - /* fme device has been unregistered. */
> > - if (!fme) {
> > - ret = -EINVAL;
> > - goto unlock_exit;
> > - }
> > -
> > region = dfl_fme_region_find(fme, port_pr.port_id);
> > if (!region) {
> > ret = -EINVAL;
> > - goto unlock_exit;
> > + goto free_exit;
> > }
> >
> > fpga_image_info_free(region->info);
> > @@ -159,10 +161,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > fpga_bridges_put(®ion->bridge_list);
> >
> > put_device(®ion->dev);
> > -unlock_exit:
> > - mutex_unlock(&pdata->lock);
> > free_exit:
> > vfree(buf);
> > +unlock_exit:
> > + mutex_unlock(&pdata->lock);
> > return ret;
> > }
> >
> > @@ -388,6 +390,17 @@ static int pr_mgmt_init(struct platform_device *pdev,
> > mutex_lock(&pdata->lock);
> > priv = dfl_fpga_pdata_get_private(pdata);
> >
> > + /*
> > + * Initialize PR data width.
> > + * Only revision 2 supports 512bit datawidth for better performance,
> > + * other revisions use default 32bit datawidth. This is used for
> > + * buffer alignment.
> > + */
> > + if (dfl_feature_revision(feature->ioaddr) == 2)
> > + priv->pr_datawidth = 64;
> > + else
> > + priv->pr_datawidth = 4;
> > +
> > /* Initialize the region and bridge sub device list */
> > INIT_LIST_HEAD(&priv->region_list);
> > INIT_LIST_HEAD(&priv->bridge_list);
> > diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
> > index 5394a21..de20755 100644
> > --- a/drivers/fpga/dfl-fme.h
> > +++ b/drivers/fpga/dfl-fme.h
> > @@ -21,12 +21,14 @@
> > /**
> > * struct dfl_fme - dfl fme private data
> > *
> > + * @pr_datawidth: data width for partial reconfiguration.
> > * @mgr: FME's FPGA manager platform device.
> > * @region_list: linked list of FME's FPGA regions.
> > * @bridge_list: linked list of FME's FPGA bridges.
> > * @pdata: fme platform device's pdata.
> > */
> > struct dfl_fme {
> > + int pr_datawidth;
> > struct platform_device *mgr;
> > struct list_head region_list;
> > struct list_head bridge_list;
> > diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
> > index a8b869e..8851c6c 100644
> > --- a/drivers/fpga/dfl.h
> > +++ b/drivers/fpga/dfl.h
> > @@ -331,6 +331,11 @@ static inline bool dfl_feature_is_port(void __iomem *base)
> > (FIELD_GET(DFH_ID, v) == DFH_ID_FIU_PORT);
> > }
> >
> > +static inline u8 dfl_feature_revision(void __iomem *base)
> > +{
> > + return (u8)FIELD_GET(DFH_REVISION, readq(base + DFH));
> > +}
> > +
> > /**
> > * struct dfl_fpga_enum_info - DFL FPGA enumeration information
> > *
> > --
> > 1.8.3.1
> >
^ permalink raw reply
* Re: [PATCH v2 05/18] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.
From: Wu Hao @ 2019-05-17 4:11 UTC (permalink / raw)
To: Alan Tull; +Cc: Moritz Fischer, linux-fpga, linux-kernel, linux-api, Xu Yilun
In-Reply-To: <CANk1AXQCp2ozUQDWz__MuiUeDLvGvrfqj3KUYmBa5Z34oxG8NQ@mail.gmail.com>
On Thu, May 16, 2019 at 12:53:00PM -0500, Alan Tull wrote:
> On Thu, May 16, 2019 at 12:36 PM Alan Tull <atull@kernel.org> wrote:
> >
> > On Mon, Apr 29, 2019 at 4:12 AM Wu Hao <hao.wu@intel.com> wrote:
>
> Hi Hao,
>
> Most of this patchset looks ready to go upstream or nearly so with
> pretty straightforward changes . Patches 17 and 18 need minor changes
> and please change the scnprintf in the other patches. The patches
> that had nontrivial changes are the power and thermal ones involving
> hwmon. I'm hoping to send up the patchset minus the hwmon patches in
> the next version if there's no unforseen issues. If the hwmon patches
> are ready then also, that's great, but otherwise those patches don't
> need to hold up all the rest of the patchset. How's that sound?
Hi Alan
Thanks for your time for reviewing this patchset.
This sounds good to me. Only thing here is, I need to split the patch which
updates documentation into 2 patches (to remove hwmon description in doc),
but for sure, it should be very easy. :)
Thanks
Hao
>
> Alan
>
> > >
> > > This patch adds virtualization support description for DFL based
> > > FPGA devices (based on PCIe SRIOV), and introductions to new
> > > interfaces added by new dfl private feature drivers.
> > >
> > > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > > Signed-off-by: Wu Hao <hao.wu@intel.com>
> >
> > Acked-by: Alan Tull <atull@kernel.org>
> >
> > Thanks,
> > Alan
^ permalink raw reply
* Re: [PATCH] Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Andy Lutomirski @ 2019-05-17 4:59 UTC (permalink / raw)
To: Felipe; +Cc: David S. Miller, Al Viro, LKML, Network Development, Linux API
In-Reply-To: <20190517032505.19921-1-felipe@felipegasper.com>
> On May 16, 2019, at 8:25 PM, Felipe <felipe@felipegasper.com> wrote:
>
> Author: Felipe Gasper <felipe@felipegasper.com>
> Date: Thu May 16 12:16:53 2019 -0500
>
> Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
>
> This adds the ability for Netlink to report a socket’s UID along with the
> other UNIX socket diagnostic information that is already available. This will
> allow diagnostic tools greater insight into which users control which socket.
>
> Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
>
> diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
> index 5c502fd..a198857 100644
> --- a/include/uapi/linux/unix_diag.h
> +++ b/include/uapi/linux/unix_diag.h
> @@ -20,6 +20,7 @@ struct unix_diag_req {
> #define UDIAG_SHOW_ICONS 0x00000008 /* show pending connections */
> #define UDIAG_SHOW_RQLEN 0x00000010 /* show skb receive queue len */
> #define UDIAG_SHOW_MEMINFO 0x00000020 /* show memory info of a socket */
> +#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
>
> struct unix_diag_msg {
> __u8 udiag_family;
> @@ -40,6 +41,7 @@ enum {
> UNIX_DIAG_RQLEN,
> UNIX_DIAG_MEMINFO,
> UNIX_DIAG_SHUTDOWN,
> + UNIX_DIAG_UID,
>
> __UNIX_DIAG_MAX,
> };
> diff --git a/net/unix/diag.c b/net/unix/diag.c
> index 3183d9b..011f56c 100644
> --- a/net/unix/diag.c
> +++ b/net/unix/diag.c
> @@ -110,6 +110,11 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
> return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
> }
>
> +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
> +{
> + return nla_put(nlskb, UNIX_DIAG_UID, sizeof(kuid_t), &(sk->sk_uid));
That type is called *k* uid_t because it’s internal to the kernel. You
probably want from_kuid_munged(), which will fix it up for an
appropriate userns. Presumably you want sk’s netns’s userns.
^ permalink raw reply
* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17 6:54 UTC (permalink / raw)
To: Dmitry V. Levin, Al Viro
Cc: David Howells, torvalds, Arnd Bergmann, linux-fsdevel,
linux-kernel, linux-api
In-Reply-To: <20190516202331.GA29908@altlinux.org>
On May 16, 2019 10:23:31 PM GMT+02:00, "Dmitry V. Levin" <ldv@altlinux.org> wrote:
>[looks like linux-abi is a typo, Cc'ed linux-api instead]
>
>On Thu, May 16, 2019 at 05:50:22PM +0100, Al Viro wrote:
>> [linux-abi cc'd]
>>
>> On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote:
>> > On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote:
>> > > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote:
>> > > >
>> > > > Hi Linus, Al,
>> > > >
>> > > > Here are some patches that make changes to the mount API UAPI
>and two of
>> > > > them really need applying, before -rc1 - if they're going to be
>applied at
>> > > > all.
>> > >
>> > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default
>crusade
>> > > makes any sense. Could somebody give coherent arguments in
>favour of
>> > > abandoning the existing conventions?
>> >
>> > So as I said in the commit message. From a userspace perspective
>it's
>> > more of an issue if one accidently leaks an fd to a task during
>exec.
>> >
>> > Also, most of the time one does not want to inherit an fd during an
>> > exec. It is a hazzle to always have to specify an extra flag.
>> >
>> > As Al pointed out to me open() semantics are not going anywhere.
>Sure,
>> > no argument there at all.
>> > But the idea of making fds cloexec by default is only targeted at
>fds
>> > that come from separate syscalls. fsopen(), open_tree_clone(), etc.
>they
>> > all return fds independent of open() so it's really easy to have
>them
>> > cloexec by default without regressing anyone and we also remove the
>need
>> > for a bunch of separate flags for each syscall to turn them into
>> > cloexec-fds. I mean, those for syscalls came with 4 separate flags
>to be
>> > able to specify that the returned fd should be made cloexec. The
>other
>> > way around, cloexec by default, fcntl() to remove the cloexec bit
>is way
>> > saner imho.
>>
>> Re separate flags - it is, in principle, a valid argument. OTOH, I'm
>not
>> sure if they need to be separate - they all have the same value and
>> I don't see any reason for that to change...
>>
>> Only tangentially related, but I wonder if something like
>close_range(from, to)
>> would be a more useful approach... That kind of open-coded loops is
>not
>> rare in userland and kernel-side code can do them much cheaper.
>Something
>> like
>> /* that exec is sensitive */
>> unshare(CLONE_FILES);
>> /* we don't want anything past stderr here */
>> close_range(3, ~0U);
>> execve(....);
>> on the userland side of thing. Comments?
>
>glibc people need a syscall to implement closefrom properly, see
>https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c14
I have a prototype for close_range().
I'll send it out after rc1.
Christian
^ permalink raw reply
* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17 7:01 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, torvalds, Arnd Bergmann, linux-fsdevel,
linux-kernel, linux-api
In-Reply-To: <20190516165021.GD17978@ZenIV.linux.org.uk>
On May 16, 2019 6:50:22 PM GMT+02:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
>[linux-abi cc'd]
>
>On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote:
>> On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote:
>> > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote:
>> > >
>> > > Hi Linus, Al,
>> > >
>> > > Here are some patches that make changes to the mount API UAPI and
>two of
>> > > them really need applying, before -rc1 - if they're going to be
>applied at
>> > > all.
>> >
>> > I'm fine with 2--4, but I'm not convinced that cloexec-by-default
>crusade
>> > makes any sense. Could somebody give coherent arguments in favour
>of
>> > abandoning the existing conventions?
>>
>> So as I said in the commit message. From a userspace perspective it's
>> more of an issue if one accidently leaks an fd to a task during exec.
>>
>> Also, most of the time one does not want to inherit an fd during an
>> exec. It is a hazzle to always have to specify an extra flag.
>>
>> As Al pointed out to me open() semantics are not going anywhere.
>Sure,
>> no argument there at all.
>> But the idea of making fds cloexec by default is only targeted at fds
>> that come from separate syscalls. fsopen(), open_tree_clone(), etc.
>they
>> all return fds independent of open() so it's really easy to have them
>> cloexec by default without regressing anyone and we also remove the
>need
>> for a bunch of separate flags for each syscall to turn them into
>> cloexec-fds. I mean, those for syscalls came with 4 separate flags to
>be
>> able to specify that the returned fd should be made cloexec. The
>other
>> way around, cloexec by default, fcntl() to remove the cloexec bit is
>way
>> saner imho.
>
>Re separate flags - it is, in principle, a valid argument. OTOH, I'm
>not
>sure if they need to be separate - they all have the same value and
>I don't see any reason for that to change...
One last thing I'd like to point out is that
we already have syscalls and ioctls that
return cloexec fds. So the consistency
argument is kinda dead.
If you still prefer to have cloexec flags
for the 4 new syscalls then yes,
if they could at least all have the same name
(FSMOUNT_CLOEXEC?) that would be good.
>
>Only tangentially related, but I wonder if something like
>close_range(from, to)
>would be a more useful approach... That kind of open-coded loops is
>not
>rare in userland and kernel-side code can do them much cheaper.
>Something
>like
> /* that exec is sensitive */
> unshare(CLONE_FILES);
> /* we don't want anything past stderr here */
> close_range(3, ~0U);
> execve(....);
>on the userland side of thing. Comments?
Said it before but, the list was mistyped so again:
I think that's a great idea.
I have a prototype for close_range(start, end, flags).
I'll wait after rc1 and then send it out.
Christian
^ permalink raw reply
* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: David Howells @ 2019-05-17 7:13 UTC (permalink / raw)
To: Christian Brauner
Cc: dhowells, Al Viro, torvalds, Arnd Bergmann, linux-fsdevel,
linux-kernel, linux-api
In-Reply-To: <F67AF221-C576-4424-88D7-7C6074D0A6C6@brauner.io>
Christian Brauner <christian@brauner.io> wrote:
> If you still prefer to have cloexec flags
> for the 4 new syscalls then yes,
> if they could at least all have the same name
> (FSMOUNT_CLOEXEC?) that would be good.
They don't all have the same value (see OPEN_TREE_CLOEXEC).
Note that I also don't want to blindly #define them to O_CLOEXEC because it's
not necessarily the same value on all arches. Currently it can be 02000000,
010000000 or 0x400000 for instance, which means that if it's sharing a mask
with other flags, at least three bits have to be reserved for it or we have to
have arch-dependent bit juggling.
One thing I like about your approach of just making them O_CLOEXEC by default
and removing the constants is that it avoids this mess entirely.
David
^ permalink raw reply
* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Miklos Szeredi @ 2019-05-17 7:25 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Al Viro, Linus Torvalds, Arnd Bergmann,
linux-fsdevel, linux-kernel, linux-api
In-Reply-To: <11455.1558077206@warthog.procyon.org.uk>
On Fri, May 17, 2019 at 9:13 AM David Howells <dhowells@redhat.com> wrote:
>
> Christian Brauner <christian@brauner.io> wrote:
>
> > If you still prefer to have cloexec flags
> > for the 4 new syscalls then yes,
> > if they could at least all have the same name
> > (FSMOUNT_CLOEXEC?) that would be good.
>
> They don't all have the same value (see OPEN_TREE_CLOEXEC).
>
> Note that I also don't want to blindly #define them to O_CLOEXEC because it's
> not necessarily the same value on all arches. Currently it can be 02000000,
> 010000000 or 0x400000 for instance, which means that if it's sharing a mask
> with other flags, at least three bits have to be reserved for it or we have to
> have arch-dependent bit juggling.
>
> One thing I like about your approach of just making them O_CLOEXEC by default
> and removing the constants is that it avoids this mess entirely.
+1.
Confusion caused by inconsistency of naming is going to hurt more than
inconsistency of semantics wrt. open(2).
Thanks,
Miklos
^ permalink raw reply
* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17 7:27 UTC (permalink / raw)
Cc: dhowells, Al Viro, torvalds, Arnd Bergmann, linux-fsdevel,
linux-kernel, linux-api
In-Reply-To: <11455.1558077206@warthog.procyon.org.uk>
On May 17, 2019 9:13:26 AM GMT+02:00, David Howells <dhowells@redhat.com> wrote:
>Christian Brauner <christian@brauner.io> wrote:
>
>> If you still prefer to have cloexec flags
>> for the 4 new syscalls then yes,
>> if they could at least all have the same name
>> (FSMOUNT_CLOEXEC?) that would be good.
>
>They don't all have the same value (see OPEN_TREE_CLOEXEC).
>
>Note that I also don't want to blindly #define them to O_CLOEXEC
>because it's
>not necessarily the same value on all arches. Currently it can be
>02000000,
>010000000 or 0x400000 for instance, which means that if it's sharing a
>mask
>with other flags, at least three bits have to be reserved for it or we
>have to
>have arch-dependent bit juggling.
Ugh. Right, I forgot about that entirely.
Christian
>
>One thing I like about your approach of just making them O_CLOEXEC by
>default
>and removing the constants is that it avoids this mess entirely.
>
>David
^ permalink raw reply
* Re: [PATCH] Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Felipe Gasper @ 2019-05-17 14:11 UTC (permalink / raw)
To: Andy Lutomirski
Cc: David S. Miller, Al Viro, LKML, Network Development, Linux API
In-Reply-To: <CALCETrUaTamZ1ZGbWpu+4kDAEFRqyESoa_4tgwpAmMh3NVQ4pQ@mail.gmail.com>
> On May 17, 2019, at 12:59 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
>> On May 16, 2019, at 8:25 PM, Felipe <felipe@felipegasper.com> wrote:
>>
>> Author: Felipe Gasper <felipe@felipegasper.com>
>> Date: Thu May 16 12:16:53 2019 -0500
>>
>> Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
>>
>> This adds the ability for Netlink to report a socket’s UID along with the
>> other UNIX socket diagnostic information that is already available. This will
>> allow diagnostic tools greater insight into which users control which socket.
>>
>> Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
>>
>> diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
>> index 5c502fd..a198857 100644
>> --- a/include/uapi/linux/unix_diag.h
>> +++ b/include/uapi/linux/unix_diag.h
>> @@ -20,6 +20,7 @@ struct unix_diag_req {
>> #define UDIAG_SHOW_ICONS 0x00000008 /* show pending connections */
>> #define UDIAG_SHOW_RQLEN 0x00000010 /* show skb receive queue len */
>> #define UDIAG_SHOW_MEMINFO 0x00000020 /* show memory info of a socket */
>> +#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
>>
>> struct unix_diag_msg {
>> __u8 udiag_family;
>> @@ -40,6 +41,7 @@ enum {
>> UNIX_DIAG_RQLEN,
>> UNIX_DIAG_MEMINFO,
>> UNIX_DIAG_SHUTDOWN,
>> + UNIX_DIAG_UID,
>>
>> __UNIX_DIAG_MAX,
>> };
>> diff --git a/net/unix/diag.c b/net/unix/diag.c
>> index 3183d9b..011f56c 100644
>> --- a/net/unix/diag.c
>> +++ b/net/unix/diag.c
>> @@ -110,6 +110,11 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
>> return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
>> }
>>
>> +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
>> +{
>> + return nla_put(nlskb, UNIX_DIAG_UID, sizeof(kuid_t), &(sk->sk_uid));
>
> That type is called *k* uid_t because it’s internal to the kernel. You
> probably want from_kuid_munged(), which will fix it up for an
> appropriate userns. Presumably you want sk’s netns’s userns.
Thank you for pointing this out.
Would it suffice to get the userns as: “sk_user_ns(sk)”?
Or would it be better to pass struct netlink_callback *cb from unix_diag_dump() to sk_diag_dump() to sk_diag_fill(), then to the new function to add the UID?
cheers,
-Felipe Gasper
^ permalink raw reply
* [PATCH v3 0/2] initramfs: add support for xattrs in the initial ram disk
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
To: viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
james.w.mcmechan, niveditas98, Roberto Sassu
This patch set aims at solving the following use case: appraise files from
the initial ram disk. To do that, IMA checks the signature/hash from the
security.ima xattr. Unfortunately, this use case cannot be implemented
currently, as the CPIO format does not support xattrs.
This proposal consists in marshaling pathnames and xattrs in a file called
.xattr-list. They are unmarshaled by the CPIO parser after all files have
been extracted, or before the next ram disk is processed.
The difference from v1 (https://lkml.org/lkml/2018/11/22/1182) is that all
xattrs are stored in a single file and not per file (solves the file name
limitation issue, as it is not necessary to add a suffix to files
containing xattrs).
The difference with another proposal
(https://lore.kernel.org/patchwork/cover/888071/) is that xattrs can be
included in an image without changing the image format, as opposed to
defining a new one. As seen from the discussion, if a new format has to be
defined, it should fix the issues of the existing format, which requires
more time.
To fulfill both requirements, adding support for xattrs in a short time and
defining a new image format properly, this patch set takes an incremental
approach: it introduces a parser of xattrs that can be used either if
xattrs are in a regular file or directly added to the image (this patch set
reuses patch 9/15 of the existing proposal); in addition, it introduces a
wrapper of the xattr parser, to read xattrs from a file.
The changes introduced by this patch set don't cause any compatibility
issue: kernels without the xattr parser simply extracts .xattr-list and
don't unmarshal xattrs; kernels with the xattr parser don't unmarshal
xattrs if .xattr-list is not found in the image.
>From the kernel space perspective, backporting this functionality to older
kernels should be very easy. It is sufficient to add two calls to the new
function do_readxattrs(). From the user space perspective, no change is
required for the use case. A new dracut module (module-setup.sh) will
execute:
getfattr --absolute-names -d -h -R -e hex -m security.ima \
<file list> | xattr.awk -b > ${initdir}/.xattr-list
where xattr.awk is the script that marshals xattrs (see patch 3/3). The
same can be done with the initramfs-tools ram disk generator.
Changelog
v2:
- replace ksys_lsetxattr() with kern_path() and vfs_setxattr()
(suggested by Jann Horn)
- replace ksys_open()/ksys_read()/ksys_close() with
filp_open()/kernel_read()/fput()
(suggested by Jann Horn)
- use path variable instead of name_buf in do_readxattrs()
- set last byte of str to 0 in do_readxattrs()
- call do_readxattrs() in do_name() before replacing an existing
.xattr-list
- pass pathname to do_setxattrs()
Mimi Zohar (1):
initramfs: set extended attributes
Roberto Sassu (1):
initramfs: introduce do_readxattrs()
init/initramfs.c | 170 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 168 insertions(+), 2 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH v3 1/2] initramfs: set extended attributes
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
To: viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-1-roberto.sassu@huawei.com>
From: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch adds xattrs to a file, with name and value taken from a supplied
buffer. The data format is:
<xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>
[kamensky: fixed restoring of xattrs for symbolic links by using
sys_lsetxattr() instead of sys_setxattr()]
[sassu: removed state management, kept only do_setxattrs(), replaced
sys_lsetxattr() with vfs_setxattr(), added check for
xattr_entry_size, added check for hdr->c_size, replaced strlen()
with strnlen(); moved do_setxattrs() before do_name()]
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
init/initramfs.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 69 insertions(+), 2 deletions(-)
diff --git a/init/initramfs.c b/init/initramfs.c
index 435a428c2af1..0c6dd1d5d3f6 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -10,6 +10,8 @@
#include <linux/syscalls.h>
#include <linux/utime.h>
#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
static ssize_t __init xwrite(int fd, const char *p, size_t count)
{
@@ -146,7 +148,8 @@ static __initdata time64_t mtime;
static __initdata unsigned long ino, major, minor, nlink;
static __initdata umode_t mode;
-static __initdata unsigned long body_len, name_len;
+static __initdata u32 name_len, xattr_len;
+static __initdata u64 body_len;
static __initdata uid_t uid;
static __initdata gid_t gid;
static __initdata unsigned rdev;
@@ -218,7 +221,7 @@ static void __init read_into(char *buf, unsigned size, enum state next)
}
}
-static __initdata char *header_buf, *symlink_buf, *name_buf;
+static __initdata char *header_buf, *symlink_buf, *name_buf, *xattr_buf;
static int __init do_start(void)
{
@@ -315,6 +318,70 @@ static int __init maybe_link(void)
return 0;
}
+struct xattr_hdr {
+ char c_size[8]; /* total size including c_size field */
+ char c_data[]; /* <name>\0<value> */
+};
+
+static int __init __maybe_unused do_setxattrs(char *pathname)
+{
+ char *buf = xattr_buf;
+ char *bufend = buf + xattr_len;
+ struct xattr_hdr *hdr;
+ char str[sizeof(hdr->c_size) + 1];
+ struct path path;
+
+ if (!xattr_len)
+ return 0;
+
+ str[sizeof(hdr->c_size)] = 0;
+
+ while (buf < bufend) {
+ char *xattr_name, *xattr_value;
+ unsigned long xattr_entry_size;
+ unsigned long xattr_name_size, xattr_value_size;
+ int ret;
+
+ if (buf + sizeof(hdr->c_size) > bufend) {
+ error("malformed xattrs");
+ break;
+ }
+
+ hdr = (struct xattr_hdr *)buf;
+ memcpy(str, hdr->c_size, sizeof(hdr->c_size));
+ ret = kstrtoul(str, 16, &xattr_entry_size);
+ buf += xattr_entry_size;
+ if (ret || buf > bufend || !xattr_entry_size) {
+ error("malformed xattrs");
+ break;
+ }
+
+ xattr_name = hdr->c_data;
+ xattr_name_size = strnlen(xattr_name,
+ xattr_entry_size - sizeof(hdr->c_size));
+ if (xattr_name_size == xattr_entry_size - sizeof(hdr->c_size)) {
+ error("malformed xattrs");
+ break;
+ }
+
+ xattr_value = xattr_name + xattr_name_size + 1;
+ xattr_value_size = buf - xattr_value;
+
+ ret = kern_path(pathname, 0, &path);
+ if (!ret) {
+ ret = vfs_setxattr(path.dentry, xattr_name, xattr_value,
+ xattr_value_size, 0);
+
+ path_put(&path);
+ }
+
+ pr_debug("%s: %s size: %lu val: %s (ret: %d)\n", pathname,
+ xattr_name, xattr_value_size, xattr_value, ret);
+ }
+
+ return 0;
+}
+
static __initdata int wfd;
static int __init do_name(void)
--
2.17.1
^ permalink raw reply related
* [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
To: viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-1-roberto.sassu@huawei.com>
This patch adds support for an alternative method to add xattrs to files in
the rootfs filesystem. Instead of extracting them directly from the ram
disk image, they are extracted from a regular file called .xattr-list, that
can be added by any ram disk generator available today. The file format is:
<file #N data len (ASCII, 10 chars)><file #N path>\0
<xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>
.xattr-list can be generated by executing:
$ getfattr --absolute-names -d -h -R -e hex -m - \
<file list> | xattr.awk -b > ${initdir}/.xattr-list
where the content of the xattr.awk script is:
#! /usr/bin/awk -f
{
if (!length($0)) {
printf("%.10x%s\0", len, file);
for (x in xattr) {
printf("%.8x%s\0", xattr_len[x], x);
for (i = 0; i < length(xattr[x]) / 2; i++) {
printf("%c", strtonum("0x"substr(xattr[x], i * 2 + 1, 2)));
}
}
i = 0;
delete xattr;
delete xattr_len;
next;
};
if (i == 0) {
file=$3;
len=length(file) + 8 + 1;
}
if (i > 0) {
split($0, a, "=");
xattr[a[1]]=substr(a[2], 3);
xattr_len[a[1]]=length(a[1]) + 1 + 8 + length(xattr[a[1]]) / 2;
len+=xattr_len[a[1]];
};
i++;
}
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
init/initramfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 99 insertions(+)
diff --git a/init/initramfs.c b/init/initramfs.c
index 0c6dd1d5d3f6..6ec018c6279a 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -13,6 +13,8 @@
#include <linux/namei.h>
#include <linux/xattr.h>
+#define XATTR_LIST_FILENAME ".xattr-list"
+
static ssize_t __init xwrite(int fd, const char *p, size_t count)
{
ssize_t out = 0;
@@ -382,6 +384,97 @@ static int __init __maybe_unused do_setxattrs(char *pathname)
return 0;
}
+struct path_hdr {
+ char p_size[10]; /* total size including p_size field */
+ char p_data[]; /* <path>\0<xattrs> */
+};
+
+static int __init do_readxattrs(void)
+{
+ struct path_hdr hdr;
+ char *path = NULL;
+ char str[sizeof(hdr.p_size) + 1];
+ unsigned long file_entry_size;
+ size_t size, path_size, total_size;
+ struct kstat st;
+ struct file *file;
+ loff_t pos;
+ int ret;
+
+ ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
+ if (ret < 0)
+ return ret;
+
+ total_size = st.size;
+
+ file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ pos = file->f_pos;
+
+ while (total_size) {
+ size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
+ if (size != sizeof(hdr)) {
+ ret = -EIO;
+ goto out;
+ }
+
+ total_size -= size;
+
+ str[sizeof(hdr.p_size)] = 0;
+ memcpy(str, hdr.p_size, sizeof(hdr.p_size));
+ ret = kstrtoul(str, 16, &file_entry_size);
+ if (ret < 0)
+ goto out;
+
+ file_entry_size -= sizeof(sizeof(hdr.p_size));
+ if (file_entry_size > total_size) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ path = vmalloc(file_entry_size);
+ if (!path) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ size = kernel_read(file, path, file_entry_size, &pos);
+ if (size != file_entry_size) {
+ ret = -EIO;
+ goto out_free;
+ }
+
+ total_size -= size;
+
+ path_size = strnlen(path, file_entry_size);
+ if (path_size == file_entry_size) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ xattr_buf = path + path_size + 1;
+ xattr_len = file_entry_size - path_size - 1;
+
+ ret = do_setxattrs(path);
+ vfree(path);
+ path = NULL;
+
+ if (ret < 0)
+ break;
+ }
+out_free:
+ vfree(path);
+out:
+ fput(file);
+
+ if (ret < 0)
+ error("Unable to parse xattrs");
+
+ return ret;
+}
+
static __initdata int wfd;
static int __init do_name(void)
@@ -391,6 +484,11 @@ static int __init do_name(void)
if (strcmp(collected, "TRAILER!!!") == 0) {
free_hash();
return 0;
+ } else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
+ struct kstat st;
+
+ if (!vfs_lstat(collected, &st))
+ do_readxattrs();
}
clean_path(collected, mode);
if (S_ISREG(mode)) {
@@ -562,6 +660,7 @@ static char * __init unpack_to_rootfs(char *buf, unsigned long len)
buf += my_inptr;
len -= my_inptr;
}
+ do_readxattrs();
dir_utime();
kfree(name_buf);
kfree(symlink_buf);
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: hpa @ 2019-05-17 20:18 UTC (permalink / raw)
To: Roberto Sassu, viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, arnd, rob, james.w.mcmechan,
niveditas98
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>
On May 17, 2019 9:55:19 AM PDT, Roberto Sassu <roberto.sassu@huawei.com> wrote:
>This patch adds support for an alternative method to add xattrs to
>files in
>the rootfs filesystem. Instead of extracting them directly from the ram
>disk image, they are extracted from a regular file called .xattr-list,
>that
>can be added by any ram disk generator available today. The file format
>is:
>
><file #N data len (ASCII, 10 chars)><file #N path>\0
><xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>
>
>.xattr-list can be generated by executing:
>
>$ getfattr --absolute-names -d -h -R -e hex -m - \
> <file list> | xattr.awk -b > ${initdir}/.xattr-list
>
>where the content of the xattr.awk script is:
>
>#! /usr/bin/awk -f
>{
> if (!length($0)) {
> printf("%.10x%s\0", len, file);
> for (x in xattr) {
> printf("%.8x%s\0", xattr_len[x], x);
> for (i = 0; i < length(xattr[x]) / 2; i++) {
> printf("%c", strtonum("0x"substr(xattr[x], i * 2 + 1, 2)));
> }
> }
> i = 0;
> delete xattr;
> delete xattr_len;
> next;
> };
> if (i == 0) {
> file=$3;
> len=length(file) + 8 + 1;
> }
> if (i > 0) {
> split($0, a, "=");
> xattr[a[1]]=substr(a[2], 3);
> xattr_len[a[1]]=length(a[1]) + 1 + 8 + length(xattr[a[1]]) / 2;
> len+=xattr_len[a[1]];
> };
> i++;
>}
>
>Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
>---
> init/initramfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 99 insertions(+)
>
>diff --git a/init/initramfs.c b/init/initramfs.c
>index 0c6dd1d5d3f6..6ec018c6279a 100644
>--- a/init/initramfs.c
>+++ b/init/initramfs.c
>@@ -13,6 +13,8 @@
> #include <linux/namei.h>
> #include <linux/xattr.h>
>
>+#define XATTR_LIST_FILENAME ".xattr-list"
>+
> static ssize_t __init xwrite(int fd, const char *p, size_t count)
> {
> ssize_t out = 0;
>@@ -382,6 +384,97 @@ static int __init __maybe_unused do_setxattrs(char
>*pathname)
> return 0;
> }
>
>+struct path_hdr {
>+ char p_size[10]; /* total size including p_size field */
>+ char p_data[]; /* <path>\0<xattrs> */
>+};
>+
>+static int __init do_readxattrs(void)
>+{
>+ struct path_hdr hdr;
>+ char *path = NULL;
>+ char str[sizeof(hdr.p_size) + 1];
>+ unsigned long file_entry_size;
>+ size_t size, path_size, total_size;
>+ struct kstat st;
>+ struct file *file;
>+ loff_t pos;
>+ int ret;
>+
>+ ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
>+ if (ret < 0)
>+ return ret;
>+
>+ total_size = st.size;
>+
>+ file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
>+ if (IS_ERR(file))
>+ return PTR_ERR(file);
>+
>+ pos = file->f_pos;
>+
>+ while (total_size) {
>+ size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
>+ if (size != sizeof(hdr)) {
>+ ret = -EIO;
>+ goto out;
>+ }
>+
>+ total_size -= size;
>+
>+ str[sizeof(hdr.p_size)] = 0;
>+ memcpy(str, hdr.p_size, sizeof(hdr.p_size));
>+ ret = kstrtoul(str, 16, &file_entry_size);
>+ if (ret < 0)
>+ goto out;
>+
>+ file_entry_size -= sizeof(sizeof(hdr.p_size));
>+ if (file_entry_size > total_size) {
>+ ret = -EINVAL;
>+ goto out;
>+ }
>+
>+ path = vmalloc(file_entry_size);
>+ if (!path) {
>+ ret = -ENOMEM;
>+ goto out;
>+ }
>+
>+ size = kernel_read(file, path, file_entry_size, &pos);
>+ if (size != file_entry_size) {
>+ ret = -EIO;
>+ goto out_free;
>+ }
>+
>+ total_size -= size;
>+
>+ path_size = strnlen(path, file_entry_size);
>+ if (path_size == file_entry_size) {
>+ ret = -EINVAL;
>+ goto out_free;
>+ }
>+
>+ xattr_buf = path + path_size + 1;
>+ xattr_len = file_entry_size - path_size - 1;
>+
>+ ret = do_setxattrs(path);
>+ vfree(path);
>+ path = NULL;
>+
>+ if (ret < 0)
>+ break;
>+ }
>+out_free:
>+ vfree(path);
>+out:
>+ fput(file);
>+
>+ if (ret < 0)
>+ error("Unable to parse xattrs");
>+
>+ return ret;
>+}
>+
> static __initdata int wfd;
>
> static int __init do_name(void)
>@@ -391,6 +484,11 @@ static int __init do_name(void)
> if (strcmp(collected, "TRAILER!!!") == 0) {
> free_hash();
> return 0;
>+ } else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
>+ struct kstat st;
>+
>+ if (!vfs_lstat(collected, &st))
>+ do_readxattrs();
> }
> clean_path(collected, mode);
> if (S_ISREG(mode)) {
>@@ -562,6 +660,7 @@ static char * __init unpack_to_rootfs(char *buf,
>unsigned long len)
> buf += my_inptr;
> len -= my_inptr;
> }
>+ do_readxattrs();
> dir_utime();
> kfree(name_buf);
> kfree(symlink_buf);
Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
A side benefit is that the format can be simpler as there is no need to encode the filename.
A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are present in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 21:02 UTC (permalink / raw)
To: hpa
Cc: Roberto Sassu, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
james.w.mcmechan, niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>
On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
>
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
This version of the patch was changed from the previous one exactly to deal with this case --
it allows for the bootloader to load multiple initramfs archives, each
with its own .xattr-list file, and to have that work properly.
Could you elaborate on the issue that you see?
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 21:10 UTC (permalink / raw)
To: Arvind Sankar
Cc: hpa, Roberto Sassu, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
james.w.mcmechan, niveditas98
In-Reply-To: <20190517210219.GA5998@rani.riverdale.lan>
On Fri, May 17, 2019 at 05:02:20PM -0400, Arvind Sankar wrote:
> On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
> >
> > Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> This version of the patch was changed from the previous one exactly to deal with this case --
> it allows for the bootloader to load multiple initramfs archives, each
> with its own .xattr-list file, and to have that work properly.
> Could you elaborate on the issue that you see?
Roberto, are you missing a changelog entry for v2->v3 change?
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Rob Landley @ 2019-05-17 21:17 UTC (permalink / raw)
To: hpa, Roberto Sassu, viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, arnd, james.w.mcmechan,
niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>
On 5/17/19 3:18 PM, hpa@zytor.com wrote:
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
>
> A side benefit is that the format can be simpler as there is no need to encode the filename.
>
> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
>
> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pres
ent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
>
I'll happily modify toybox cpio to understand xattrs (compress and decompress),
the android guys do a lot with xattrs already. I tapped out of _this_ discussion
from disgust with the proposed encoding.
Rob
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: H. Peter Anvin @ 2019-05-17 21:41 UTC (permalink / raw)
To: Roberto Sassu, viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, arnd, rob, james.w.mcmechan,
niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>
On 5/17/19 1:18 PM, hpa@zytor.com wrote:
>
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
>
> A side benefit is that the format can be simpler as there is no need to encode the filename.
>
> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
>
> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pres
ent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
>
Correction: it's just !!!TRAILER!!!.
I tested with GNU cpio, BSD cpio, scpio and pax.
With a mode of 0:
- GNU cpio errors, but extracts all the other files.
- BSD cpio extracts them as regular files.
- scpio and pax abort.
With a mode of 0x18000 (bit 16 + S_IFREG), all of them happily extracted
the data as regular files.
-hpa
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: H. Peter Anvin @ 2019-05-17 21:47 UTC (permalink / raw)
To: Arvind Sankar
Cc: Roberto Sassu, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
james.w.mcmechan, niveditas98
In-Reply-To: <20190517210219.GA5998@rani.riverdale.lan>
On 5/17/19 2:02 PM, Arvind Sankar wrote:
> On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
>>
>> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> This version of the patch was changed from the previous one exactly to deal with this case --
> it allows for the bootloader to load multiple initramfs archives, each
> with its own .xattr-list file, and to have that work properly.
> Could you elaborate on the issue that you see?
>
Well, for one thing, how do you define "cpio archive", each with its own
.xattr-list file? Second, that would seem to depend on the ordering, no,
in which case you depend critically on .xattr-list file following the
files, which most archivers won't do.
Either way it seems cleaner to have this per file; especially if/as it
can be done without actually mucking up the format.
I need to run, but I'll post a more detailed explanation of what I did
in a little bit.
-hpa
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 22:17 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Arvind Sankar, Roberto Sassu, viro, linux-security-module,
linux-integrity, initramfs, linux-api, linux-fsdevel,
linux-kernel, zohar, silviu.vlasceanu, dmitry.kasatkin, takondra,
kamensky, arnd, rob, james.w.mcmechan, niveditas98
In-Reply-To: <d48f35a1-aab1-2f20-2e91-5e81a84b107f@zytor.com>
On Fri, May 17, 2019 at 02:47:31PM -0700, H. Peter Anvin wrote:
> On 5/17/19 2:02 PM, Arvind Sankar wrote:
> > On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
> >>
> >> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> > This version of the patch was changed from the previous one exactly to deal with this case --
> > it allows for the bootloader to load multiple initramfs archives, each
> > with its own .xattr-list file, and to have that work properly.
> > Could you elaborate on the issue that you see?
> >
>
> Well, for one thing, how do you define "cpio archive", each with its own
> .xattr-list file? Second, that would seem to depend on the ordering, no,
> in which case you depend critically on .xattr-list file following the
> files, which most archivers won't do.
>
> Either way it seems cleaner to have this per file; especially if/as it
> can be done without actually mucking up the format.
>
> I need to run, but I'll post a more detailed explanation of what I did
> in a little bit.
>
> -hpa
>
Not sure what you mean by how do I define it? Each cpio archive will
contain its own .xattr-list file with signatures for the files within
it, that was the idea.
You need to review the code more closely I think -- it does not depend
on the .xattr-list file following the files to which it applies.
The code first extracts .xattr-list as though it was a regular file. If
a later dupe shows up (presumably from a second archive, although the
patch will actually allow a second one in the same archive), it will
then process the existing .xattr-list file and apply the attributes
listed within it. It then will proceed to read the second one and
overwrite the first one with it (this is the normal behaviour in the
kernel cpio parser). At the end once all the archives have been
extracted, if there is an .xattr-list file in the rootfs it will be
parsed (it would've been the last one encountered, which hasn't been
parsed yet, just extracted).
Regarding the idea to use the high 16 bits of the mode field in
the header that's another possibility. It would just require additional
support in the program that actually creates the archive though, which
the current patch doesn't.
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-17 23:09 UTC (permalink / raw)
Cc: kbuild-all, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>
[-- Attachment #1: Type: text/plain, Size: 3884 bytes --]
Hi Roberto,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sparc64
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
init/initramfs.c: In function 'do_readxattrs':
>> init/initramfs.c:437:10: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
path = vmalloc(file_entry_size);
^~~~~~~
kvmalloc
>> init/initramfs.c:437:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
path = vmalloc(file_entry_size);
^
>> init/initramfs.c:461:3: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
vfree(path);
^~~~~
kvfree
cc1: some warnings being treated as errors
vim +437 init/initramfs.c
391
392 static int __init do_readxattrs(void)
393 {
394 struct path_hdr hdr;
395 char *path = NULL;
396 char str[sizeof(hdr.p_size) + 1];
397 unsigned long file_entry_size;
398 size_t size, path_size, total_size;
399 struct kstat st;
400 struct file *file;
401 loff_t pos;
402 int ret;
403
404 ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
405 if (ret < 0)
406 return ret;
407
408 total_size = st.size;
409
410 file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
411 if (IS_ERR(file))
412 return PTR_ERR(file);
413
414 pos = file->f_pos;
415
416 while (total_size) {
417 size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
418 if (size != sizeof(hdr)) {
419 ret = -EIO;
420 goto out;
421 }
422
423 total_size -= size;
424
425 str[sizeof(hdr.p_size)] = 0;
426 memcpy(str, hdr.p_size, sizeof(hdr.p_size));
427 ret = kstrtoul(str, 16, &file_entry_size);
428 if (ret < 0)
429 goto out;
430
431 file_entry_size -= sizeof(sizeof(hdr.p_size));
432 if (file_entry_size > total_size) {
433 ret = -EINVAL;
434 goto out;
435 }
436
> 437 path = vmalloc(file_entry_size);
438 if (!path) {
439 ret = -ENOMEM;
440 goto out;
441 }
442
443 size = kernel_read(file, path, file_entry_size, &pos);
444 if (size != file_entry_size) {
445 ret = -EIO;
446 goto out_free;
447 }
448
449 total_size -= size;
450
451 path_size = strnlen(path, file_entry_size);
452 if (path_size == file_entry_size) {
453 ret = -EINVAL;
454 goto out_free;
455 }
456
457 xattr_buf = path + path_size + 1;
458 xattr_len = file_entry_size - path_size - 1;
459
460 ret = do_setxattrs(path);
> 461 vfree(path);
462 path = NULL;
463
464 if (ret < 0)
465 break;
466 }
467 out_free:
468 vfree(path);
469 out:
470 fput(file);
471
472 if (ret < 0)
473 error("Unable to parse xattrs");
474
475 return ret;
476 }
477
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 57391 bytes --]
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-18 1:02 UTC (permalink / raw)
Cc: kbuild-all, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>
[-- Attachment #1: Type: text/plain, Size: 5380 bytes --]
Hi Roberto,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
config: m68k-allyesconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=m68k
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
init/initramfs.c: In function 'do_readxattrs':
init/initramfs.c:437:10: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
path = vmalloc(file_entry_size);
^~~~~~~
kvmalloc
init/initramfs.c:437:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
path = vmalloc(file_entry_size);
^
init/initramfs.c:461:3: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
vfree(path);
^~~~~
kvfree
In file included from include/asm-generic/io.h:891:0,
from arch/m68k/include/asm/io.h:11,
from include/linux/kexec.h:19,
from init/initramfs.c:694:
include/linux/vmalloc.h: At top level:
>> include/linux/vmalloc.h:77:14: error: conflicting types for 'vmalloc'
extern void *vmalloc(unsigned long size);
^~~~~~~
init/initramfs.c:437:10: note: previous implicit declaration of 'vmalloc' was here
path = vmalloc(file_entry_size);
^~~~~~~
In file included from include/asm-generic/io.h:891:0,
from arch/m68k/include/asm/io.h:11,
from include/linux/kexec.h:19,
from init/initramfs.c:694:
>> include/linux/vmalloc.h:102:13: warning: conflicting types for 'vfree'
extern void vfree(const void *addr);
^~~~~
init/initramfs.c:461:3: note: previous implicit declaration of 'vfree' was here
vfree(path);
^~~~~
cc1: some warnings being treated as errors
vim +/vmalloc +77 include/linux/vmalloc.h
db64fe02 Nick Piggin 2008-10-18 76
^1da177e Linus Torvalds 2005-04-16 @77 extern void *vmalloc(unsigned long size);
e1ca7788 Dave Young 2010-10-26 78 extern void *vzalloc(unsigned long size);
83342314 Nick Piggin 2006-06-23 79 extern void *vmalloc_user(unsigned long size);
930fc45a Christoph Lameter 2005-10-29 80 extern void *vmalloc_node(unsigned long size, int node);
e1ca7788 Dave Young 2010-10-26 81 extern void *vzalloc_node(unsigned long size, int node);
^1da177e Linus Torvalds 2005-04-16 82 extern void *vmalloc_exec(unsigned long size);
^1da177e Linus Torvalds 2005-04-16 83 extern void *vmalloc_32(unsigned long size);
83342314 Nick Piggin 2006-06-23 84 extern void *vmalloc_32_user(unsigned long size);
dd0fc66f Al Viro 2005-10-07 85 extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot);
d0a21265 David Rientjes 2011-01-13 86 extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
d0a21265 David Rientjes 2011-01-13 87 unsigned long start, unsigned long end, gfp_t gfp_mask,
cb9e3c29 Andrey Ryabinin 2015-02-13 88 pgprot_t prot, unsigned long vm_flags, int node,
cb9e3c29 Andrey Ryabinin 2015-02-13 89 const void *caller);
1f5307b1 Michal Hocko 2017-05-08 90 #ifndef CONFIG_MMU
a7c3e901 Michal Hocko 2017-05-08 91 extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
8594a21c Michal Hocko 2017-05-12 92 static inline void *__vmalloc_node_flags_caller(unsigned long size, int node,
8594a21c Michal Hocko 2017-05-12 93 gfp_t flags, void *caller)
1f5307b1 Michal Hocko 2017-05-08 94 {
8594a21c Michal Hocko 2017-05-12 95 return __vmalloc_node_flags(size, node, flags);
1f5307b1 Michal Hocko 2017-05-08 96 }
8594a21c Michal Hocko 2017-05-12 97 #else
8594a21c Michal Hocko 2017-05-12 98 extern void *__vmalloc_node_flags_caller(unsigned long size,
8594a21c Michal Hocko 2017-05-12 99 int node, gfp_t flags, void *caller);
1f5307b1 Michal Hocko 2017-05-08 100 #endif
cb9e3c29 Andrey Ryabinin 2015-02-13 101
b3bdda02 Christoph Lameter 2008-02-04 @102 extern void vfree(const void *addr);
bf22e37a Andrey Ryabinin 2016-12-12 103 extern void vfree_atomic(const void *addr);
^1da177e Linus Torvalds 2005-04-16 104
:::::: The code at line 77 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2
:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49863 bytes --]
^ permalink raw reply
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Rob Landley @ 2019-05-18 2:16 UTC (permalink / raw)
To: H. Peter Anvin, Roberto Sassu, viro
Cc: linux-security-module, linux-integrity, initramfs, linux-api,
linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
dmitry.kasatkin, takondra, kamensky, arnd, james.w.mcmechan,
niveditas98
In-Reply-To: <69ef1f55-9fc1-7ee0-371f-3dbc77551dc0@zytor.com>
On 5/17/19 4:41 PM, H. Peter Anvin wrote:
> On 5/17/19 1:18 PM, hpa@zytor.com wrote:
>>
>> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
>>
>> A side benefit is that the format can be simpler as there is no need to encode the filename.
>>
>> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
>>
>> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pre
sent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
>
>
> Correction: it's just !!!TRAILER!!!.
We documented it as "TRAILER!!!" without leading !!!, and that its purpose is to
flush hardlinks:
https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt
That's what toybox cpio has been producing. Kernel consumes it just fine. Just
checked busybox cpio and that's what they're producing as well...
Rob
^ permalink raw reply
* [PATCH v2] net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Felipe Gasper @ 2019-05-18 3:48 UTC (permalink / raw)
To: davem, viro, linux-kernel, netdev, linux-api
Author: Felipe Gasper <felipe@felipegasper.com>
Date: Fri May 17 16:54:40 2019 -0500
net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
This adds the ability for Netlink to report a socket’s UID along with the
other UNIX diagnostic information that is already available. This will
allow diagnostic tools greater insight into which users control which socket.
Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
index 5c502fd..a198857 100644
--- a/include/uapi/linux/unix_diag.h
+++ b/include/uapi/linux/unix_diag.h
@@ -20,6 +20,7 @@ struct unix_diag_req {
#define UDIAG_SHOW_ICONS 0x00000008 /* show pending connections */
#define UDIAG_SHOW_RQLEN 0x00000010 /* show skb receive queue len */
#define UDIAG_SHOW_MEMINFO 0x00000020 /* show memory info of a socket */
+#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
struct unix_diag_msg {
__u8 udiag_family;
@@ -40,6 +41,7 @@ enum {
UNIX_DIAG_RQLEN,
UNIX_DIAG_MEMINFO,
UNIX_DIAG_SHUTDOWN,
+ UNIX_DIAG_UID,
__UNIX_DIAG_MAX,
};
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 3183d9b..e40f348 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -4,9 +4,11 @@
#include <linux/unix_diag.h>
#include <linux/skbuff.h>
#include <linux/module.h>
+#include <linux/uidgid.h>
#include <net/netlink.h>
#include <net/af_unix.h>
#include <net/tcp_states.h>
+#include <net/sock.h>
static int sk_diag_dump_name(struct sock *sk, struct sk_buff *nlskb)
{
@@ -110,6 +112,12 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
}
+static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
+{
+ uid_t uid = from_kuid_munged(sk_user_ns(sk), sock_i_uid(sk));
+ return nla_put(nlskb, UNIX_DIAG_UID, sizeof(uid_t), &uid);
+}
+
static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req,
u32 portid, u32 seq, u32 flags, int sk_ino)
{
@@ -156,6 +164,10 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown))
goto out_nlmsg_trim;
+ if ((req->udiag_show & UDIAG_SHOW_UID) &&
+ sk_diag_dump_uid(sk, skb))
+ goto out_nlmsg_trim;
+
nlmsg_end(skb, nlh);
return 0;
^ permalink raw reply related
* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-18 5:49 UTC (permalink / raw)
Cc: kbuild-all, viro, linux-security-module, linux-integrity,
initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>
Hi Roberto,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
init/initramfs.c:24:45: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected char const [noderef] <asn:1> *buf @@ got f] <asn:1> *buf @@
init/initramfs.c:24:45: sparse: expected char const [noderef] <asn:1> *buf
init/initramfs.c:24:45: sparse: got char const *p
init/initramfs.c:115:36: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got n:1> *filename @@
init/initramfs.c:115:36: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:115:36: sparse: got char *filename
init/initramfs.c:303:24: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *name @@ got n:1> *name @@
init/initramfs.c:303:24: sparse: expected char const [noderef] <asn:1> *name
init/initramfs.c:303:24: sparse: got char *path
init/initramfs.c:305:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *pathname @@ got n:1> *pathname @@
init/initramfs.c:305:36: sparse: expected char const [noderef] <asn:1> *pathname
init/initramfs.c:305:36: sparse: got char *path
init/initramfs.c:307:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *pathname @@ got n:1> *pathname @@
init/initramfs.c:307:37: sparse: expected char const [noderef] <asn:1> *pathname
init/initramfs.c:307:37: sparse: got char *path
init/initramfs.c:317:43: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *oldname @@ got n:1> *oldname @@
init/initramfs.c:317:43: sparse: expected char const [noderef] <asn:1> *oldname
init/initramfs.c:317:43: sparse: got char *old
init/initramfs.c:317:48: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected char const [noderef] <asn:1> *newname @@ got char char const [noderef] <asn:1> *newname @@
init/initramfs.c:317:48: sparse: expected char const [noderef] <asn:1> *newname
init/initramfs.c:317:48: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:404:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *name @@ got n:1> *name @@
init/initramfs.c:404:25: sparse: expected char const [noderef] <asn:1> *name
init/initramfs.c:404:25: sparse: got char *
>> init/initramfs.c:490:32: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *name @@ got char char const [noderef] <asn:1> *name @@
init/initramfs.c:490:32: sparse: expected char const [noderef] <asn:1> *name
init/initramfs.c:490:32: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:500:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:500:41: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:500:41: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:512:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *pathname @@ got char char const [noderef] <asn:1> *pathname @@
init/initramfs.c:512:28: sparse: expected char const [noderef] <asn:1> *pathname
init/initramfs.c:512:28: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:513:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:513:28: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:513:28: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:514:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:514:28: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:514:28: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:519:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:519:36: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:519:36: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:520:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:520:36: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:520:36: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:521:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:521:36: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:521:36: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:552:32: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *oldname @@ got n:1> *oldname @@
init/initramfs.c:552:32: sparse: expected char const [noderef] <asn:1> *oldname
init/initramfs.c:552:32: sparse: got char *
init/initramfs.c:552:53: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected char const [noderef] <asn:1> *newname @@ got char char const [noderef] <asn:1> *newname @@
init/initramfs.c:552:53: sparse: expected char const [noderef] <asn:1> *newname
init/initramfs.c:552:53: sparse: got char *static [toplevel] [assigned] collected
init/initramfs.c:553:21: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected char const [noderef] <asn:1> *filename @@ got char char const [noderef] <asn:1> *filename @@
init/initramfs.c:553:21: sparse: expected char const [noderef] <asn:1> *filename
init/initramfs.c:553:21: sparse: got char *static [toplevel] [assigned] collected
vim +490 init/initramfs.c
310
311 static int __init maybe_link(void)
312 {
313 if (nlink >= 2) {
314 char *old = find_link(major, minor, ino, mode, collected);
315 if (old) {
316 clean_path(collected, 0);
> 317 return (ksys_link(old, collected) < 0) ? -1 : 1;
318 }
319 }
320 return 0;
321 }
322
323 struct xattr_hdr {
324 char c_size[8]; /* total size including c_size field */
325 char c_data[]; /* <name>\0<value> */
326 };
327
328 static int __init __maybe_unused do_setxattrs(char *pathname)
329 {
330 char *buf = xattr_buf;
331 char *bufend = buf + xattr_len;
332 struct xattr_hdr *hdr;
333 char str[sizeof(hdr->c_size) + 1];
334 struct path path;
335
336 if (!xattr_len)
337 return 0;
338
339 str[sizeof(hdr->c_size)] = 0;
340
341 while (buf < bufend) {
342 char *xattr_name, *xattr_value;
343 unsigned long xattr_entry_size;
344 unsigned long xattr_name_size, xattr_value_size;
345 int ret;
346
347 if (buf + sizeof(hdr->c_size) > bufend) {
348 error("malformed xattrs");
349 break;
350 }
351
352 hdr = (struct xattr_hdr *)buf;
353 memcpy(str, hdr->c_size, sizeof(hdr->c_size));
354 ret = kstrtoul(str, 16, &xattr_entry_size);
355 buf += xattr_entry_size;
356 if (ret || buf > bufend || !xattr_entry_size) {
357 error("malformed xattrs");
358 break;
359 }
360
361 xattr_name = hdr->c_data;
362 xattr_name_size = strnlen(xattr_name,
363 xattr_entry_size - sizeof(hdr->c_size));
364 if (xattr_name_size == xattr_entry_size - sizeof(hdr->c_size)) {
365 error("malformed xattrs");
366 break;
367 }
368
369 xattr_value = xattr_name + xattr_name_size + 1;
370 xattr_value_size = buf - xattr_value;
371
372 ret = kern_path(pathname, 0, &path);
373 if (!ret) {
374 ret = vfs_setxattr(path.dentry, xattr_name, xattr_value,
375 xattr_value_size, 0);
376
377 path_put(&path);
378 }
379
380 pr_debug("%s: %s size: %lu val: %s (ret: %d)\n", pathname,
381 xattr_name, xattr_value_size, xattr_value, ret);
382 }
383
384 return 0;
385 }
386
387 struct path_hdr {
388 char p_size[10]; /* total size including p_size field */
389 char p_data[]; /* <path>\0<xattrs> */
390 };
391
392 static int __init do_readxattrs(void)
393 {
394 struct path_hdr hdr;
395 char *path = NULL;
396 char str[sizeof(hdr.p_size) + 1];
397 unsigned long file_entry_size;
398 size_t size, path_size, total_size;
399 struct kstat st;
400 struct file *file;
401 loff_t pos;
402 int ret;
403
404 ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
405 if (ret < 0)
406 return ret;
407
408 total_size = st.size;
409
410 file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
411 if (IS_ERR(file))
412 return PTR_ERR(file);
413
414 pos = file->f_pos;
415
416 while (total_size) {
417 size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
418 if (size != sizeof(hdr)) {
419 ret = -EIO;
420 goto out;
421 }
422
423 total_size -= size;
424
425 str[sizeof(hdr.p_size)] = 0;
426 memcpy(str, hdr.p_size, sizeof(hdr.p_size));
427 ret = kstrtoul(str, 16, &file_entry_size);
428 if (ret < 0)
429 goto out;
430
431 file_entry_size -= sizeof(sizeof(hdr.p_size));
432 if (file_entry_size > total_size) {
433 ret = -EINVAL;
434 goto out;
435 }
436
437 path = vmalloc(file_entry_size);
438 if (!path) {
439 ret = -ENOMEM;
440 goto out;
441 }
442
443 size = kernel_read(file, path, file_entry_size, &pos);
444 if (size != file_entry_size) {
445 ret = -EIO;
446 goto out_free;
447 }
448
449 total_size -= size;
450
451 path_size = strnlen(path, file_entry_size);
452 if (path_size == file_entry_size) {
453 ret = -EINVAL;
454 goto out_free;
455 }
456
457 xattr_buf = path + path_size + 1;
458 xattr_len = file_entry_size - path_size - 1;
459
460 ret = do_setxattrs(path);
461 vfree(path);
462 path = NULL;
463
464 if (ret < 0)
465 break;
466 }
467 out_free:
468 vfree(path);
469 out:
470 fput(file);
471
472 if (ret < 0)
473 error("Unable to parse xattrs");
474
475 return ret;
476 }
477
478 static __initdata int wfd;
479
480 static int __init do_name(void)
481 {
482 state = SkipIt;
483 next_state = Reset;
484 if (strcmp(collected, "TRAILER!!!") == 0) {
485 free_hash();
486 return 0;
487 } else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
488 struct kstat st;
489
> 490 if (!vfs_lstat(collected, &st))
491 do_readxattrs();
492 }
493 clean_path(collected, mode);
494 if (S_ISREG(mode)) {
495 int ml = maybe_link();
496 if (ml >= 0) {
497 int openflags = O_WRONLY|O_CREAT;
498 if (ml != 1)
499 openflags |= O_TRUNC;
500 wfd = ksys_open(collected, openflags, mode);
501
502 if (wfd >= 0) {
503 ksys_fchown(wfd, uid, gid);
504 ksys_fchmod(wfd, mode);
505 if (body_len)
506 ksys_ftruncate(wfd, body_len);
507 vcollected = kstrdup(collected, GFP_KERNEL);
508 state = CopyFile;
509 }
510 }
511 } else if (S_ISDIR(mode)) {
512 ksys_mkdir(collected, mode);
513 ksys_chown(collected, uid, gid);
514 ksys_chmod(collected, mode);
515 dir_add(collected, mtime);
516 } else if (S_ISBLK(mode) || S_ISCHR(mode) ||
517 S_ISFIFO(mode) || S_ISSOCK(mode)) {
518 if (maybe_link() == 0) {
519 ksys_mknod(collected, mode, rdev);
520 ksys_chown(collected, uid, gid);
521 ksys_chmod(collected, mode);
522 do_utime(collected, mtime);
523 }
524 }
525 return 0;
526 }
527
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [PATCH v1 1/2] pid: add pidfd_open()
From: Joel Fernandes @ 2019-05-18 9:48 UTC (permalink / raw)
To: Christian Brauner
Cc: jannh, oleg, viro, torvalds, linux-kernel, arnd, akpm, cyphar,
dhowells, ebiederm, elena.reshetova, keescook, luto, luto, tglx,
linux-alpha, linux-arm-kernel, linux-ia64, linux-m68k, linux-mips,
linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
linux-xtensa, linux-api, linux-arch, linux-kselftest, dancol,
serge, su
In-Reply-To: <20190516135944.7205-1-christian@brauner.io>
Hi Christian,
For next revision, could you also CC surenb@google.com as well? He is also
working on the low memory killer. And also suggest CC to
kernel-team@android.com. And mentioned some comments below, thanks.
On Thu, May 16, 2019 at 03:59:42PM +0200, Christian Brauner wrote:
[snip]
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..4afca3d6dcb8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -38,6 +38,7 @@
> #include <linux/syscalls.h>
> #include <linux/proc_ns.h>
> #include <linux/proc_fs.h>
> +#include <linux/sched/signal.h>
> #include <linux/sched/task.h>
> #include <linux/idr.h>
>
> @@ -451,6 +452,55 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> return idr_get_next(&ns->idr, &nr);
> }
>
> +/**
> + * pidfd_open() - Open new pid file descriptor.
> + *
> + * @pid: pid for which to retrieve a pidfd
> + * @flags: flags to pass
> + *
> + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> + * the process identified by @pid. Currently, the process identified by
> + * @pid must be a thread-group leader. This restriction currently exists
> + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> + * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> + * leaders).
> + *
> + * Return: On success, a cloexec pidfd is returned.
> + * On error, a negative errno number will be returned.
> + */
> +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> +{
> + int fd, ret;
> + struct pid *p;
> + struct task_struct *tsk;
> +
> + if (flags)
> + return -EINVAL;
> +
> + if (pid <= 0)
> + return -EINVAL;
> +
> + p = find_get_pid(pid);
> + if (!p)
> + return -ESRCH;
> +
> + ret = 0;
> + rcu_read_lock();
> + /*
> + * If this returns non-NULL the pid was used as a thread-group
> + * leader. Note, we race with exec here: If it changes the
> + * thread-group leader we might return the old leader.
> + */
> + tsk = pid_task(p, PIDTYPE_TGID);
Just trying to understand the comment here. The issue is that we might either
return the new leader, or the old leader depending on the overlap with
concurrent de_thread right? In either case, we don't care though.
I suggest to remove the "Note..." part of the comment since it doesn't seem the
race is relevant here unless we are doing something else with tsk in the
function, but if you want to keep it that's also fine. Comment text should
probably should be 'return the new leader' though.
> + if (!tsk)
> + ret = -ESRCH;
Perhaps -EINVAL? AFAICS, this can only happen if a CLONE_THREAD pid was
passed as argument to pidfd_open which is invalid. But let me know what you
had in mind..
thanks,
- Joel
> + rcu_read_unlock();
> +
> + fd = ret ?: pidfd_create(p);
> + put_pid(p);
> + return fd;
> +}
> +
> void __init pid_idr_init(void)
> {
> /* Verify no one has done anything silly: */
> --
> 2.21.0
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox