Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH v2 04/18] fpga: dfl: fme: support 512bit data width PR
From: Wu Hao @ 2019-05-17  3:50 UTC (permalink / raw)
  To: Alan Tull
  Cc: Scott Wood, Moritz Fischer, linux-fpga, linux-kernel, linux-api,
	Ananda Ravuri, Xu Yilun
In-Reply-To: <CANk1AXSuacSP79gtxiOUdf3pKgq0-eWXoeoQdTYX4QV4=NedDA@mail.gmail.com>

On Thu, May 16, 2019 at 12:35:27PM -0500, Alan Tull wrote:
> On Mon, Apr 29, 2019 at 4:12 AM Wu Hao <hao.wu@intel.com> wrote:
> 
> It looks like this addressed the review comments.  Adding my Ack.  Is
> there anything else on this patch?

Nothing else, just addressed the review comments. : )

Thanks for the review and ack.

Hao

> 
> Alan
> 
> >
> > In early partial reconfiguration private feature, it only
> > supports 32bit data width when writing data to hardware for
> > PR. 512bit data width PR support is an important optimization
> > for some specific solutions (e.g. XEON with FPGA integrated),
> > it allows driver to use AVX512 instruction to improve the
> > performance of partial reconfiguration. e.g. programming one
> > 100MB bitstream image via this 512bit data width PR hardware
> > only takes ~300ms, but 32bit revision requires ~3s per test
> > result.
> >
> > Please note now this optimization is only done on revision 2
> > of this PR private feature which is only used in integrated
> > solution that AVX512 is always supported. This revision 2
> > hardware doesn't support 32bit PR.
> >
> > Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> 
> Acked-by: Alan Tull <atull@kernel.org>
> 
> 
> > ---
> > v2: check AVX512 support using cpu_feature_enabled()
> >     fix other comments from Scott Wood <swood@redhat.com>
> > ---
> >  drivers/fpga/dfl-fme-main.c |   3 ++
> >  drivers/fpga/dfl-fme-mgr.c  | 113 +++++++++++++++++++++++++++++++++++++-------
> >  drivers/fpga/dfl-fme-pr.c   |  43 +++++++++++------
> >  drivers/fpga/dfl-fme.h      |   2 +
> >  drivers/fpga/dfl.h          |   5 ++
> >  5 files changed, 135 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 086ad24..076d74f 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -21,6 +21,8 @@
> >  #include "dfl.h"
> >  #include "dfl-fme.h"
> >
> > +#define DRV_VERSION    "0.8"
> > +
> >  static ssize_t ports_num_show(struct device *dev,
> >                               struct device_attribute *attr, char *buf)
> >  {
> > @@ -277,3 +279,4 @@ static int fme_remove(struct platform_device *pdev)
> >  MODULE_AUTHOR("Intel Corporation");
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_ALIAS("platform:dfl-fme");
> > +MODULE_VERSION(DRV_VERSION);
> > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> > index b3f7eee..d1a4ba5 100644
> > --- a/drivers/fpga/dfl-fme-mgr.c
> > +++ b/drivers/fpga/dfl-fme-mgr.c
> > @@ -22,14 +22,18 @@
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> >  #include <linux/fpga/fpga-mgr.h>
> >
> > +#include "dfl.h"
> >  #include "dfl-fme-pr.h"
> >
> > +#define DRV_VERSION    "0.8"
> > +
> >  /* FME Partial Reconfiguration Sub Feature Register Set */
> >  #define FME_PR_DFH             0x0
> >  #define FME_PR_CTRL            0x8
> >  #define FME_PR_STS             0x10
> >  #define FME_PR_DATA            0x18
> >  #define FME_PR_ERR             0x20
> > +#define FME_PR_512_DATA                0x40 /* Data Register for 512bit datawidth PR */
> >  #define FME_PR_INTFC_ID_L      0xA8
> >  #define FME_PR_INTFC_ID_H      0xB0
> >
> > @@ -67,8 +71,43 @@
> >  #define PR_WAIT_TIMEOUT   8000000
> >  #define PR_HOST_STATUS_IDLE    0
> >
> > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > +
> > +#include <linux/cpufeature.h>
> > +#include <asm/fpu/api.h>
> > +
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > +       return cpu_feature_enabled(X86_FEATURE_AVX512F);
> > +}
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > +       kernel_fpu_begin();
> > +
> > +       asm volatile("vmovdqu64 (%0), %%zmm0;"
> > +                    "vmovntdq %%zmm0, (%1);"
> > +                    :
> > +                    : "r"(src), "r"(dst)
> > +                    : "memory");
> > +
> > +       kernel_fpu_end();
> > +}
> > +#else
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > +       return 0;
> > +}
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > +       WARN_ON_ONCE(1);
> > +}
> > +#endif
> > +
> >  struct fme_mgr_priv {
> >         void __iomem *ioaddr;
> > +       unsigned int pr_datawidth;
> >         u64 pr_error;
> >  };
> >
> > @@ -169,7 +208,7 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> >         struct fme_mgr_priv *priv = mgr->priv;
> >         void __iomem *fme_pr = priv->ioaddr;
> >         u64 pr_ctrl, pr_status, pr_data;
> > -       int delay = 0, pr_credit, i = 0;
> > +       int ret = 0, delay = 0, pr_credit;
> >
> >         dev_dbg(dev, "start request\n");
> >
> > @@ -181,9 +220,9 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> >
> >         /*
> >          * driver can push data to PR hardware using PR_DATA register once HW
> > -        * has enough pr_credit (> 1), pr_credit reduces one for every 32bit
> > -        * pr data write to PR_DATA register. If pr_credit <= 1, driver needs
> > -        * to wait for enough pr_credit from hardware by polling.
> > +        * has enough pr_credit (> 1), pr_credit reduces one for every pr data
> > +        * width write to PR_DATA register. If pr_credit <= 1, driver needs to
> > +        * wait for enough pr_credit from hardware by polling.
> >          */
> >         pr_status = readq(fme_pr + FME_PR_STS);
> >         pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
> > @@ -192,7 +231,8 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> >                 while (pr_credit <= 1) {
> >                         if (delay++ > PR_WAIT_TIMEOUT) {
> >                                 dev_err(dev, "PR_CREDIT timeout\n");
> > -                               return -ETIMEDOUT;
> > +                               ret = -ETIMEDOUT;
> > +                               goto done;
> >                         }
> >                         udelay(1);
> >
> > @@ -200,21 +240,27 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> >                         pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
> >                 }
> >
> > -               if (count < 4) {
> > -                       dev_err(dev, "Invalid PR bitstream size\n");
> > -                       return -EINVAL;
> > +               WARN_ON(count < priv->pr_datawidth);
> > +
> > +               switch (priv->pr_datawidth) {
> > +               case 4:
> > +                       pr_data = FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > +                                            *(u32 *)buf);
> > +                       writeq(pr_data, fme_pr + FME_PR_DATA);
> > +                       break;
> > +               case 64:
> > +                       copy512(buf, fme_pr + FME_PR_512_DATA);
> > +                       break;
> > +               default:
> > +                       WARN_ON_ONCE(1);
> >                 }
> > -
> > -               pr_data = 0;
> > -               pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > -                                     *(((u32 *)buf) + i));
> > -               writeq(pr_data, fme_pr + FME_PR_DATA);
> > -               count -= 4;
> > +               buf += priv->pr_datawidth;
> > +               count -= priv->pr_datawidth;
> >                 pr_credit--;
> > -               i++;
> >         }
> >
> > -       return 0;
> > +done:
> > +       return ret;
> >  }
> >
> >  static int fme_mgr_write_complete(struct fpga_manager *mgr,
> > @@ -279,6 +325,36 @@ static void fme_mgr_get_compat_id(void __iomem *fme_pr,
> >         id->id_h = readq(fme_pr + FME_PR_INTFC_ID_H);
> >  }
> >
> > +static u8 fme_mgr_get_pr_datawidth(struct device *dev, void __iomem *fme_pr)
> > +{
> > +       u8 revision = dfl_feature_revision(fme_pr);
> > +
> > +       if (revision < 2) {
> > +               /*
> > +                * revision 0 and 1 only support 32bit data width partial
> > +                * reconfiguration, so pr_datawidth is 4 (Byte).
> > +                */
> > +               return 4;
> > +       } else if (revision == 2) {
> > +               /*
> > +                * revision 2 hardware has optimization to support 512bit data
> > +                * width partial reconfiguration with AVX512 instructions. So
> > +                * pr_datawidth is 64 (Byte). As revision 2 hardware is only
> > +                * used in integrated solution, CPU supports AVX512 instructions
> > +                * for sure, but it still needs to check here as AVX512 could be
> > +                * disabled in kernel (e.g. using clearcpuid boot option).
> > +                */
> > +               if (is_cpu_avx512_enabled())
> > +                       return 64;
> > +
> > +               dev_err(dev, "revision 2: AVX512 is disabled\n");
> > +               return 0;
> > +       }
> > +
> > +       dev_err(dev, "revision %d is not supported yet\n", revision);
> > +       return 0;
> > +}
> > +
> >  static int fme_mgr_probe(struct platform_device *pdev)
> >  {
> >         struct dfl_fme_mgr_pdata *pdata = dev_get_platdata(&pdev->dev);
> > @@ -302,6 +378,10 @@ static int fme_mgr_probe(struct platform_device *pdev)
> >                         return PTR_ERR(priv->ioaddr);
> >         }
> >
> > +       priv->pr_datawidth = fme_mgr_get_pr_datawidth(dev, priv->ioaddr);
> > +       if (!priv->pr_datawidth)
> > +               return -ENODEV;
> > +
> >         compat_id = devm_kzalloc(dev, sizeof(*compat_id), GFP_KERNEL);
> >         if (!compat_id)
> >                 return -ENOMEM;
> > @@ -342,3 +422,4 @@ static int fme_mgr_remove(struct platform_device *pdev)
> >  MODULE_AUTHOR("Intel Corporation");
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_ALIAS("platform:dfl-fme-mgr");
> > +MODULE_VERSION(DRV_VERSION);
> > diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> > index 3c71dc3..cd94ba8 100644
> > --- a/drivers/fpga/dfl-fme-pr.c
> > +++ b/drivers/fpga/dfl-fme-pr.c
> > @@ -83,7 +83,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> >         if (copy_from_user(&port_pr, argp, minsz))
> >                 return -EFAULT;
> >
> > -       if (port_pr.argsz < minsz || port_pr.flags)
> > +       if (port_pr.argsz < minsz || port_pr.flags || !port_pr.buffer_size)
> >                 return -EINVAL;
> >
> >         /* get fme header region */
> > @@ -101,15 +101,25 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> >                        port_pr.buffer_size))
> >                 return -EFAULT;
> >
> > +       mutex_lock(&pdata->lock);
> > +       fme = dfl_fpga_pdata_get_private(pdata);
> > +       /* fme device has been unregistered. */
> > +       if (!fme) {
> > +               ret = -EINVAL;
> > +               goto unlock_exit;
> > +       }
> > +
> >         /*
> >          * align PR buffer per PR bandwidth, as HW ignores the extra padding
> >          * data automatically.
> >          */
> > -       length = ALIGN(port_pr.buffer_size, 4);
> > +       length = ALIGN(port_pr.buffer_size, fme->pr_datawidth);
> >
> >         buf = vmalloc(length);
> > -       if (!buf)
> > -               return -ENOMEM;
> > +       if (!buf) {
> > +               ret = -ENOMEM;
> > +               goto unlock_exit;
> > +       }
> >
> >         if (copy_from_user(buf,
> >                            (void __user *)(unsigned long)port_pr.buffer_address,
> > @@ -127,18 +137,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> >
> >         info->flags |= FPGA_MGR_PARTIAL_RECONFIG;
> >
> > -       mutex_lock(&pdata->lock);
> > -       fme = dfl_fpga_pdata_get_private(pdata);
> > -       /* fme device has been unregistered. */
> > -       if (!fme) {
> > -               ret = -EINVAL;
> > -               goto unlock_exit;
> > -       }
> > -
> >         region = dfl_fme_region_find(fme, port_pr.port_id);
> >         if (!region) {
> >                 ret = -EINVAL;
> > -               goto unlock_exit;
> > +               goto free_exit;
> >         }
> >
> >         fpga_image_info_free(region->info);
> > @@ -159,10 +161,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> >                 fpga_bridges_put(&region->bridge_list);
> >
> >         put_device(&region->dev);
> > -unlock_exit:
> > -       mutex_unlock(&pdata->lock);
> >  free_exit:
> >         vfree(buf);
> > +unlock_exit:
> > +       mutex_unlock(&pdata->lock);
> >         return ret;
> >  }
> >
> > @@ -388,6 +390,17 @@ static int pr_mgmt_init(struct platform_device *pdev,
> >         mutex_lock(&pdata->lock);
> >         priv = dfl_fpga_pdata_get_private(pdata);
> >
> > +       /*
> > +        * Initialize PR data width.
> > +        * Only revision 2 supports 512bit datawidth for better performance,
> > +        * other revisions use default 32bit datawidth. This is used for
> > +        * buffer alignment.
> > +        */
> > +       if (dfl_feature_revision(feature->ioaddr) == 2)
> > +               priv->pr_datawidth = 64;
> > +       else
> > +               priv->pr_datawidth = 4;
> > +
> >         /* Initialize the region and bridge sub device list */
> >         INIT_LIST_HEAD(&priv->region_list);
> >         INIT_LIST_HEAD(&priv->bridge_list);
> > diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
> > index 5394a21..de20755 100644
> > --- a/drivers/fpga/dfl-fme.h
> > +++ b/drivers/fpga/dfl-fme.h
> > @@ -21,12 +21,14 @@
> >  /**
> >   * struct dfl_fme - dfl fme private data
> >   *
> > + * @pr_datawidth: data width for partial reconfiguration.
> >   * @mgr: FME's FPGA manager platform device.
> >   * @region_list: linked list of FME's FPGA regions.
> >   * @bridge_list: linked list of FME's FPGA bridges.
> >   * @pdata: fme platform device's pdata.
> >   */
> >  struct dfl_fme {
> > +       int pr_datawidth;
> >         struct platform_device *mgr;
> >         struct list_head region_list;
> >         struct list_head bridge_list;
> > diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
> > index a8b869e..8851c6c 100644
> > --- a/drivers/fpga/dfl.h
> > +++ b/drivers/fpga/dfl.h
> > @@ -331,6 +331,11 @@ static inline bool dfl_feature_is_port(void __iomem *base)
> >                 (FIELD_GET(DFH_ID, v) == DFH_ID_FIU_PORT);
> >  }
> >
> > +static inline u8 dfl_feature_revision(void __iomem *base)
> > +{
> > +       return (u8)FIELD_GET(DFH_REVISION, readq(base + DFH));
> > +}
> > +
> >  /**
> >   * struct dfl_fpga_enum_info - DFL FPGA enumeration information
> >   *
> > --
> > 1.8.3.1
> >

^ permalink raw reply

* Re: [PATCH v2 05/18] Documentation: fpga: dfl: add descriptions for virtualization and new interfaces.
From: Wu Hao @ 2019-05-17  4:11 UTC (permalink / raw)
  To: Alan Tull; +Cc: Moritz Fischer, linux-fpga, linux-kernel, linux-api, Xu Yilun
In-Reply-To: <CANk1AXQCp2ozUQDWz__MuiUeDLvGvrfqj3KUYmBa5Z34oxG8NQ@mail.gmail.com>

On Thu, May 16, 2019 at 12:53:00PM -0500, Alan Tull wrote:
> On Thu, May 16, 2019 at 12:36 PM Alan Tull <atull@kernel.org> wrote:
> >
> > On Mon, Apr 29, 2019 at 4:12 AM Wu Hao <hao.wu@intel.com> wrote:
> 
> Hi Hao,
> 
> Most of this patchset looks ready to go upstream or nearly so with
> pretty straightforward changes .  Patches 17 and 18 need minor changes
> and please change the scnprintf in the other patches.  The patches
> that had nontrivial changes are the power and thermal ones involving
> hwmon.  I'm hoping to send up the patchset minus the hwmon patches in
> the next version if there's no unforseen issues.  If the hwmon patches
> are ready then also, that's great, but otherwise those patches don't
> need to hold up all the rest of the patchset.  How's that sound?

Hi Alan

Thanks for your time for reviewing this patchset.

This sounds good to me. Only thing here is, I need to split the patch which
updates documentation into 2 patches (to remove hwmon description in doc),
but for sure, it should be very easy. :)

Thanks
Hao

> 
> Alan
> 
> > >
> > > This patch adds virtualization support description for DFL based
> > > FPGA devices (based on PCIe SRIOV), and introductions to new
> > > interfaces added by new dfl private feature drivers.
> > >
> > > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > > Signed-off-by: Wu Hao <hao.wu@intel.com>
> >
> > Acked-by: Alan Tull <atull@kernel.org>
> >
> > Thanks,
> > Alan

^ permalink raw reply

* Re: [PATCH] Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Andy Lutomirski @ 2019-05-17  4:59 UTC (permalink / raw)
  To: Felipe; +Cc: David S. Miller, Al Viro, LKML, Network Development, Linux API
In-Reply-To: <20190517032505.19921-1-felipe@felipegasper.com>

> On May 16, 2019, at 8:25 PM, Felipe <felipe@felipegasper.com> wrote:
>
> Author: Felipe Gasper <felipe@felipegasper.com>
> Date:   Thu May 16 12:16:53 2019 -0500
>
>    Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
>
>    This adds the ability for Netlink to report a socket’s UID along with the
>    other UNIX socket diagnostic information that is already available. This will
>    allow diagnostic tools greater insight into which users control which socket.
>
>    Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
>
> diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
> index 5c502fd..a198857 100644
> --- a/include/uapi/linux/unix_diag.h
> +++ b/include/uapi/linux/unix_diag.h
> @@ -20,6 +20,7 @@ struct unix_diag_req {
> #define UDIAG_SHOW_ICONS    0x00000008    /* show pending connections */
> #define UDIAG_SHOW_RQLEN    0x00000010    /* show skb receive queue len */
> #define UDIAG_SHOW_MEMINFO    0x00000020    /* show memory info of a socket */
> +#define UDIAG_SHOW_UID        0x00000040    /* show socket's UID */
>
> struct unix_diag_msg {
>    __u8    udiag_family;
> @@ -40,6 +41,7 @@ enum {
>    UNIX_DIAG_RQLEN,
>    UNIX_DIAG_MEMINFO,
>    UNIX_DIAG_SHUTDOWN,
> +    UNIX_DIAG_UID,
>
>    __UNIX_DIAG_MAX,
> };
> diff --git a/net/unix/diag.c b/net/unix/diag.c
> index 3183d9b..011f56c 100644
> --- a/net/unix/diag.c
> +++ b/net/unix/diag.c
> @@ -110,6 +110,11 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
>    return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
> }
>
> +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
> +{
> +    return nla_put(nlskb, UNIX_DIAG_UID, sizeof(kuid_t), &(sk->sk_uid));

That type is called *k* uid_t because it’s internal to the kernel. You
probably want from_kuid_munged(), which will fix it up for an
appropriate userns.  Presumably you want sk’s netns’s userns.

^ permalink raw reply

* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17  6:54 UTC (permalink / raw)
  To: Dmitry V. Levin, Al Viro
  Cc: David Howells, torvalds, Arnd Bergmann, linux-fsdevel,
	linux-kernel, linux-api
In-Reply-To: <20190516202331.GA29908@altlinux.org>

On May 16, 2019 10:23:31 PM GMT+02:00, "Dmitry V. Levin" <ldv@altlinux.org> wrote:
>[looks like linux-abi is a typo, Cc'ed linux-api instead]
>
>On Thu, May 16, 2019 at 05:50:22PM +0100, Al Viro wrote:
>> [linux-abi cc'd]
>> 
>> On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote:
>> > On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote:
>> > > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote:
>> > > > 
>> > > > Hi Linus, Al,
>> > > > 
>> > > > Here are some patches that make changes to the mount API UAPI
>and two of
>> > > > them really need applying, before -rc1 - if they're going to be
>applied at
>> > > > all.
>> > > 
>> > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default
>crusade
>> > > makes any sense.  Could somebody give coherent arguments in
>favour of
>> > > abandoning the existing conventions?
>> > 
>> > So as I said in the commit message. From a userspace perspective
>it's
>> > more of an issue if one accidently leaks an fd to a task during
>exec.
>> > 
>> > Also, most of the time one does not want to inherit an fd during an
>> > exec. It is a hazzle to always have to specify an extra flag.
>> > 
>> > As Al pointed out to me open() semantics are not going anywhere.
>Sure,
>> > no argument there at all.
>> > But the idea of making fds cloexec by default is only targeted at
>fds
>> > that come from separate syscalls. fsopen(), open_tree_clone(), etc.
>they
>> > all return fds independent of open() so it's really easy to have
>them
>> > cloexec by default without regressing anyone and we also remove the
>need
>> > for a bunch of separate flags for each syscall to turn them into
>> > cloexec-fds. I mean, those for syscalls came with 4 separate flags
>to be
>> > able to specify that the returned fd should be made cloexec. The
>other
>> > way around, cloexec by default, fcntl() to remove the cloexec bit
>is way
>> > saner imho.
>> 
>> Re separate flags - it is, in principle, a valid argument.  OTOH, I'm
>not
>> sure if they need to be separate - they all have the same value and
>> I don't see any reason for that to change...
>> 
>> Only tangentially related, but I wonder if something like
>close_range(from, to)
>> would be a more useful approach...  That kind of open-coded loops is
>not
>> rare in userland and kernel-side code can do them much cheaper. 
>Something
>> like
>> 	/* that exec is sensitive */
>> 	unshare(CLONE_FILES);
>> 	/* we don't want anything past stderr here */
>> 	close_range(3, ~0U);
>> 	execve(....);
>> on the userland side of thing.  Comments?
>
>glibc people need a syscall to implement closefrom properly, see
>https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c14

I have a prototype for close_range().
I'll send it out after rc1.

Christian

^ permalink raw reply

* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17  7:01 UTC (permalink / raw)
  To: Al Viro
  Cc: David Howells, torvalds, Arnd Bergmann, linux-fsdevel,
	linux-kernel, linux-api
In-Reply-To: <20190516165021.GD17978@ZenIV.linux.org.uk>

On May 16, 2019 6:50:22 PM GMT+02:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
>[linux-abi cc'd]
>
>On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote:
>> On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote:
>> > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote:
>> > > 
>> > > Hi Linus, Al,
>> > > 
>> > > Here are some patches that make changes to the mount API UAPI and
>two of
>> > > them really need applying, before -rc1 - if they're going to be
>applied at
>> > > all.
>> > 
>> > I'm fine with 2--4, but I'm not convinced that cloexec-by-default
>crusade
>> > makes any sense.  Could somebody give coherent arguments in favour
>of
>> > abandoning the existing conventions?
>> 
>> So as I said in the commit message. From a userspace perspective it's
>> more of an issue if one accidently leaks an fd to a task during exec.
>> 
>> Also, most of the time one does not want to inherit an fd during an
>> exec. It is a hazzle to always have to specify an extra flag.
>> 
>> As Al pointed out to me open() semantics are not going anywhere.
>Sure,
>> no argument there at all.
>> But the idea of making fds cloexec by default is only targeted at fds
>> that come from separate syscalls. fsopen(), open_tree_clone(), etc.
>they
>> all return fds independent of open() so it's really easy to have them
>> cloexec by default without regressing anyone and we also remove the
>need
>> for a bunch of separate flags for each syscall to turn them into
>> cloexec-fds. I mean, those for syscalls came with 4 separate flags to
>be
>> able to specify that the returned fd should be made cloexec. The
>other
>> way around, cloexec by default, fcntl() to remove the cloexec bit is
>way
>> saner imho.
>
>Re separate flags - it is, in principle, a valid argument.  OTOH, I'm
>not
>sure if they need to be separate - they all have the same value and
>I don't see any reason for that to change...

One last thing I'd like to point out is that
we already have syscalls and ioctls that
return cloexec fds. So the consistency
argument is kinda dead.

If you still prefer to have cloexec flags
for the 4 new syscalls then yes,
if they could at least all have the same name
(FSMOUNT_CLOEXEC?) that would be good.

>
>Only tangentially related, but I wonder if something like
>close_range(from, to)
>would be a more useful approach...  That kind of open-coded loops is
>not
>rare in userland and kernel-side code can do them much cheaper. 
>Something
>like
>	/* that exec is sensitive */
>	unshare(CLONE_FILES);
>	/* we don't want anything past stderr here */
>	close_range(3, ~0U);
>	execve(....);
>on the userland side of thing.  Comments?

Said it before but, the list was mistyped so again:
I think that's a great idea.
I have a prototype for close_range(start, end, flags).
I'll wait after rc1 and then send it out.

Christian

^ permalink raw reply

* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: David Howells @ 2019-05-17  7:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: dhowells, Al Viro, torvalds, Arnd Bergmann, linux-fsdevel,
	linux-kernel, linux-api
In-Reply-To: <F67AF221-C576-4424-88D7-7C6074D0A6C6@brauner.io>

Christian Brauner <christian@brauner.io> wrote:

> If you still prefer to have cloexec flags
> for the 4 new syscalls then yes,
> if they could at least all have the same name
> (FSMOUNT_CLOEXEC?) that would be good.

They don't all have the same value (see OPEN_TREE_CLOEXEC).

Note that I also don't want to blindly #define them to O_CLOEXEC because it's
not necessarily the same value on all arches.  Currently it can be 02000000,
010000000 or 0x400000 for instance, which means that if it's sharing a mask
with other flags, at least three bits have to be reserved for it or we have to
have arch-dependent bit juggling.

One thing I like about your approach of just making them O_CLOEXEC by default
and removing the constants is that it avoids this mess entirely.

David

^ permalink raw reply

* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Miklos Szeredi @ 2019-05-17  7:25 UTC (permalink / raw)
  To: David Howells
  Cc: Christian Brauner, Al Viro, Linus Torvalds, Arnd Bergmann,
	linux-fsdevel, linux-kernel, linux-api
In-Reply-To: <11455.1558077206@warthog.procyon.org.uk>

On Fri, May 17, 2019 at 9:13 AM David Howells <dhowells@redhat.com> wrote:
>
> Christian Brauner <christian@brauner.io> wrote:
>
> > If you still prefer to have cloexec flags
> > for the 4 new syscalls then yes,
> > if they could at least all have the same name
> > (FSMOUNT_CLOEXEC?) that would be good.
>
> They don't all have the same value (see OPEN_TREE_CLOEXEC).
>
> Note that I also don't want to blindly #define them to O_CLOEXEC because it's
> not necessarily the same value on all arches.  Currently it can be 02000000,
> 010000000 or 0x400000 for instance, which means that if it's sharing a mask
> with other flags, at least three bits have to be reserved for it or we have to
> have arch-dependent bit juggling.
>
> One thing I like about your approach of just making them O_CLOEXEC by default
> and removing the constants is that it avoids this mess entirely.

+1.

Confusion caused by inconsistency of naming is going to hurt more than
inconsistency of semantics wrt. open(2).

Thanks,
Miklos

^ permalink raw reply

* Re: [PATCH 0/4] uapi, vfs: Change the mount API UAPI [ver #2]
From: Christian Brauner @ 2019-05-17  7:27 UTC (permalink / raw)
  Cc: dhowells, Al Viro, torvalds, Arnd Bergmann, linux-fsdevel,
	linux-kernel, linux-api
In-Reply-To: <11455.1558077206@warthog.procyon.org.uk>

On May 17, 2019 9:13:26 AM GMT+02:00, David Howells <dhowells@redhat.com> wrote:
>Christian Brauner <christian@brauner.io> wrote:
>
>> If you still prefer to have cloexec flags
>> for the 4 new syscalls then yes,
>> if they could at least all have the same name
>> (FSMOUNT_CLOEXEC?) that would be good.
>
>They don't all have the same value (see OPEN_TREE_CLOEXEC).
>
>Note that I also don't want to blindly #define them to O_CLOEXEC
>because it's
>not necessarily the same value on all arches.  Currently it can be
>02000000,
>010000000 or 0x400000 for instance, which means that if it's sharing a
>mask
>with other flags, at least three bits have to be reserved for it or we
>have to
>have arch-dependent bit juggling.


Ugh. Right, I forgot about that entirely.

Christian

>
>One thing I like about your approach of just making them O_CLOEXEC by
>default
>and removing the constants is that it avoids this mess entirely.
>
>David

^ permalink raw reply

* Re: [PATCH] Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Felipe Gasper @ 2019-05-17 14:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David S. Miller, Al Viro, LKML, Network Development, Linux API
In-Reply-To: <CALCETrUaTamZ1ZGbWpu+4kDAEFRqyESoa_4tgwpAmMh3NVQ4pQ@mail.gmail.com>


> On May 17, 2019, at 12:59 AM, Andy Lutomirski <luto@kernel.org> wrote:
> 
>> On May 16, 2019, at 8:25 PM, Felipe <felipe@felipegasper.com> wrote:
>> 
>> Author: Felipe Gasper <felipe@felipegasper.com>
>> Date:   Thu May 16 12:16:53 2019 -0500
>> 
>>   Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
>> 
>>   This adds the ability for Netlink to report a socket’s UID along with the
>>   other UNIX socket diagnostic information that is already available. This will
>>   allow diagnostic tools greater insight into which users control which socket.
>> 
>>   Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
>> 
>> diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
>> index 5c502fd..a198857 100644
>> --- a/include/uapi/linux/unix_diag.h
>> +++ b/include/uapi/linux/unix_diag.h
>> @@ -20,6 +20,7 @@ struct unix_diag_req {
>> #define UDIAG_SHOW_ICONS    0x00000008    /* show pending connections */
>> #define UDIAG_SHOW_RQLEN    0x00000010    /* show skb receive queue len */
>> #define UDIAG_SHOW_MEMINFO    0x00000020    /* show memory info of a socket */
>> +#define UDIAG_SHOW_UID        0x00000040    /* show socket's UID */
>> 
>> struct unix_diag_msg {
>>   __u8    udiag_family;
>> @@ -40,6 +41,7 @@ enum {
>>   UNIX_DIAG_RQLEN,
>>   UNIX_DIAG_MEMINFO,
>>   UNIX_DIAG_SHUTDOWN,
>> +    UNIX_DIAG_UID,
>> 
>>   __UNIX_DIAG_MAX,
>> };
>> diff --git a/net/unix/diag.c b/net/unix/diag.c
>> index 3183d9b..011f56c 100644
>> --- a/net/unix/diag.c
>> +++ b/net/unix/diag.c
>> @@ -110,6 +110,11 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
>>   return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
>> }
>> 
>> +static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
>> +{
>> +    return nla_put(nlskb, UNIX_DIAG_UID, sizeof(kuid_t), &(sk->sk_uid));
> 
> That type is called *k* uid_t because it’s internal to the kernel. You
> probably want from_kuid_munged(), which will fix it up for an
> appropriate userns.  Presumably you want sk’s netns’s userns.

Thank you for pointing this out.

Would it suffice to get the userns as: “sk_user_ns(sk)”?

Or would it be better to pass struct netlink_callback *cb from unix_diag_dump() to sk_diag_dump() to sk_diag_fill(), then to the new function to add the UID?

cheers,
-Felipe Gasper

^ permalink raw reply

* [PATCH v3 0/2] initramfs: add support for xattrs in the initial ram disk
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
  To: viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
	james.w.mcmechan, niveditas98, Roberto Sassu

This patch set aims at solving the following use case: appraise files from
the initial ram disk. To do that, IMA checks the signature/hash from the
security.ima xattr. Unfortunately, this use case cannot be implemented
currently, as the CPIO format does not support xattrs.

This proposal consists in marshaling pathnames and xattrs in a file called
.xattr-list. They are unmarshaled by the CPIO parser after all files have
been extracted, or before the next ram disk is processed.

The difference from v1 (https://lkml.org/lkml/2018/11/22/1182) is that all
xattrs are stored in a single file and not per file (solves the file name
limitation issue, as it is not necessary to add a suffix to files
containing xattrs).

The difference with another proposal
(https://lore.kernel.org/patchwork/cover/888071/) is that xattrs can be
included in an image without changing the image format, as opposed to
defining a new one. As seen from the discussion, if a new format has to be
defined, it should fix the issues of the existing format, which requires
more time.

To fulfill both requirements, adding support for xattrs in a short time and
defining a new image format properly, this patch set takes an incremental
approach: it introduces a parser of xattrs that can be used either if
xattrs are in a regular file or directly added to the image (this patch set
reuses patch 9/15 of the existing proposal); in addition, it introduces a
wrapper of the xattr parser, to read xattrs from a file.

The changes introduced by this patch set don't cause any compatibility
issue: kernels without the xattr parser simply extracts .xattr-list and
don't unmarshal xattrs; kernels with the xattr parser don't unmarshal
xattrs if .xattr-list is not found in the image.

>From the kernel space perspective, backporting this functionality to older
kernels should be very easy. It is sufficient to add two calls to the new
function do_readxattrs(). From the user space perspective, no change is
required for the use case. A new dracut module (module-setup.sh) will
execute:

getfattr --absolute-names -d -h -R -e hex -m security.ima \
    <file list> | xattr.awk -b > ${initdir}/.xattr-list

where xattr.awk is the script that marshals xattrs (see patch 3/3). The
same can be done with the initramfs-tools ram disk generator.

Changelog

v2:
- replace ksys_lsetxattr() with kern_path() and vfs_setxattr()
  (suggested by Jann Horn)
- replace ksys_open()/ksys_read()/ksys_close() with
  filp_open()/kernel_read()/fput()
  (suggested by Jann Horn)
- use path variable instead of name_buf in do_readxattrs()
- set last byte of str to 0 in do_readxattrs()
- call do_readxattrs() in do_name() before replacing an existing
  .xattr-list
- pass pathname to do_setxattrs()


Mimi Zohar (1):
  initramfs: set extended attributes

Roberto Sassu (1):
  initramfs: introduce do_readxattrs()

 init/initramfs.c | 170 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 168 insertions(+), 2 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH v3 1/2] initramfs: set extended attributes
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
  To: viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
	james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-1-roberto.sassu@huawei.com>

From: Mimi Zohar <zohar@linux.vnet.ibm.com>

This patch adds xattrs to a file, with name and value taken from a supplied
buffer. The data format is:

<xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>

[kamensky: fixed restoring of xattrs for symbolic links by using
           sys_lsetxattr() instead of sys_setxattr()]

[sassu: removed state management, kept only do_setxattrs(), replaced
        sys_lsetxattr() with vfs_setxattr(), added check for
        xattr_entry_size, added check for hdr->c_size, replaced strlen()
        with strnlen(); moved do_setxattrs() before do_name()]

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 init/initramfs.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/init/initramfs.c b/init/initramfs.c
index 435a428c2af1..0c6dd1d5d3f6 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -10,6 +10,8 @@
 #include <linux/syscalls.h>
 #include <linux/utime.h>
 #include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
 
 static ssize_t __init xwrite(int fd, const char *p, size_t count)
 {
@@ -146,7 +148,8 @@ static __initdata time64_t mtime;
 
 static __initdata unsigned long ino, major, minor, nlink;
 static __initdata umode_t mode;
-static __initdata unsigned long body_len, name_len;
+static __initdata u32 name_len, xattr_len;
+static __initdata u64 body_len;
 static __initdata uid_t uid;
 static __initdata gid_t gid;
 static __initdata unsigned rdev;
@@ -218,7 +221,7 @@ static void __init read_into(char *buf, unsigned size, enum state next)
 	}
 }
 
-static __initdata char *header_buf, *symlink_buf, *name_buf;
+static __initdata char *header_buf, *symlink_buf, *name_buf, *xattr_buf;
 
 static int __init do_start(void)
 {
@@ -315,6 +318,70 @@ static int __init maybe_link(void)
 	return 0;
 }
 
+struct xattr_hdr {
+	char c_size[8]; /* total size including c_size field */
+	char c_data[];  /* <name>\0<value> */
+};
+
+static int __init __maybe_unused do_setxattrs(char *pathname)
+{
+	char *buf = xattr_buf;
+	char *bufend = buf + xattr_len;
+	struct xattr_hdr *hdr;
+	char str[sizeof(hdr->c_size) + 1];
+	struct path path;
+
+	if (!xattr_len)
+		return 0;
+
+	str[sizeof(hdr->c_size)] = 0;
+
+	while (buf < bufend) {
+		char *xattr_name, *xattr_value;
+		unsigned long xattr_entry_size;
+		unsigned long xattr_name_size, xattr_value_size;
+		int ret;
+
+		if (buf + sizeof(hdr->c_size) > bufend) {
+			error("malformed xattrs");
+			break;
+		}
+
+		hdr = (struct xattr_hdr *)buf;
+		memcpy(str, hdr->c_size, sizeof(hdr->c_size));
+		ret = kstrtoul(str, 16, &xattr_entry_size);
+		buf += xattr_entry_size;
+		if (ret || buf > bufend || !xattr_entry_size) {
+			error("malformed xattrs");
+			break;
+		}
+
+		xattr_name = hdr->c_data;
+		xattr_name_size = strnlen(xattr_name,
+					xattr_entry_size - sizeof(hdr->c_size));
+		if (xattr_name_size == xattr_entry_size - sizeof(hdr->c_size)) {
+			error("malformed xattrs");
+			break;
+		}
+
+		xattr_value = xattr_name + xattr_name_size + 1;
+		xattr_value_size = buf - xattr_value;
+
+		ret = kern_path(pathname, 0, &path);
+		if (!ret) {
+			ret = vfs_setxattr(path.dentry, xattr_name, xattr_value,
+					   xattr_value_size, 0);
+
+			path_put(&path);
+		}
+
+		pr_debug("%s: %s size: %lu val: %s (ret: %d)\n", pathname,
+			 xattr_name, xattr_value_size, xattr_value, ret);
+	}
+
+	return 0;
+}
+
 static __initdata int wfd;
 
 static int __init do_name(void)
-- 
2.17.1

^ permalink raw reply related

* [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Roberto Sassu @ 2019-05-17 16:55 UTC (permalink / raw)
  To: viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, hpa, arnd, rob,
	james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-1-roberto.sassu@huawei.com>

This patch adds support for an alternative method to add xattrs to files in
the rootfs filesystem. Instead of extracting them directly from the ram
disk image, they are extracted from a regular file called .xattr-list, that
can be added by any ram disk generator available today. The file format is:

<file #N data len (ASCII, 10 chars)><file #N path>\0
<xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>

.xattr-list can be generated by executing:

$ getfattr --absolute-names -d -h -R -e hex -m - \
      <file list> | xattr.awk -b > ${initdir}/.xattr-list

where the content of the xattr.awk script is:

#! /usr/bin/awk -f
{
  if (!length($0)) {
    printf("%.10x%s\0", len, file);
    for (x in xattr) {
      printf("%.8x%s\0", xattr_len[x], x);
      for (i = 0; i < length(xattr[x]) / 2; i++) {
        printf("%c", strtonum("0x"substr(xattr[x], i * 2 + 1, 2)));
      }
    }
    i = 0;
    delete xattr;
    delete xattr_len;
    next;
  };
  if (i == 0) {
    file=$3;
    len=length(file) + 8 + 1;
  }
  if (i > 0) {
    split($0, a, "=");
    xattr[a[1]]=substr(a[2], 3);
    xattr_len[a[1]]=length(a[1]) + 1 + 8 + length(xattr[a[1]]) / 2;
    len+=xattr_len[a[1]];
  };
  i++;
}

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 init/initramfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/init/initramfs.c b/init/initramfs.c
index 0c6dd1d5d3f6..6ec018c6279a 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -13,6 +13,8 @@
 #include <linux/namei.h>
 #include <linux/xattr.h>
 
+#define XATTR_LIST_FILENAME ".xattr-list"
+
 static ssize_t __init xwrite(int fd, const char *p, size_t count)
 {
 	ssize_t out = 0;
@@ -382,6 +384,97 @@ static int __init __maybe_unused do_setxattrs(char *pathname)
 	return 0;
 }
 
+struct path_hdr {
+	char p_size[10]; /* total size including p_size field */
+	char p_data[];   /* <path>\0<xattrs> */
+};
+
+static int __init do_readxattrs(void)
+{
+	struct path_hdr hdr;
+	char *path = NULL;
+	char str[sizeof(hdr.p_size) + 1];
+	unsigned long file_entry_size;
+	size_t size, path_size, total_size;
+	struct kstat st;
+	struct file *file;
+	loff_t pos;
+	int ret;
+
+	ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
+	if (ret < 0)
+		return ret;
+
+	total_size = st.size;
+
+	file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	pos = file->f_pos;
+
+	while (total_size) {
+		size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
+		if (size != sizeof(hdr)) {
+			ret = -EIO;
+			goto out;
+		}
+
+		total_size -= size;
+
+		str[sizeof(hdr.p_size)] = 0;
+		memcpy(str, hdr.p_size, sizeof(hdr.p_size));
+		ret = kstrtoul(str, 16, &file_entry_size);
+		if (ret < 0)
+			goto out;
+
+		file_entry_size -= sizeof(sizeof(hdr.p_size));
+		if (file_entry_size > total_size) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		path = vmalloc(file_entry_size);
+		if (!path) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		size = kernel_read(file, path, file_entry_size, &pos);
+		if (size != file_entry_size) {
+			ret = -EIO;
+			goto out_free;
+		}
+
+		total_size -= size;
+
+		path_size = strnlen(path, file_entry_size);
+		if (path_size == file_entry_size) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		xattr_buf = path + path_size + 1;
+		xattr_len = file_entry_size - path_size - 1;
+
+		ret = do_setxattrs(path);
+		vfree(path);
+		path = NULL;
+
+		if (ret < 0)
+			break;
+	}
+out_free:
+	vfree(path);
+out:
+	fput(file);
+
+	if (ret < 0)
+		error("Unable to parse xattrs");
+
+	return ret;
+}
+
 static __initdata int wfd;
 
 static int __init do_name(void)
@@ -391,6 +484,11 @@ static int __init do_name(void)
 	if (strcmp(collected, "TRAILER!!!") == 0) {
 		free_hash();
 		return 0;
+	} else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
+		struct kstat st;
+
+		if (!vfs_lstat(collected, &st))
+			do_readxattrs();
 	}
 	clean_path(collected, mode);
 	if (S_ISREG(mode)) {
@@ -562,6 +660,7 @@ static char * __init unpack_to_rootfs(char *buf, unsigned long len)
 		buf += my_inptr;
 		len -= my_inptr;
 	}
+	do_readxattrs();
 	dir_utime();
 	kfree(name_buf);
 	kfree(symlink_buf);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: hpa @ 2019-05-17 20:18 UTC (permalink / raw)
  To: Roberto Sassu, viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, arnd, rob, james.w.mcmechan,
	niveditas98
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>

On May 17, 2019 9:55:19 AM PDT, Roberto Sassu <roberto.sassu@huawei.com> wrote:
>This patch adds support for an alternative method to add xattrs to
>files in
>the rootfs filesystem. Instead of extracting them directly from the ram
>disk image, they are extracted from a regular file called .xattr-list,
>that
>can be added by any ram disk generator available today. The file format
>is:
>
><file #N data len (ASCII, 10 chars)><file #N path>\0
><xattr #N data len (ASCII, 8 chars)><xattr #N name>\0<xattr #N value>
>
>.xattr-list can be generated by executing:
>
>$ getfattr --absolute-names -d -h -R -e hex -m - \
>      <file list> | xattr.awk -b > ${initdir}/.xattr-list
>
>where the content of the xattr.awk script is:
>
>#! /usr/bin/awk -f
>{
>  if (!length($0)) {
>    printf("%.10x%s\0", len, file);
>    for (x in xattr) {
>      printf("%.8x%s\0", xattr_len[x], x);
>      for (i = 0; i < length(xattr[x]) / 2; i++) {
>        printf("%c", strtonum("0x"substr(xattr[x], i * 2 + 1, 2)));
>      }
>    }
>    i = 0;
>    delete xattr;
>    delete xattr_len;
>    next;
>  };
>  if (i == 0) {
>    file=$3;
>    len=length(file) + 8 + 1;
>  }
>  if (i > 0) {
>    split($0, a, "=");
>    xattr[a[1]]=substr(a[2], 3);
>    xattr_len[a[1]]=length(a[1]) + 1 + 8 + length(xattr[a[1]]) / 2;
>    len+=xattr_len[a[1]];
>  };
>  i++;
>}
>
>Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
>---
> init/initramfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 99 insertions(+)
>
>diff --git a/init/initramfs.c b/init/initramfs.c
>index 0c6dd1d5d3f6..6ec018c6279a 100644
>--- a/init/initramfs.c
>+++ b/init/initramfs.c
>@@ -13,6 +13,8 @@
> #include <linux/namei.h>
> #include <linux/xattr.h>
> 
>+#define XATTR_LIST_FILENAME ".xattr-list"
>+
> static ssize_t __init xwrite(int fd, const char *p, size_t count)
> {
> 	ssize_t out = 0;
>@@ -382,6 +384,97 @@ static int __init __maybe_unused do_setxattrs(char
>*pathname)
> 	return 0;
> }
> 
>+struct path_hdr {
>+	char p_size[10]; /* total size including p_size field */
>+	char p_data[];   /* <path>\0<xattrs> */
>+};
>+
>+static int __init do_readxattrs(void)
>+{
>+	struct path_hdr hdr;
>+	char *path = NULL;
>+	char str[sizeof(hdr.p_size) + 1];
>+	unsigned long file_entry_size;
>+	size_t size, path_size, total_size;
>+	struct kstat st;
>+	struct file *file;
>+	loff_t pos;
>+	int ret;
>+
>+	ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
>+	if (ret < 0)
>+		return ret;
>+
>+	total_size = st.size;
>+
>+	file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
>+	if (IS_ERR(file))
>+		return PTR_ERR(file);
>+
>+	pos = file->f_pos;
>+
>+	while (total_size) {
>+		size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
>+		if (size != sizeof(hdr)) {
>+			ret = -EIO;
>+			goto out;
>+		}
>+
>+		total_size -= size;
>+
>+		str[sizeof(hdr.p_size)] = 0;
>+		memcpy(str, hdr.p_size, sizeof(hdr.p_size));
>+		ret = kstrtoul(str, 16, &file_entry_size);
>+		if (ret < 0)
>+			goto out;
>+
>+		file_entry_size -= sizeof(sizeof(hdr.p_size));
>+		if (file_entry_size > total_size) {
>+			ret = -EINVAL;
>+			goto out;
>+		}
>+
>+		path = vmalloc(file_entry_size);
>+		if (!path) {
>+			ret = -ENOMEM;
>+			goto out;
>+		}
>+
>+		size = kernel_read(file, path, file_entry_size, &pos);
>+		if (size != file_entry_size) {
>+			ret = -EIO;
>+			goto out_free;
>+		}
>+
>+		total_size -= size;
>+
>+		path_size = strnlen(path, file_entry_size);
>+		if (path_size == file_entry_size) {
>+			ret = -EINVAL;
>+			goto out_free;
>+		}
>+
>+		xattr_buf = path + path_size + 1;
>+		xattr_len = file_entry_size - path_size - 1;
>+
>+		ret = do_setxattrs(path);
>+		vfree(path);
>+		path = NULL;
>+
>+		if (ret < 0)
>+			break;
>+	}
>+out_free:
>+	vfree(path);
>+out:
>+	fput(file);
>+
>+	if (ret < 0)
>+		error("Unable to parse xattrs");
>+
>+	return ret;
>+}
>+
> static __initdata int wfd;
> 
> static int __init do_name(void)
>@@ -391,6 +484,11 @@ static int __init do_name(void)
> 	if (strcmp(collected, "TRAILER!!!") == 0) {
> 		free_hash();
> 		return 0;
>+	} else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
>+		struct kstat st;
>+
>+		if (!vfs_lstat(collected, &st))
>+			do_readxattrs();
> 	}
> 	clean_path(collected, mode);
> 	if (S_ISREG(mode)) {
>@@ -562,6 +660,7 @@ static char * __init unpack_to_rootfs(char *buf,
>unsigned long len)
> 		buf += my_inptr;
> 		len -= my_inptr;
> 	}
>+	do_readxattrs();
> 	dir_utime();
> 	kfree(name_buf);
> 	kfree(symlink_buf);

Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.

A side benefit is that the format can be simpler as there is no need to encode the filename.

A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.

There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are present in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 21:02 UTC (permalink / raw)
  To: hpa
  Cc: Roberto Sassu, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
	james.w.mcmechan, niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>

On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
> 
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
This version of the patch was changed from the previous one exactly to deal with this case --
it allows for the bootloader to load multiple initramfs archives, each
with its own .xattr-list file, and to have that work properly.
Could you elaborate on the issue that you see?

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 21:10 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: hpa, Roberto Sassu, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
	james.w.mcmechan, niveditas98
In-Reply-To: <20190517210219.GA5998@rani.riverdale.lan>

On Fri, May 17, 2019 at 05:02:20PM -0400, Arvind Sankar wrote:
> On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
> > 
> > Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> This version of the patch was changed from the previous one exactly to deal with this case --
> it allows for the bootloader to load multiple initramfs archives, each
> with its own .xattr-list file, and to have that work properly.
> Could you elaborate on the issue that you see?
Roberto, are you missing a changelog entry for v2->v3 change?

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Rob Landley @ 2019-05-17 21:17 UTC (permalink / raw)
  To: hpa, Roberto Sassu, viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, arnd, james.w.mcmechan,
	niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>

On 5/17/19 3:18 PM, hpa@zytor.com wrote:
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> 
> A side benefit is that the format can be simpler as there is no need to encode the filename.
> 
> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
> 
> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pres
 ent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
> 

I'll happily modify toybox cpio to understand xattrs (compress and decompress),
the android guys do a lot with xattrs already. I tapped out of _this_ discussion
from disgust with the proposed encoding.

Rob

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: H. Peter Anvin @ 2019-05-17 21:41 UTC (permalink / raw)
  To: Roberto Sassu, viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, arnd, rob, james.w.mcmechan,
	niveditas98
In-Reply-To: <CD9A4F89-7CA5-4329-A06A-F8DEB87905A5@zytor.com>

On 5/17/19 1:18 PM, hpa@zytor.com wrote:
> 
> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> 
> A side benefit is that the format can be simpler as there is no need to encode the filename.
> 
> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
> 
> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pres
 ent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
> 

Correction: it's just !!!TRAILER!!!.

I tested with GNU cpio, BSD cpio, scpio and pax.

With a mode of 0:

	- GNU cpio errors, but extracts all the other files.
	- BSD cpio extracts them as regular files.
	- scpio and pax abort.

With a mode of 0x18000 (bit 16 + S_IFREG), all of them happily extracted
the data as regular files.

	-hpa

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: H. Peter Anvin @ 2019-05-17 21:47 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Roberto Sassu, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, arnd, rob,
	james.w.mcmechan, niveditas98
In-Reply-To: <20190517210219.GA5998@rani.riverdale.lan>

On 5/17/19 2:02 PM, Arvind Sankar wrote:
> On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
>>
>> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> This version of the patch was changed from the previous one exactly to deal with this case --
> it allows for the bootloader to load multiple initramfs archives, each
> with its own .xattr-list file, and to have that work properly.
> Could you elaborate on the issue that you see?
> 

Well, for one thing, how do you define "cpio archive", each with its own
.xattr-list file? Second, that would seem to depend on the ordering, no,
in which case you depend critically on .xattr-list file following the
files, which most archivers won't do.

Either way it seems cleaner to have this per file; especially if/as it
can be done without actually mucking up the format.

I need to run, but I'll post a more detailed explanation of what I did
in a little bit.

	-hpa

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Arvind Sankar @ 2019-05-17 22:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arvind Sankar, Roberto Sassu, viro, linux-security-module,
	linux-integrity, initramfs, linux-api, linux-fsdevel,
	linux-kernel, zohar, silviu.vlasceanu, dmitry.kasatkin, takondra,
	kamensky, arnd, rob, james.w.mcmechan, niveditas98
In-Reply-To: <d48f35a1-aab1-2f20-2e91-5e81a84b107f@zytor.com>

On Fri, May 17, 2019 at 02:47:31PM -0700, H. Peter Anvin wrote:
> On 5/17/19 2:02 PM, Arvind Sankar wrote:
> > On Fri, May 17, 2019 at 01:18:11PM -0700, hpa@zytor.com wrote:
> >>
> >> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
> > This version of the patch was changed from the previous one exactly to deal with this case --
> > it allows for the bootloader to load multiple initramfs archives, each
> > with its own .xattr-list file, and to have that work properly.
> > Could you elaborate on the issue that you see?
> > 
> 
> Well, for one thing, how do you define "cpio archive", each with its own
> .xattr-list file? Second, that would seem to depend on the ordering, no,
> in which case you depend critically on .xattr-list file following the
> files, which most archivers won't do.
> 
> Either way it seems cleaner to have this per file; especially if/as it
> can be done without actually mucking up the format.
> 
> I need to run, but I'll post a more detailed explanation of what I did
> in a little bit.
> 
> 	-hpa
> 
Not sure what you mean by how do I define it? Each cpio archive will
contain its own .xattr-list file with signatures for the files within
it, that was the idea.

You need to review the code more closely I think -- it does not depend
on the .xattr-list file following the files to which it applies.

The code first extracts .xattr-list as though it was a regular file. If
a later dupe shows up (presumably from a second archive, although the
patch will actually allow a second one in the same archive), it will
then process the existing .xattr-list file and apply the attributes
listed within it. It then will proceed to read the second one and
overwrite the first one with it (this is the normal behaviour in the
kernel cpio parser). At the end once all the archives have been
extracted, if there is an .xattr-list file in the rootfs it will be
parsed (it would've been the last one encountered, which hasn't been
parsed yet, just extracted).

Regarding the idea to use the high 16 bits of the mode field in
the header that's another possibility. It would just require additional
support in the program that actually creates the archive though, which
the current patch doesn't.

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-17 23:09 UTC (permalink / raw)
  Cc: kbuild-all, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
	rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 3884 bytes --]

Hi Roberto,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sparc64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   init/initramfs.c: In function 'do_readxattrs':
>> init/initramfs.c:437:10: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
      path = vmalloc(file_entry_size);
             ^~~~~~~
             kvmalloc
>> init/initramfs.c:437:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      path = vmalloc(file_entry_size);
           ^
>> init/initramfs.c:461:3: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
      vfree(path);
      ^~~~~
      kvfree
   cc1: some warnings being treated as errors

vim +437 init/initramfs.c

   391	
   392	static int __init do_readxattrs(void)
   393	{
   394		struct path_hdr hdr;
   395		char *path = NULL;
   396		char str[sizeof(hdr.p_size) + 1];
   397		unsigned long file_entry_size;
   398		size_t size, path_size, total_size;
   399		struct kstat st;
   400		struct file *file;
   401		loff_t pos;
   402		int ret;
   403	
   404		ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
   405		if (ret < 0)
   406			return ret;
   407	
   408		total_size = st.size;
   409	
   410		file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
   411		if (IS_ERR(file))
   412			return PTR_ERR(file);
   413	
   414		pos = file->f_pos;
   415	
   416		while (total_size) {
   417			size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
   418			if (size != sizeof(hdr)) {
   419				ret = -EIO;
   420				goto out;
   421			}
   422	
   423			total_size -= size;
   424	
   425			str[sizeof(hdr.p_size)] = 0;
   426			memcpy(str, hdr.p_size, sizeof(hdr.p_size));
   427			ret = kstrtoul(str, 16, &file_entry_size);
   428			if (ret < 0)
   429				goto out;
   430	
   431			file_entry_size -= sizeof(sizeof(hdr.p_size));
   432			if (file_entry_size > total_size) {
   433				ret = -EINVAL;
   434				goto out;
   435			}
   436	
 > 437			path = vmalloc(file_entry_size);
   438			if (!path) {
   439				ret = -ENOMEM;
   440				goto out;
   441			}
   442	
   443			size = kernel_read(file, path, file_entry_size, &pos);
   444			if (size != file_entry_size) {
   445				ret = -EIO;
   446				goto out_free;
   447			}
   448	
   449			total_size -= size;
   450	
   451			path_size = strnlen(path, file_entry_size);
   452			if (path_size == file_entry_size) {
   453				ret = -EINVAL;
   454				goto out_free;
   455			}
   456	
   457			xattr_buf = path + path_size + 1;
   458			xattr_len = file_entry_size - path_size - 1;
   459	
   460			ret = do_setxattrs(path);
 > 461			vfree(path);
   462			path = NULL;
   463	
   464			if (ret < 0)
   465				break;
   466		}
   467	out_free:
   468		vfree(path);
   469	out:
   470		fput(file);
   471	
   472		if (ret < 0)
   473			error("Unable to parse xattrs");
   474	
   475		return ret;
   476	}
   477	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 57391 bytes --]

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-18  1:02 UTC (permalink / raw)
  Cc: kbuild-all, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
	rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 5380 bytes --]

Hi Roberto,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
config: m68k-allyesconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   init/initramfs.c: In function 'do_readxattrs':
   init/initramfs.c:437:10: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
      path = vmalloc(file_entry_size);
             ^~~~~~~
             kvmalloc
   init/initramfs.c:437:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      path = vmalloc(file_entry_size);
           ^
   init/initramfs.c:461:3: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
      vfree(path);
      ^~~~~
      kvfree
   In file included from include/asm-generic/io.h:891:0,
                    from arch/m68k/include/asm/io.h:11,
                    from include/linux/kexec.h:19,
                    from init/initramfs.c:694:
   include/linux/vmalloc.h: At top level:
>> include/linux/vmalloc.h:77:14: error: conflicting types for 'vmalloc'
    extern void *vmalloc(unsigned long size);
                 ^~~~~~~
   init/initramfs.c:437:10: note: previous implicit declaration of 'vmalloc' was here
      path = vmalloc(file_entry_size);
             ^~~~~~~
   In file included from include/asm-generic/io.h:891:0,
                    from arch/m68k/include/asm/io.h:11,
                    from include/linux/kexec.h:19,
                    from init/initramfs.c:694:
>> include/linux/vmalloc.h:102:13: warning: conflicting types for 'vfree'
    extern void vfree(const void *addr);
                ^~~~~
   init/initramfs.c:461:3: note: previous implicit declaration of 'vfree' was here
      vfree(path);
      ^~~~~
   cc1: some warnings being treated as errors

vim +/vmalloc +77 include/linux/vmalloc.h

db64fe02 Nick Piggin       2008-10-18   76  
^1da177e Linus Torvalds    2005-04-16  @77  extern void *vmalloc(unsigned long size);
e1ca7788 Dave Young        2010-10-26   78  extern void *vzalloc(unsigned long size);
83342314 Nick Piggin       2006-06-23   79  extern void *vmalloc_user(unsigned long size);
930fc45a Christoph Lameter 2005-10-29   80  extern void *vmalloc_node(unsigned long size, int node);
e1ca7788 Dave Young        2010-10-26   81  extern void *vzalloc_node(unsigned long size, int node);
^1da177e Linus Torvalds    2005-04-16   82  extern void *vmalloc_exec(unsigned long size);
^1da177e Linus Torvalds    2005-04-16   83  extern void *vmalloc_32(unsigned long size);
83342314 Nick Piggin       2006-06-23   84  extern void *vmalloc_32_user(unsigned long size);
dd0fc66f Al Viro           2005-10-07   85  extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot);
d0a21265 David Rientjes    2011-01-13   86  extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
d0a21265 David Rientjes    2011-01-13   87  			unsigned long start, unsigned long end, gfp_t gfp_mask,
cb9e3c29 Andrey Ryabinin   2015-02-13   88  			pgprot_t prot, unsigned long vm_flags, int node,
cb9e3c29 Andrey Ryabinin   2015-02-13   89  			const void *caller);
1f5307b1 Michal Hocko      2017-05-08   90  #ifndef CONFIG_MMU
a7c3e901 Michal Hocko      2017-05-08   91  extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
8594a21c Michal Hocko      2017-05-12   92  static inline void *__vmalloc_node_flags_caller(unsigned long size, int node,
8594a21c Michal Hocko      2017-05-12   93  						gfp_t flags, void *caller)
1f5307b1 Michal Hocko      2017-05-08   94  {
8594a21c Michal Hocko      2017-05-12   95  	return __vmalloc_node_flags(size, node, flags);
1f5307b1 Michal Hocko      2017-05-08   96  }
8594a21c Michal Hocko      2017-05-12   97  #else
8594a21c Michal Hocko      2017-05-12   98  extern void *__vmalloc_node_flags_caller(unsigned long size,
8594a21c Michal Hocko      2017-05-12   99  					 int node, gfp_t flags, void *caller);
1f5307b1 Michal Hocko      2017-05-08  100  #endif
cb9e3c29 Andrey Ryabinin   2015-02-13  101  
b3bdda02 Christoph Lameter 2008-02-04 @102  extern void vfree(const void *addr);
bf22e37a Andrey Ryabinin   2016-12-12  103  extern void vfree_atomic(const void *addr);
^1da177e Linus Torvalds    2005-04-16  104  

:::::: The code at line 77 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49863 bytes --]

^ permalink raw reply

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: Rob Landley @ 2019-05-18  2:16 UTC (permalink / raw)
  To: H. Peter Anvin, Roberto Sassu, viro
  Cc: linux-security-module, linux-integrity, initramfs, linux-api,
	linux-fsdevel, linux-kernel, zohar, silviu.vlasceanu,
	dmitry.kasatkin, takondra, kamensky, arnd, james.w.mcmechan,
	niveditas98
In-Reply-To: <69ef1f55-9fc1-7ee0-371f-3dbc77551dc0@zytor.com>

On 5/17/19 4:41 PM, H. Peter Anvin wrote:
> On 5/17/19 1:18 PM, hpa@zytor.com wrote:
>>
>> Ok... I just realized this does not work for a modular initramfs, composed at load time from multiple files, which is a very real problem. Should be easy enough to deal with: instead of one large file, use one companion file per source file, perhaps something like filename..xattrs (suggesting double dots to make it less likely to conflict with a "real" file.) No leading dot, as it makes it more likely that archivers will sort them before the file proper.
>>
>> A side benefit is that the format can be simpler as there is no need to encode the filename.
>>
>> A technically cleaner solution still, but which would need archiver modifications, would be to encode the xattrs as an optionally nameless file (just an empty string) with a new file mode value, immediately following the original file. The advantage there is that the archiver itself could support xattrs and other extended metadata (which has been requested elsewhere); the disadvantage obviously is that that it requires new support in the archiver. However, at least it ought to be simpler since it is still a higher protocol level than the cpio archive itself.
>>
>> There's already one special case in cpio, which is the "!!!TRAILER!!!" filename; although I don't think it is part of the formal spec, to the extent there is one, I would expect that in practice it is always encoded with a mode of 0, which incidentally could be used to unbreak the case where such a filename actually exists. So one way to support such extended metadata would be to set mode to 0 and use the filename to encode the type of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better maintained these days) would deal with reading such a file; it would at least not be a regression if it just read it still, possibly with warnings. It could also be possible to use bits 17:16 in the mode, which are traditionally always zero (mode_t being 16 bits), but I believe are pre
 sent in most or all of the cpio formats for historical reasons. It might be accepted better by existing implementations to use one of these high bits combined with S_IFREG, I dont know.
>
> 
> Correction: it's just !!!TRAILER!!!.

We documented it as "TRAILER!!!" without leading !!!, and that its purpose is to
flush hardlinks:

  https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt

That's what toybox cpio has been producing. Kernel consumes it just fine. Just
checked busybox cpio and that's what they're producing as well...

Rob

^ permalink raw reply

* [PATCH v2] net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.
From: Felipe Gasper @ 2019-05-18  3:48 UTC (permalink / raw)
  To: davem, viro, linux-kernel, netdev, linux-api

Author: Felipe Gasper <felipe@felipegasper.com>
Date:   Fri May 17 16:54:40 2019 -0500

   net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.

   This adds the ability for Netlink to report a socket’s UID along with the
   other UNIX diagnostic information that is already available. This will
   allow diagnostic tools greater insight into which users control which socket.

   Signed-off-by: Felipe Gasper <felipe@felipegasper.com>

diff --git a/include/uapi/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
index 5c502fd..a198857 100644
--- a/include/uapi/linux/unix_diag.h
+++ b/include/uapi/linux/unix_diag.h
@@ -20,6 +20,7 @@ struct unix_diag_req {
 #define UDIAG_SHOW_ICONS	0x00000008	/* show pending connections */
 #define UDIAG_SHOW_RQLEN	0x00000010	/* show skb receive queue len */
 #define UDIAG_SHOW_MEMINFO	0x00000020	/* show memory info of a socket */
+#define UDIAG_SHOW_UID		0x00000040	/* show socket's UID */
 
 struct unix_diag_msg {
 	__u8	udiag_family;
@@ -40,6 +41,7 @@ enum {
 	UNIX_DIAG_RQLEN,
 	UNIX_DIAG_MEMINFO,
 	UNIX_DIAG_SHUTDOWN,
+	UNIX_DIAG_UID,
 
 	__UNIX_DIAG_MAX,
 };
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 3183d9b..e40f348 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -4,9 +4,11 @@
 #include <linux/unix_diag.h>
 #include <linux/skbuff.h>
 #include <linux/module.h>
+#include <linux/uidgid.h>
 #include <net/netlink.h>
 #include <net/af_unix.h>
 #include <net/tcp_states.h>
+#include <net/sock.h>
 
 static int sk_diag_dump_name(struct sock *sk, struct sk_buff *nlskb)
 {
@@ -110,6 +112,12 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
 	return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
 }
 
+static int sk_diag_dump_uid(struct sock *sk, struct sk_buff *nlskb)
+{
+	uid_t uid = from_kuid_munged(sk_user_ns(sk), sock_i_uid(sk));
+	return nla_put(nlskb, UNIX_DIAG_UID, sizeof(uid_t), &uid);
+}
+
 static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req,
 		u32 portid, u32 seq, u32 flags, int sk_ino)
 {
@@ -156,6 +164,10 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
 	if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown))
 		goto out_nlmsg_trim;
 
+	if ((req->udiag_show & UDIAG_SHOW_UID) &&
+	    sk_diag_dump_uid(sk, skb))
+		goto out_nlmsg_trim;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 

^ permalink raw reply related

* Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()
From: kbuild test robot @ 2019-05-18  5:49 UTC (permalink / raw)
  Cc: kbuild-all, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, zohar,
	silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky, hpa, arnd,
	rob, james.w.mcmechan, niveditas98, Roberto Sassu
In-Reply-To: <20190517165519.11507-3-roberto.sassu@huawei.com>

Hi Roberto,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.1 next-20190517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Roberto-Sassu/initramfs-set-extended-attributes/20190518-055846
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   init/initramfs.c:24:45: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected char const [noderef] <asn:1> *buf @@    got f] <asn:1> *buf @@
   init/initramfs.c:24:45: sparse:    expected char const [noderef] <asn:1> *buf
   init/initramfs.c:24:45: sparse:    got char const *p
   init/initramfs.c:115:36: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got n:1> *filename @@
   init/initramfs.c:115:36: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:115:36: sparse:    got char *filename
   init/initramfs.c:303:24: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *name @@    got n:1> *name @@
   init/initramfs.c:303:24: sparse:    expected char const [noderef] <asn:1> *name
   init/initramfs.c:303:24: sparse:    got char *path
   init/initramfs.c:305:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *pathname @@    got n:1> *pathname @@
   init/initramfs.c:305:36: sparse:    expected char const [noderef] <asn:1> *pathname
   init/initramfs.c:305:36: sparse:    got char *path
   init/initramfs.c:307:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *pathname @@    got n:1> *pathname @@
   init/initramfs.c:307:37: sparse:    expected char const [noderef] <asn:1> *pathname
   init/initramfs.c:307:37: sparse:    got char *path
   init/initramfs.c:317:43: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *oldname @@    got n:1> *oldname @@
   init/initramfs.c:317:43: sparse:    expected char const [noderef] <asn:1> *oldname
   init/initramfs.c:317:43: sparse:    got char *old
   init/initramfs.c:317:48: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected char const [noderef] <asn:1> *newname @@    got char char const [noderef] <asn:1> *newname @@
   init/initramfs.c:317:48: sparse:    expected char const [noderef] <asn:1> *newname
   init/initramfs.c:317:48: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:404:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *name @@    got n:1> *name @@
   init/initramfs.c:404:25: sparse:    expected char const [noderef] <asn:1> *name
   init/initramfs.c:404:25: sparse:    got char *
>> init/initramfs.c:490:32: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *name @@    got char char const [noderef] <asn:1> *name @@
   init/initramfs.c:490:32: sparse:    expected char const [noderef] <asn:1> *name
   init/initramfs.c:490:32: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:500:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:500:41: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:500:41: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:512:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *pathname @@    got char char const [noderef] <asn:1> *pathname @@
   init/initramfs.c:512:28: sparse:    expected char const [noderef] <asn:1> *pathname
   init/initramfs.c:512:28: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:513:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:513:28: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:513:28: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:514:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:514:28: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:514:28: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:519:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:519:36: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:519:36: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:520:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:520:36: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:520:36: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:521:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:521:36: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:521:36: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:552:32: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *oldname @@    got n:1> *oldname @@
   init/initramfs.c:552:32: sparse:    expected char const [noderef] <asn:1> *oldname
   init/initramfs.c:552:32: sparse:    got char *
   init/initramfs.c:552:53: sparse: sparse: incorrect type in argument 2 (different address spaces) @@    expected char const [noderef] <asn:1> *newname @@    got char char const [noderef] <asn:1> *newname @@
   init/initramfs.c:552:53: sparse:    expected char const [noderef] <asn:1> *newname
   init/initramfs.c:552:53: sparse:    got char *static [toplevel] [assigned] collected
   init/initramfs.c:553:21: sparse: sparse: incorrect type in argument 1 (different address spaces) @@    expected char const [noderef] <asn:1> *filename @@    got char char const [noderef] <asn:1> *filename @@
   init/initramfs.c:553:21: sparse:    expected char const [noderef] <asn:1> *filename
   init/initramfs.c:553:21: sparse:    got char *static [toplevel] [assigned] collected

vim +490 init/initramfs.c

   310	
   311	static int __init maybe_link(void)
   312	{
   313		if (nlink >= 2) {
   314			char *old = find_link(major, minor, ino, mode, collected);
   315			if (old) {
   316				clean_path(collected, 0);
 > 317				return (ksys_link(old, collected) < 0) ? -1 : 1;
   318			}
   319		}
   320		return 0;
   321	}
   322	
   323	struct xattr_hdr {
   324		char c_size[8]; /* total size including c_size field */
   325		char c_data[];  /* <name>\0<value> */
   326	};
   327	
   328	static int __init __maybe_unused do_setxattrs(char *pathname)
   329	{
   330		char *buf = xattr_buf;
   331		char *bufend = buf + xattr_len;
   332		struct xattr_hdr *hdr;
   333		char str[sizeof(hdr->c_size) + 1];
   334		struct path path;
   335	
   336		if (!xattr_len)
   337			return 0;
   338	
   339		str[sizeof(hdr->c_size)] = 0;
   340	
   341		while (buf < bufend) {
   342			char *xattr_name, *xattr_value;
   343			unsigned long xattr_entry_size;
   344			unsigned long xattr_name_size, xattr_value_size;
   345			int ret;
   346	
   347			if (buf + sizeof(hdr->c_size) > bufend) {
   348				error("malformed xattrs");
   349				break;
   350			}
   351	
   352			hdr = (struct xattr_hdr *)buf;
   353			memcpy(str, hdr->c_size, sizeof(hdr->c_size));
   354			ret = kstrtoul(str, 16, &xattr_entry_size);
   355			buf += xattr_entry_size;
   356			if (ret || buf > bufend || !xattr_entry_size) {
   357				error("malformed xattrs");
   358				break;
   359			}
   360	
   361			xattr_name = hdr->c_data;
   362			xattr_name_size = strnlen(xattr_name,
   363						xattr_entry_size - sizeof(hdr->c_size));
   364			if (xattr_name_size == xattr_entry_size - sizeof(hdr->c_size)) {
   365				error("malformed xattrs");
   366				break;
   367			}
   368	
   369			xattr_value = xattr_name + xattr_name_size + 1;
   370			xattr_value_size = buf - xattr_value;
   371	
   372			ret = kern_path(pathname, 0, &path);
   373			if (!ret) {
   374				ret = vfs_setxattr(path.dentry, xattr_name, xattr_value,
   375						   xattr_value_size, 0);
   376	
   377				path_put(&path);
   378			}
   379	
   380			pr_debug("%s: %s size: %lu val: %s (ret: %d)\n", pathname,
   381				 xattr_name, xattr_value_size, xattr_value, ret);
   382		}
   383	
   384		return 0;
   385	}
   386	
   387	struct path_hdr {
   388		char p_size[10]; /* total size including p_size field */
   389		char p_data[];   /* <path>\0<xattrs> */
   390	};
   391	
   392	static int __init do_readxattrs(void)
   393	{
   394		struct path_hdr hdr;
   395		char *path = NULL;
   396		char str[sizeof(hdr.p_size) + 1];
   397		unsigned long file_entry_size;
   398		size_t size, path_size, total_size;
   399		struct kstat st;
   400		struct file *file;
   401		loff_t pos;
   402		int ret;
   403	
   404		ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
   405		if (ret < 0)
   406			return ret;
   407	
   408		total_size = st.size;
   409	
   410		file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
   411		if (IS_ERR(file))
   412			return PTR_ERR(file);
   413	
   414		pos = file->f_pos;
   415	
   416		while (total_size) {
   417			size = kernel_read(file, (char *)&hdr, sizeof(hdr), &pos);
   418			if (size != sizeof(hdr)) {
   419				ret = -EIO;
   420				goto out;
   421			}
   422	
   423			total_size -= size;
   424	
   425			str[sizeof(hdr.p_size)] = 0;
   426			memcpy(str, hdr.p_size, sizeof(hdr.p_size));
   427			ret = kstrtoul(str, 16, &file_entry_size);
   428			if (ret < 0)
   429				goto out;
   430	
   431			file_entry_size -= sizeof(sizeof(hdr.p_size));
   432			if (file_entry_size > total_size) {
   433				ret = -EINVAL;
   434				goto out;
   435			}
   436	
   437			path = vmalloc(file_entry_size);
   438			if (!path) {
   439				ret = -ENOMEM;
   440				goto out;
   441			}
   442	
   443			size = kernel_read(file, path, file_entry_size, &pos);
   444			if (size != file_entry_size) {
   445				ret = -EIO;
   446				goto out_free;
   447			}
   448	
   449			total_size -= size;
   450	
   451			path_size = strnlen(path, file_entry_size);
   452			if (path_size == file_entry_size) {
   453				ret = -EINVAL;
   454				goto out_free;
   455			}
   456	
   457			xattr_buf = path + path_size + 1;
   458			xattr_len = file_entry_size - path_size - 1;
   459	
   460			ret = do_setxattrs(path);
   461			vfree(path);
   462			path = NULL;
   463	
   464			if (ret < 0)
   465				break;
   466		}
   467	out_free:
   468		vfree(path);
   469	out:
   470		fput(file);
   471	
   472		if (ret < 0)
   473			error("Unable to parse xattrs");
   474	
   475		return ret;
   476	}
   477	
   478	static __initdata int wfd;
   479	
   480	static int __init do_name(void)
   481	{
   482		state = SkipIt;
   483		next_state = Reset;
   484		if (strcmp(collected, "TRAILER!!!") == 0) {
   485			free_hash();
   486			return 0;
   487		} else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
   488			struct kstat st;
   489	
 > 490			if (!vfs_lstat(collected, &st))
   491				do_readxattrs();
   492		}
   493		clean_path(collected, mode);
   494		if (S_ISREG(mode)) {
   495			int ml = maybe_link();
   496			if (ml >= 0) {
   497				int openflags = O_WRONLY|O_CREAT;
   498				if (ml != 1)
   499					openflags |= O_TRUNC;
   500				wfd = ksys_open(collected, openflags, mode);
   501	
   502				if (wfd >= 0) {
   503					ksys_fchown(wfd, uid, gid);
   504					ksys_fchmod(wfd, mode);
   505					if (body_len)
   506						ksys_ftruncate(wfd, body_len);
   507					vcollected = kstrdup(collected, GFP_KERNEL);
   508					state = CopyFile;
   509				}
   510			}
   511		} else if (S_ISDIR(mode)) {
   512			ksys_mkdir(collected, mode);
   513			ksys_chown(collected, uid, gid);
   514			ksys_chmod(collected, mode);
   515			dir_add(collected, mtime);
   516		} else if (S_ISBLK(mode) || S_ISCHR(mode) ||
   517			   S_ISFIFO(mode) || S_ISSOCK(mode)) {
   518			if (maybe_link() == 0) {
   519				ksys_mknod(collected, mode, rdev);
   520				ksys_chown(collected, uid, gid);
   521				ksys_chmod(collected, mode);
   522				do_utime(collected, mtime);
   523			}
   524		}
   525		return 0;
   526	}
   527	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH v1 1/2] pid: add pidfd_open()
From: Joel Fernandes @ 2019-05-18  9:48 UTC (permalink / raw)
  To: Christian Brauner
  Cc: jannh, oleg, viro, torvalds, linux-kernel, arnd, akpm, cyphar,
	dhowells, ebiederm, elena.reshetova, keescook, luto, luto, tglx,
	linux-alpha, linux-arm-kernel, linux-ia64, linux-m68k, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, linux-sh, sparclinux,
	linux-xtensa, linux-api, linux-arch, linux-kselftest, dancol,
	serge, su
In-Reply-To: <20190516135944.7205-1-christian@brauner.io>

Hi Christian,

For next revision, could you also CC surenb@google.com as well? He is also
working on the low memory killer. And also suggest CC to
kernel-team@android.com. And mentioned some comments below, thanks.

On Thu, May 16, 2019 at 03:59:42PM +0200, Christian Brauner wrote:
[snip]  
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..4afca3d6dcb8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -38,6 +38,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/proc_ns.h>
>  #include <linux/proc_fs.h>
> +#include <linux/sched/signal.h>
>  #include <linux/sched/task.h>
>  #include <linux/idr.h>
>  
> @@ -451,6 +452,55 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
>  	return idr_get_next(&ns->idr, &nr);
>  }
>  
> +/**
> + * pidfd_open() - Open new pid file descriptor.
> + *
> + * @pid:   pid for which to retrieve a pidfd
> + * @flags: flags to pass
> + *
> + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> + * the process identified by @pid. Currently, the process identified by
> + * @pid must be a thread-group leader. This restriction currently exists
> + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> + * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> + * leaders).
> + *
> + * Return: On success, a cloexec pidfd is returned.
> + *         On error, a negative errno number will be returned.
> + */
> +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> +{
> +	int fd, ret;
> +	struct pid *p;
> +	struct task_struct *tsk;
> +
> +	if (flags)
> +		return -EINVAL;
> +
> +	if (pid <= 0)
> +		return -EINVAL;
> +
> +	p = find_get_pid(pid);
> +	if (!p)
> +		return -ESRCH;
> +
> +	ret = 0;
> +	rcu_read_lock();
> +	/*
> +	 * If this returns non-NULL the pid was used as a thread-group
> +	 * leader. Note, we race with exec here: If it changes the
> +	 * thread-group leader we might return the old leader.
> +	 */
> +	tsk = pid_task(p, PIDTYPE_TGID);

Just trying to understand the comment here. The issue is that we might either
return the new leader, or the old leader depending on the overlap with
concurrent de_thread right? In either case, we don't care though.

I suggest to remove the "Note..." part of the comment since it doesn't seem the
race is relevant here unless we are doing something else with tsk in the
function, but if you want to keep it that's also fine. Comment text should
probably should be 'return the new leader' though.

> +	if (!tsk)
> +		ret = -ESRCH;

Perhaps -EINVAL?  AFAICS, this can only happen if a CLONE_THREAD pid was
passed as argument to pidfd_open which is invalid. But let me know what you
had in mind..

thanks,

 - Joel

> +	rcu_read_unlock();
> +
> +	fd = ret ?: pidfd_create(p);
> +	put_pid(p);
> +	return fd;
> +}
> +
>  void __init pid_idr_init(void)
>  {
>  	/* Verify no one has done anything silly: */
> -- 
> 2.21.0
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox