Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH -next] iwlwifi: dbg: work around clang bug by marking debug strings static
From: Nick Desaulniers @ 2019-07-16 17:28 UTC (permalink / raw)
  To: Joe Perches, Kalle Valo
  Cc: Arnd Bergmann, Nathan Chancellor, Johannes Berg,
	Emmanuel Grumbach, Luca Coelho, Intel Linux Wireless,
	David S. Miller, Shahar S Matityahu, Sara Sharon, linux-wireless,
	netdev, LKML, clang-built-linux
In-Reply-To: <b219cf41933b2f965572af515cf9d3119293bfba.camel@perches.com>

On Thu, Jul 11, 2019 at 7:15 PM Joe Perches <joe@perches.com> wrote:
>
> On Thu, 2019-07-11 at 17:17 -0700, Nick Desaulniers wrote:
> > Commit r353569 in prerelease Clang-9 is producing a linkage failure:
> >
> > ld: drivers/net/wireless/intel/iwlwifi/fw/dbg.o:
> > in function `_iwl_fw_dbg_apply_point':
> > dbg.c:(.text+0x827a): undefined reference to `__compiletime_assert_2387'
> >
> > when the following configs are enabled:
> > - CONFIG_IWLWIFI
> > - CONFIG_IWLMVM
> > - CONFIG_KASAN
> >
> > Work around the issue for now by marking the debug strings as `static`,
> > which they probably should be any ways.
> >
> > Link: https://bugs.llvm.org/show_bug.cgi?id=42580
> > Link: https://github.com/ClangBuiltLinux/linux/issues/580
> > Reported-by: Arnd Bergmann <arnd@arndb.de>
> > Reported-by: Nathan Chancellor <natechancellor@gmail.com>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > ---
> >  drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > index e411ac98290d..f8c90ea4e9b4 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > @@ -2438,7 +2438,7 @@ static void iwl_fw_dbg_info_apply(struct iwl_fw_runtime *fwrt,
> >  {
> >       u32 img_name_len = le32_to_cpu(dbg_info->img_name_len);
> >       u32 dbg_cfg_name_len = le32_to_cpu(dbg_info->dbg_cfg_name_len);
> > -     const char err_str[] =
> > +     static const char err_str[] =
> >               "WRT: ext=%d. Invalid %s name length %d, expected %d\n";
>
> Better still would be to use the format string directly
> in both locations instead of trying to deduplicate it
> via storing it into a separate pointer.
>
> Let the compiler/linker consolidate the format.
> It's smaller object code, allows format/argument verification,
> and is simpler for humans to understand.

Whichever Kalle prefers, I just want my CI green again.

>
> ---
> diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> index e411ac98290d..25e6712932b8 100644
> --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> @@ -2438,17 +2438,17 @@ static void iwl_fw_dbg_info_apply(struct iwl_fw_runtime *fwrt,
>  {
>         u32 img_name_len = le32_to_cpu(dbg_info->img_name_len);
>         u32 dbg_cfg_name_len = le32_to_cpu(dbg_info->dbg_cfg_name_len);
> -       const char err_str[] =
> -               "WRT: ext=%d. Invalid %s name length %d, expected %d\n";
>
>         if (img_name_len != IWL_FW_INI_MAX_IMG_NAME_LEN) {
> -               IWL_WARN(fwrt, err_str, ext, "image", img_name_len,
> +               IWL_WARN(fwrt, "WRT: ext=%d. Invalid %s name length %d, expected %d\n",
> +                        ext, "image", img_name_len,
>                          IWL_FW_INI_MAX_IMG_NAME_LEN);
>                 return;
>         }
>
>         if (dbg_cfg_name_len != IWL_FW_INI_MAX_DBG_CFG_NAME_LEN) {
> -               IWL_WARN(fwrt, err_str, ext, "debug cfg", dbg_cfg_name_len,
> +               IWL_WARN(fwrt, "WRT: ext=%d. Invalid %s name length %d, expected %d\n",
> +                        ext, "debug cfg", dbg_cfg_name_len,
>                          IWL_FW_INI_MAX_DBG_CFG_NAME_LEN);
>                 return;
>         }
> @@ -2775,8 +2775,6 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
>                 struct iwl_ucode_tlv *tlv = iter;
>                 void *ini_tlv = (void *)tlv->data;
>                 u32 type = le32_to_cpu(tlv->type);
> -               const char invalid_ap_str[] =
> -                       "WRT: ext=%d. Invalid apply point %d for %s\n";
>
>                 switch (type) {
>                 case IWL_UCODE_TLV_TYPE_DEBUG_INFO:
> @@ -2786,8 +2784,8 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
>                         struct iwl_fw_ini_allocation_data *buf_alloc = ini_tlv;
>
>                         if (pnt != IWL_FW_INI_APPLY_EARLY) {
> -                               IWL_ERR(fwrt, invalid_ap_str, ext, pnt,
> -                                       "buffer allocation");
> +                               IWL_ERR(fwrt, "WRT: ext=%d. Invalid apply point %d for %s\n",
> +                                       ext, pnt, "buffer allocation");
>                                 goto next;
>                         }
>
> @@ -2797,8 +2795,8 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
>                 }
>                 case IWL_UCODE_TLV_TYPE_HCMD:
>                         if (pnt < IWL_FW_INI_APPLY_AFTER_ALIVE) {
> -                               IWL_ERR(fwrt, invalid_ap_str, ext, pnt,
> -                                       "host command");
> +                               IWL_ERR(fwrt, "WRT: ext=%d. Invalid apply point %d for %s\n",
> +                                       ext, pnt, "host command");
>                                 goto next;
>                         }
>                         iwl_fw_dbg_send_hcmd(fwrt, tlv, ext);
>
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] bpf, libbpf: add a new API bpf_object__reuse_maps()
From: Anton Protopopov @ 2019-07-16 17:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Networking, bpf, open list, Andrii Nakryiko
In-Reply-To: <CAEf4BzYaiH_GhSJJjkGv4dGF7CbBrusTyShPP9DXvXjCLcmK+w@mail.gmail.com>

вт, 9 июл. 2019 г. в 13:40, Andrii Nakryiko <andrii.nakryiko@gmail.com>:
>
> On Mon, Jul 8, 2019 at 1:37 PM Anton Protopopov
> <a.s.protopopov@gmail.com> wrote:
> >
> > пн, 8 июл. 2019 г. в 13:54, Andrii Nakryiko <andrii.nakryiko@gmail.com>:
> > >
> > > On Fri, Jul 5, 2019 at 2:53 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > >
> > > > On 07/05/2019 10:44 PM, Anton Protopopov wrote:
> > > > > Add a new API bpf_object__reuse_maps() which can be used to replace all maps in
> > > > > an object by maps pinned to a directory provided in the path argument.  Namely,
> > > > > each map M in the object will be replaced by a map pinned to path/M.name.
> > > > >
> > > > > Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
> > > > > ---
> > > > >  tools/lib/bpf/libbpf.c   | 34 ++++++++++++++++++++++++++++++++++
> > > > >  tools/lib/bpf/libbpf.h   |  2 ++
> > > > >  tools/lib/bpf/libbpf.map |  1 +
> > > > >  3 files changed, 37 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > > index 4907997289e9..84c9e8f7bfd3 100644
> > > > > --- a/tools/lib/bpf/libbpf.c
> > > > > +++ b/tools/lib/bpf/libbpf.c
> > > > > @@ -3144,6 +3144,40 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > +int bpf_object__reuse_maps(struct bpf_object *obj, const char *path)
> > >
> > > As is, bpf_object__reuse_maps() can be easily implemented by user
> > > applications, as it's only using public libbpf APIs, so I'm not 100%
> > > sure we need to add method like that to libbpf.
> >
> > The bpf_object__reuse_maps() can definitely be implemented by user
> > applications, however, to use it a user also needs to re-implement the
> > bpf_prog_load_xattr funciton, so it seemed to me that adding this
> > functionality to the library is a better way.
>
> I'm still not convinced. Looking at bpf_prog_load_xattr, I think some
> of what it's doing should be part of bpf_object__object_xattr anyway
> (all the expected type setting for programs).
>
> Besides that, there isn't much more than just bpf_object__open and
> bpf_object__load, to be honest. By doing open and load explicitly,
> user gets an opportunity to do whatever adjustment they need: reuse
> maps, adjust map sizes, etc. So I think we should improve
> bpf_object__open to "guess" program attach types and add map
> definition flags to allow reuse declaratively.
>
>
> >
> > >
> > > > > +{
> > > > > +     struct bpf_map *map;
> > > > > +
> > > > > +     if (!obj)
> > > > > +             return -ENOENT;
> > > > > +
> > > > > +     if (!path)
> > > > > +             return -EINVAL;
> > > > > +
> > > > > +     bpf_object__for_each_map(map, obj) {
> > > > > +             int len, err;
> > > > > +             int pinned_map_fd;
> > > > > +             char buf[PATH_MAX];
> > > >
> > > > We'd need to skip the case of bpf_map__is_internal(map) since they are always
> > > > recreated for the given object.
> > > >
> > > > > +             len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map));
> > > > > +             if (len < 0) {
> > > > > +                     return -EINVAL;
> > > > > +             } else if (len >= PATH_MAX) {
> > > > > +                     return -ENAMETOOLONG;
> > > > > +             }
> > > > > +
> > > > > +             pinned_map_fd = bpf_obj_get(buf);
> > > > > +             if (pinned_map_fd < 0)
> > > > > +                     return pinned_map_fd;
> > > >
> > > > Should we rather have a new map definition attribute that tells to reuse
> > > > the map if it's pinned in bpf fs, and if not, we create it and later on
> > > > pin it? This is what iproute2 is doing and which we're making use of heavily.
> > >
> > > I'd like something like that as well. This would play nicely with
> > > recently added BTF-defined maps as well.
> > >
> > > I think it should be not just pin/don't pin flag, but rather pinning
> > > strategy, to accommodate various typical strategies of handling maps
> > > that are already pinned. So something like this:
> > >
> > > 1. BPF_PIN_NOTHING - default, don't pin;
> > > 2. BPF_PIN_EXCLUSIVE - pin, but if map is already pinned - fail;
> > > 3. BPF_PIN_SET - pin; if existing map exists, reset its state to be
> > > exact state of object's map;
> > > 4. BPF_PIN_MERGE - pin, if map exists, fill in NULL entries only (this
> > > is how Cilium is pinning PROG_ARRAY maps, if I understand correctly);
> > > 5. BPF_PIN_MERGE_OVERWRITE - pin, if map exists, overwrite non-NULL values.
> > >
> > > This list is only for illustrative purposes, ideally people that have
> > > a lot of experience using pinning for real-world use cases would chime
> > > in on what strategies are useful and make sense.
> >
> > My case was simply to reuse existing maps when reloading a program.
> > Does it make sense for you to add only the simplest cases of listed above?
>
> Of course, it's enum, so we can start with few clearly useful ones and
> then expand more if we ever have a need. But I think we still need a
> bit wider discussion and let people who use pinning to chime in.
>
> >
> > Also, libbpf doesn't use standard naming conventions for pinning maps.
>
> We talked about this in another thread related to BTF-defined maps. I
> think the way to go with this is to actually define a default pinning
> root path, but allow to override it on bpf_object__open, if user needs
> a different one.
>
> > Does it make sense to provide a list of already open maps to the
> > bpf_prog_load_xattr function as an attribute? In this case a user
> > can execute his own policy on pinning, but still will have an option
> > to reuse, reset, and merge maps.
>
> As explained above, I don't think there isn't much added value in
> bpf_prog_load, so I'd advise to just switch to explicit
> bpf_object__open + bpf_object__load and get maximum control and
> flexibility.

Thanks for your comments. I can see now that using
bpf_object__open/bpf_object__load makes better sense.

>
> >
> > >
> > > > In bpf_object__reuse_maps() bailing out if bpf_obj_get() fails is perhaps
> > > > too limiting for a generic API as new version of an object file may contain
> > > > new maps which are not yet present in bpf fs at that point.
> > > >
> > > > > +             err = bpf_map__reuse_fd(map, pinned_map_fd);
> > > > > +             if (err)
> > > > > +                     return err;
> > > > > +     }
> > > > > +
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > > >  int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
> > > > >  {
> > > > >       struct bpf_program *prog;
> > > > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > > > index d639f47e3110..7fe465a1be76 100644
> > > > > --- a/tools/lib/bpf/libbpf.h
> > > > > +++ b/tools/lib/bpf/libbpf.h
> > > > > @@ -82,6 +82,8 @@ int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
> > > > >  LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
> > > > >  LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
> > > > >                                     const char *path);
> > > > > +LIBBPF_API int bpf_object__reuse_maps(struct bpf_object *obj,
> > > > > +                                   const char *path);
> > > > >  LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
> > > > >                                       const char *path);
> > > > >  LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
> > > > > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > > > > index 2c6d835620d2..66a30be6696c 100644
> > > > > --- a/tools/lib/bpf/libbpf.map
> > > > > +++ b/tools/lib/bpf/libbpf.map
> > > > > @@ -172,5 +172,6 @@ LIBBPF_0.0.4 {
> > > > >               btf_dump__new;
> > > > >               btf__parse_elf;
> > > > >               bpf_object__load_xattr;
> > > > > +             bpf_object__reuse_maps;
> > > > >               libbpf_num_possible_cpus;
> > > > >  } LIBBPF_0.0.3;
> > > > >
> > > >

^ permalink raw reply

* Re: [PATCH net v2] skbuff: fix compilation warnings in skb_dump()
From: Nathan Chancellor @ 2019-07-16 16:51 UTC (permalink / raw)
  To: Qian Cai; +Cc: davem, willemb, joe, clang-built-linux, netdev, linux-kernel
In-Reply-To: <1563291785-6545-1-git-send-email-cai@lca.pw>

On Tue, Jul 16, 2019 at 11:43:05AM -0400, Qian Cai wrote:
> The commit 6413139dfc64 ("skbuff: increase verbosity when dumping skb
> data") introduced a few compilation warnings.
> 
> net/core/skbuff.c:766:32: warning: format specifies type 'unsigned
> short' but the argument has type 'unsigned int' [-Wformat]
>                        level, sk->sk_family, sk->sk_type,
> sk->sk_protocol);
>                                              ^~~~~~~~~~~
> net/core/skbuff.c:766:45: warning: format specifies type 'unsigned
> short' but the argument has type 'unsigned int' [-Wformat]
>                        level, sk->sk_family, sk->sk_type,
> sk->sk_protocol);
> ^~~~~~~~~~~~~~~
> 
> Fix them by using the proper types.
> 
> Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
> Signed-off-by: Qian Cai <cai@lca.pw>

Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

^ permalink raw reply

* Re: IPv6 L2TP issues related to 93531c67
From: David Ahern @ 2019-07-16 16:46 UTC (permalink / raw)
  To: Paul Donohue; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev
In-Reply-To: <20190716135646.GE2622@TopQuark.net>

On 7/16/19 7:56 AM, Paul Donohue wrote:
> After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel.  Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again.  In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.
> 
> I've attached a modified version of the script which demonstrates this issue.

Thanks. I will take a look at get back to you.

^ permalink raw reply

* [PATCH 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card
From: Douglas Anderson @ 2019-07-16 16:42 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter
  Cc: Ganapathi Bhat, linux-wireless, Brian Norris, Amitkumar Karwar,
	linux-rockchip, Wolfram Sang, Nishant Sarmukadam, netdev,
	Avri Altman, linux-mmc, davem, Xinming Hu, Douglas Anderson,
	Jiong Wu, Ritesh Harjani, linux-kernel, Thomas Gleixner,
	Greg Kroah-Hartman, Niklas Söderlund

As talked about in the thread at:

http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com

...when the Marvell WiFi card tries to reset itself it kills
Bluetooth.  It was observed that we could re-init the card properly by
unbinding / rebinding the host controller.  It was also observed that
in the downstream Chrome OS codebase the solution used was
mmc_remove_host() / mmc_add_host(), which is similar to the solution
in this series.

So far I've only done testing of this series using the reset test
source that can be simulated via sysfs.  Specifically I ran this test:

for i in $(seq 1000); do
  echo "LOOP $i --------"
  echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset

  while true; do
    if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
      fail=$(( fail + 1 ))
      echo "Fail WiFi ${fail}"
      if [[ ${fail} == 3 ]]; then
        exit 1
      fi
    else
      fail=0
      break
    fi
  done

  hciconfig hci0 down
  sleep 1
  if ! hciconfig hci0 up; then
    echo "Fail BT"
    exit 1
  fi
done

I ran this several times and got several hundred iterations each
before a failure.  When I saw failures:

* Once I saw a "Fail BT"; manually resetting the card again fixed it.
  I didn't give it time to see if it would have detected this
  automatically.
* Once I saw the ping fail because (for some reason) my device only
  got an IPv6 address from my router and the IPv4 ping failed.  I
  changed my script to use 'ping6' to see if that would help.
* Once I saw the ping fail because the higher level network stack
  ("shill" in my case) seemed to crash.  A few minutes later the
  system recovered itself automatically.  https://crbug.com/984593 if
  you want more details.
* Sometimes while I was testing I saw "Fail WiFi 1" indicating a
  transitory failure.  Usually this was an association failure, but in
  one case I saw the device do "Firmware wakeup failed" after I
  triggered the reset.  This caused the driver to trigger a re-reset
  of itself which eventually recovered things.  This was good because
  it was an actual test of the normal reset flow (not the one
  triggered via sysfs).

Douglas Anderson (2):
  mmc: core: Add sdio_trigger_replug() API
  mwifiex: Make use of the new sdio_trigger_replug() API to reset

 drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
 drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
 drivers/net/wireless/marvell/mwifiex/sdio.c | 14 +++--------
 include/linux/mmc/host.h                    | 15 ++++++++++-
 include/linux/mmc/sdio_func.h               |  2 ++
 5 files changed, 65 insertions(+), 14 deletions(-)

-- 
2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* [PATCH 1/2] mmc: core: Add sdio_trigger_replug() API
From: Douglas Anderson @ 2019-07-16 16:42 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter
  Cc: Ganapathi Bhat, linux-wireless, Brian Norris, Amitkumar Karwar,
	linux-rockchip, Wolfram Sang, Nishant Sarmukadam, netdev,
	Avri Altman, linux-mmc, davem, Xinming Hu, Douglas Anderson,
	Jiong Wu, Ritesh Harjani, linux-kernel, Thomas Gleixner,
	Greg Kroah-Hartman, Niklas Söderlund
In-Reply-To: <20190716164209.62320-1-dianders@chromium.org>

When using Marvell WiFi SDIO cards, it is not uncommon for Linux WiFi
driver to fully lose the communication channel to the firmware running
on the card.  Presumably the firmware on the card has a bug or two in
it and occasionally crashes.

The Marvell WiFi driver attempts to recover from this problem.
Specifically the driver has the function mwifiex_sdio_card_reset()
which is called when communcation problems are found.  That function
attempts to reset the state of things by utilizing the mmc_hw_reset()
function.

The current solution is a bit complex because the Marvell WiFi driver
needs to manually deinit and reinit the WiFi driver around the reset
call.  This means it's going through a bunch of code paths that aren't
normally tested.  However, complexity isn't our only problem.  The
other (bigger) problem is that Marvell WiFi cards are often combo
WiFi/Bluetooth cards and Bluetooth runs on a second SDIO func.  While
the WiFi driver knows that it should re-init its own state around the
mmc_hw_reset() call there is no good way to inform the Bluetooth
driver.  That means that in Linux today when you reset the Marvell
WiFi driver you lose all Bluetooth communication.  Doh!

One way to fix the above problems is to leverage a more standard way
to reset the Marvell WiFi card where we go through the same code paths
as card unplug and the card plug.  In this patch we introduce a new
API call for doing just that: sdio_trigger_replug().  This API call
will trigger an unplug of the SDIO card followed by a plug of the
card.  As part of this the card will be nicely reset.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/mmc/core/core.c       | 28 ++++++++++++++++++++++++++--
 drivers/mmc/core/sdio_io.c    | 20 ++++++++++++++++++++
 include/linux/mmc/host.h      | 15 ++++++++++++++-
 include/linux/mmc/sdio_func.h |  2 ++
 4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9020cb2490f7..48a7d23aed26 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2164,6 +2164,12 @@ int mmc_sw_reset(struct mmc_host *host)
 }
 EXPORT_SYMBOL(mmc_sw_reset);
 
+void mmc_trigger_replug(struct mmc_host *host)
+{
+	host->trigger_replug_state = MMC_REPLUG_STATE_UNPLUG;
+	_mmc_detect_change(host, 0, false);
+}
+
 static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
 {
 	host->f_init = freq;
@@ -2217,6 +2223,11 @@ int _mmc_detect_card_removed(struct mmc_host *host)
 	if (!host->card || mmc_card_removed(host->card))
 		return 1;
 
+	if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
+		mmc_card_set_removed(host->card);
+		return 1;
+	}
+
 	ret = host->bus_ops->alive(host);
 
 	/*
@@ -2329,8 +2340,21 @@ void mmc_rescan(struct work_struct *work)
 	mmc_bus_put(host);
 
 	mmc_claim_host(host);
-	if (mmc_card_is_removable(host) && host->ops->get_cd &&
-			host->ops->get_cd(host) == 0) {
+
+	/*
+	 * Move through the state machine if we're triggering an unplug
+	 * followed by a re-plug.
+	 */
+	if (host->trigger_replug_state == MMC_REPLUG_STATE_UNPLUG) {
+		host->trigger_replug_state = MMC_REPLUG_STATE_PLUG;
+		_mmc_detect_change(host, 0, false);
+	} else if (host->trigger_replug_state == MMC_REPLUG_STATE_PLUG) {
+		host->trigger_replug_state = MMC_REPLUG_STATE_NONE;
+	}
+
+	if (host->trigger_replug_state == MMC_REPLUG_STATE_PLUG ||
+	    (mmc_card_is_removable(host) && host->ops->get_cd &&
+			host->ops->get_cd(host) == 0)) {
 		mmc_power_off(host);
 		mmc_release_host(host);
 		goto out;
diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c
index 2ba00acf64e6..1c5c2a3ebe5e 100644
--- a/drivers/mmc/core/sdio_io.c
+++ b/drivers/mmc/core/sdio_io.c
@@ -811,3 +811,23 @@ void sdio_retune_release(struct sdio_func *func)
 	mmc_retune_release(func->card->host);
 }
 EXPORT_SYMBOL_GPL(sdio_retune_release);
+
+/**
+ *	sdio_trigger_replug - trigger an "unplug" + "plug" of the card
+ *	@func: SDIO function attached to host
+ *
+ *	When you call this function we will schedule events that will
+ *	make it look like the card contining the given SDIO func was
+ *	unplugged and then re-plugged-in.  This is as close as possible
+ *	to a full reset of the card that can be achieved.
+ *
+ *	NOTE: routnine will temporarily make the card look as if it is
+ *	removable even if it is marked non-removable.
+ *
+ *	This function should be called while the host is claimed.
+ */
+void sdio_trigger_replug(struct sdio_func *func)
+{
+	mmc_trigger_replug(func->card->host);
+}
+EXPORT_SYMBOL(sdio_trigger_replug);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index a9b12322c775..e0d41a1bcf17 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -410,6 +410,12 @@ struct mmc_host {
 
 	bool			trigger_card_event; /* card_event necessary */
 
+	/* state machine for triggering unplug/replug */
+#define MMC_REPLUG_STATE_NONE	0		/* not doing unplug/replug */
+#define MMC_REPLUG_STATE_UNPLUG	1		/* do unplug next */
+#define MMC_REPLUG_STATE_PLUG	2		/* do plug next */
+	u8			trigger_replug_state;
+
 	struct mmc_card		*card;		/* device attached to this host */
 
 	wait_queue_head_t	wq;
@@ -530,7 +536,12 @@ int mmc_regulator_get_supply(struct mmc_host *mmc);
 
 static inline int mmc_card_is_removable(struct mmc_host *host)
 {
-	return !(host->caps & MMC_CAP_NONREMOVABLE);
+	/*
+	 * A non-removable card briefly looks removable if code has forced
+	 * a re-plug of the card.
+	 */
+	return host->trigger_replug_state != MMC_REPLUG_STATE_NONE ||
+		!(host->caps & MMC_CAP_NONREMOVABLE);
 }
 
 static inline int mmc_card_keep_power(struct mmc_host *host)
@@ -583,4 +594,6 @@ static inline enum dma_data_direction mmc_get_dma_dir(struct mmc_data *data)
 int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error);
 int mmc_abort_tuning(struct mmc_host *host, u32 opcode);
 
+void mmc_trigger_replug(struct mmc_host *host);
+
 #endif /* LINUX_MMC_HOST_H */
diff --git a/include/linux/mmc/sdio_func.h b/include/linux/mmc/sdio_func.h
index 5a177f7a83c3..0d6c73768ae3 100644
--- a/include/linux/mmc/sdio_func.h
+++ b/include/linux/mmc/sdio_func.h
@@ -173,4 +173,6 @@ extern void sdio_retune_crc_enable(struct sdio_func *func);
 extern void sdio_retune_hold_now(struct sdio_func *func);
 extern void sdio_retune_release(struct sdio_func *func);
 
+extern void sdio_trigger_replug(struct sdio_func *func);
+
 #endif /* LINUX_MMC_SDIO_FUNC_H */
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH 2/2] mwifiex: Make use of the new sdio_trigger_replug() API to reset
From: Douglas Anderson @ 2019-07-16 16:42 UTC (permalink / raw)
  To: Ulf Hansson, Kalle Valo, Adrian Hunter
  Cc: Ganapathi Bhat, linux-wireless, Brian Norris, Amitkumar Karwar,
	linux-rockchip, Wolfram Sang, Nishant Sarmukadam, netdev,
	Avri Altman, linux-mmc, davem, Xinming Hu, Douglas Anderson,
	linux-kernel
In-Reply-To: <20190716164209.62320-1-dianders@chromium.org>

As described in the patch ("mmc: core: Add sdio_trigger_replug()
API"), the current mwifiex_sdio_card_reset() is broken in the cases
where we're running Bluetooth on a second SDIO func on the same card
as WiFi.  The problem goes away if we just use the
sdio_trigger_replug() API call.

NOTE: Even though with this new solution there is less of a reason to
do our work from a workqueue (the unplug / plug mechanism we're using
is possible for a human to perform at any time so the stack is
supposed to handle it without it needing to be called from a special
context), we still need a workqueue because the Marvell reset function
could called from a context where sleeping is invalid and thus we
can't claim the host.  One example is Marvell's wakeup_timer_fn().

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/net/wireless/marvell/mwifiex/sdio.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index 24c041dad9f6..f77ad2615f08 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -2218,14 +2218,6 @@ static void mwifiex_sdio_card_reset_work(struct mwifiex_adapter *adapter)
 {
 	struct sdio_mmc_card *card = adapter->card;
 	struct sdio_func *func = card->func;
-	int ret;
-
-	mwifiex_shutdown_sw(adapter);
-
-	/* power cycle the adapter */
-	sdio_claim_host(func);
-	mmc_hw_reset(func->card->host);
-	sdio_release_host(func);
 
 	/* Previous save_adapter won't be valid after this. We will cancel
 	 * pending work requests.
@@ -2233,9 +2225,9 @@ static void mwifiex_sdio_card_reset_work(struct mwifiex_adapter *adapter)
 	clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
 	clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
 
-	ret = mwifiex_reinit_sw(adapter);
-	if (ret)
-		dev_err(&func->dev, "reinit failed: %d\n", ret);
+	sdio_claim_host(func);
+	sdio_trigger_replug(func);
+	sdio_release_host(func);
 }
 
 /* This function read/write firmware */
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Richard Cochran @ 2019-07-16 16:41 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

On Tue, Jul 16, 2019 at 10:20:33AM +0300, Felipe Balbi wrote:
> TGPIO is a new IP which allows for time synchronization between systems
> without any other means of synchronization such as PTP or NTP. The
> driver is implemented as part of the PTP framework since its features
> covered most of what this controller can do.

Can you provide some background on this new HW?  Is the interface
copper wires between chips?  Or is it perhaps coax between hosts?

Thanks,
Richard

^ permalink raw reply

* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Richard Cochran @ 2019-07-16 16:39 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190716072038.8408-5-felipe.balbi@linux.intel.com>

On Tue, Jul 16, 2019 at 10:20:37AM +0300, Felipe Balbi wrote:
> When this new flag is set, we can use single-shot output.
> 
> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
> ---
>  include/uapi/linux/ptp_clock.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
> index 674db7de64f3..439cbdfc3d9b 100644
> --- a/include/uapi/linux/ptp_clock.h
> +++ b/include/uapi/linux/ptp_clock.h
> @@ -67,7 +67,9 @@ struct ptp_perout_request {
>  	struct ptp_clock_time start;  /* Absolute start time. */
>  	struct ptp_clock_time period; /* Desired period, zero means disable. */
>  	unsigned int index;           /* Which channel to configure. */
> -	unsigned int flags;           /* Reserved for future use. */
> +
> +#define PTP_PEROUT_ONE_SHOT BIT(0)
> +	unsigned int flags;           /* Bit 0 -> oneshot output. */
>  	unsigned int rsv[4];          /* Reserved for future use. */

Unfortunately, the code never checked that .flags and .rsv are zero,
and so the de-facto ABI makes extending these fields impossible.  That
was my mistake from the beginning.

In order to actually support extensions, you will first have to
introduce a new ioctl.

Sorry,
Richard

>  };
>  
> -- 
> 2.22.0
> 

^ permalink raw reply

* Re: [PATCH bpf v2] selftests/bpf: skip nmi test when perf hw events are disabled
From: Alexei Starovoitov @ 2019-07-16 16:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Ilya Leoshkevich, bpf, Networking, gor, heiko.carstens, Y Song
In-Reply-To: <CAEf4Bzaf2Ys6H4h0rk6z+QhP-anonz=MBej5CaShXKL453MB4A@mail.gmail.com>

On Tue, Jul 16, 2019 at 07:57:04AM -0700, Andrii Nakryiko wrote:
> On Tue, Jul 16, 2019 at 3:56 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > Some setups (e.g. virtual machines) might run with hardware perf events
> > disabled. If this is the case, skip the test_send_signal_nmi test.
> >
> > Add a separate test involving a software perf event. This allows testing
> > the perf event path regardless of hardware perf event support.
> >
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
> 
> LGTM!
> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>

Applied, Thanks


^ permalink raw reply

* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Richard Guy Briggs @ 2019-07-16 16:26 UTC (permalink / raw)
  To: Paul Moore
  Cc: Tycho Andersen, nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo, netdev,
	linux-fsdevel, Eric Paris, Serge E. Hallyn
In-Reply-To: <CAHC9VhTaLqCo8rmAaySJQB+Pf-580=3mvX1rPmtEeb9o5Uy9Qg@mail.gmail.com>

On 2019-07-16 12:08, Paul Moore wrote:
> On Tue, Jul 16, 2019 at 11:37 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2019-07-15 17:09, Paul Moore wrote:
> > > On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > On 2019-05-30 19:26, Paul Moore wrote:
> > >
> > > ...
> > >
> > > > > I like the creativity, but I worry that at some point these
> > > > > limitations are going to be raised (limits have a funny way of doing
> > > > > that over time) and we will be in trouble.  I say "trouble" because I
> > > > > want to be able to quickly do an audit container ID comparison and
> > > > > we're going to pay a penalty for these larger values (we'll need this
> > > > > when we add multiple auditd support and the requisite record routing).
> > > > >
> > > > > Thinking about this makes me also realize we probably need to think a
> > > > > bit longer about audit container ID conflicts between orchestrators.
> > > > > Right now we just take the value that is given to us by the
> > > > > orchestrator, but if we want to allow multiple container orchestrators
> > > > > to work without some form of cooperation in userspace (I think we have
> > > > > to assume the orchestrators will not talk to each other) we likely
> > > > > need to have some way to block reuse of an audit container ID.  We
> > > > > would either need to prevent the orchestrator from explicitly setting
> > > > > an audit container ID to a currently in use value, or instead generate
> > > > > the audit container ID in the kernel upon an event triggered by the
> > > > > orchestrator (e.g. a write to a /proc file).  I suspect we should
> > > > > start looking at the idr code, I think we will need to make use of it.
> > > >
> > > > To address this, I'd suggest that it is enforced to only allow the
> > > > setting of descendants and to maintain a master list of audit container
> > > > identifiers (with a hash table if necessary later) that includes the
> > > > container owner.
> > >
> > > We're discussing the audit container ID management policy elsewhere in
> > > this thread so I won't comment on that here, but I did want to say
> > > that we will likely need something better than a simple list of audit
> > > container IDs from the start.  It's common for systems to have
> > > thousands of containers now (or multiple thousands), which tells me
> > > that a list is a poor choice.  You mentioned a hash table, so I would
> > > suggest starting with that over the list for the initial patchset.
> >
> > I saw that as an internal incremental improvement that did not affect
> > the API, so I wanted to keep things a bit simpler (as you've requested
> > in the past) to get this going, and add that enhancement later.
> 
> In general a simple approach is a good way to start when the
> problem/use-case is not very well understood; in other words, don't
> spend a lot of time/effort optimizing something you don't yet
> understand.  In this case we know that people want to deploy a *lot*
> of containers on a single system so we should design the data
> structures appropriately.  A list is simply not a good fit here, I
> believe/hope you know that too.

Yes, I knew that, which is why I alluded to a hash table...

> > I'll start working on it now.  The hash table would simply point to
> > lists anyways unless you can recommend a better approach.
> 
> I assume when you say "point to lists" you are talking about using
> lists for the hash buckets?  If so, yes that should be fine at this
> point.  In general if the per-bucket lists become a bottleneck we can
> look at the size of the table (or make it tunable) or even use a
> different approach entirely.  Ultimately the data store is an
> implementation detail private to the audit subsystem in the kernel so
> we should be able to change it as necessary without breaking anything.

Yes, this is what I had in mind.  It would be tunable either by a macro
or a config option, so the exact value isn't a critical implementation
detail that can be easily tuned as we gain experience with it.  And yes,
the intent was that it was a non-user-perceivable implementation choice
other than performace metrics.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH bpf v3] selftests/bpf: fix "alu with different scalars 1" on s390
From: Alexei Starovoitov @ 2019-07-16 16:23 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, gor, heiko.carstens, daniel, ys114321
In-Reply-To: <20190716105353.21704-1-iii@linux.ibm.com>

On Tue, Jul 16, 2019 at 12:53:53PM +0200, Ilya Leoshkevich wrote:
> BPF_LDX_MEM is used to load the least significant byte of the retrieved
> test_val.index, however, on big-endian machines it ends up retrieving
> the most significant byte.
> 
> Change the test to load the whole int in order to make it
> endianness-independent.
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Applied, Thanks


^ permalink raw reply

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Alexei Starovoitov @ 2019-07-16 16:17 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Andrii Nakryiko, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song
In-Reply-To: <87wogitlbi.fsf@netronome.com>

On Tue, Jul 16, 2019 at 09:50:25AM +0100, Jiong Wang wrote:
> 
> Let me digest a little bit and do some coding, then I will come back. Some
> issues can only shown up during in-depth coding. I kind of feel handling
> aux reference in verifier layer is the part that will still introduce some
> un-clean code.

I'm still internalizing this discussion. Only want to point out
that I think it's better to have simpler algorithm that consumes more
memory and slower than more complex algorithm that is more cpu/memory efficient.
Here we're aiming at 10x improvement anyway, so extra cpu and memory
here and there are good trade-off to make.

> >> If there is no dead insn elimination opt, then we could just adjust
> >> offsets. When there is insn deleting, I feel the logic becomes more
> >> complex. One subprog could be completely deleted or partially deleted, so
> >> I feel just recalculate the whole subprog info as a side-product is
> >> much simpler.
> >
> > What's the situation where entirety of subprog can be deleted?
> 
> Suppose you have conditional jmp_imm, true path calls one subprog, false
> path calls the other. If insn walker later found it is also true, then the
> subprog at false path won't be marked as "seen", so it is entirely deleted.
> 
> I actually thought it is in theory one subprog could be deleted entirely,
> so if we support insn deletion inside verifier, then range info like
> line_info/subprog_info needs to consider one range is deleted.

I don't think dead code elim can remove subprogs.
cfg check rejects code with dead progs.
I don't think we have a test for such 'dead prog only due to verifier walk'
situation. I wonder what happens :)


^ permalink raw reply

* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Paul Moore @ 2019-07-16 16:08 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Tycho Andersen, Serge E. Hallyn, containers, linux-api,
	Linux-Audit Mailing List, linux-fsdevel, LKML, netdev,
	netfilter-devel, sgrubb, omosnace, dhowells, simo, Eric Paris,
	ebiederm, nhorman
In-Reply-To: <20190716153705.xx7dwrhliny5amut@madcap2.tricolour.ca>

On Tue, Jul 16, 2019 at 11:37 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2019-07-15 17:09, Paul Moore wrote:
> > On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2019-05-30 19:26, Paul Moore wrote:
> >
> > ...
> >
> > > > I like the creativity, but I worry that at some point these
> > > > limitations are going to be raised (limits have a funny way of doing
> > > > that over time) and we will be in trouble.  I say "trouble" because I
> > > > want to be able to quickly do an audit container ID comparison and
> > > > we're going to pay a penalty for these larger values (we'll need this
> > > > when we add multiple auditd support and the requisite record routing).
> > > >
> > > > Thinking about this makes me also realize we probably need to think a
> > > > bit longer about audit container ID conflicts between orchestrators.
> > > > Right now we just take the value that is given to us by the
> > > > orchestrator, but if we want to allow multiple container orchestrators
> > > > to work without some form of cooperation in userspace (I think we have
> > > > to assume the orchestrators will not talk to each other) we likely
> > > > need to have some way to block reuse of an audit container ID.  We
> > > > would either need to prevent the orchestrator from explicitly setting
> > > > an audit container ID to a currently in use value, or instead generate
> > > > the audit container ID in the kernel upon an event triggered by the
> > > > orchestrator (e.g. a write to a /proc file).  I suspect we should
> > > > start looking at the idr code, I think we will need to make use of it.
> > >
> > > To address this, I'd suggest that it is enforced to only allow the
> > > setting of descendants and to maintain a master list of audit container
> > > identifiers (with a hash table if necessary later) that includes the
> > > container owner.
> >
> > We're discussing the audit container ID management policy elsewhere in
> > this thread so I won't comment on that here, but I did want to say
> > that we will likely need something better than a simple list of audit
> > container IDs from the start.  It's common for systems to have
> > thousands of containers now (or multiple thousands), which tells me
> > that a list is a poor choice.  You mentioned a hash table, so I would
> > suggest starting with that over the list for the initial patchset.
>
> I saw that as an internal incremental improvement that did not affect
> the API, so I wanted to keep things a bit simpler (as you've requested
> in the past) to get this going, and add that enhancement later.

In general a simple approach is a good way to start when the
problem/use-case is not very well understood; in other words, don't
spend a lot of time/effort optimizing something you don't yet
understand.  In this case we know that people want to deploy a *lot*
of containers on a single system so we should design the data
structures appropriately.  A list is simply not a good fit here, I
believe/hope you know that too.

> I'll start working on it now.  The hash table would simply point to
> lists anyways unless you can recommend a better approach.

I assume when you say "point to lists" you are talking about using
lists for the hash buckets?  If so, yes that should be fine at this
point.  In general if the per-bucket lists become a bottleneck we can
look at the size of the table (or make it tunable) or even use a
different approach entirely.  Ultimately the data store is an
implementation detail private to the audit subsystem in the kernel so
we should be able to change it as necessary without breaking anything.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [PATCH net v2] skbuff: fix compilation warnings in skb_dump()
From: Qian Cai @ 2019-07-16 15:43 UTC (permalink / raw)
  To: davem; +Cc: willemb, joe, clang-built-linux, netdev, linux-kernel, Qian Cai

The commit 6413139dfc64 ("skbuff: increase verbosity when dumping skb
data") introduced a few compilation warnings.

net/core/skbuff.c:766:32: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
                       level, sk->sk_family, sk->sk_type,
sk->sk_protocol);
                                             ^~~~~~~~~~~
net/core/skbuff.c:766:45: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
                       level, sk->sk_family, sk->sk_type,
sk->sk_protocol);
^~~~~~~~~~~~~~~

Fix them by using the proper types.

Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: Drop the checkpatch fix as it seems a bit more involved that it does not
    even like passing a variable to it, i.e., printk(level "msg"). Also,
    print_hex_dump() seems need a fix to be complete which can probably be done
    later.

 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6f1e31f674a3..0338820ee0ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -762,7 +762,7 @@ void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt)
 		printk("%sdev name=%s feat=0x%pNF\n",
 		       level, dev->name, &dev->features);
 	if (sk)
-		printk("%ssk family=%hu type=%hu proto=%hu\n",
+		printk("%ssk family=%hu type=%u proto=%u\n",
 		       level, sk->sk_family, sk->sk_type, sk->sk_protocol);
 
 	if (full_pkt && headroom)
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Richard Guy Briggs @ 2019-07-16 15:37 UTC (permalink / raw)
  To: Paul Moore
  Cc: Tycho Andersen, Serge E. Hallyn, containers, linux-api,
	Linux-Audit Mailing List, linux-fsdevel, LKML, netdev,
	netfilter-devel, sgrubb, omosnace, dhowells, simo, Eric Paris,
	ebiederm, nhorman
In-Reply-To: <CAHC9VhT0V+xi_6nAR5TsM2vs34LbgMeO=-W+MS_kqiXRRzneZQ@mail.gmail.com>

On 2019-07-15 17:09, Paul Moore wrote:
> On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2019-05-30 19:26, Paul Moore wrote:
> 
> ...
> 
> > > I like the creativity, but I worry that at some point these
> > > limitations are going to be raised (limits have a funny way of doing
> > > that over time) and we will be in trouble.  I say "trouble" because I
> > > want to be able to quickly do an audit container ID comparison and
> > > we're going to pay a penalty for these larger values (we'll need this
> > > when we add multiple auditd support and the requisite record routing).
> > >
> > > Thinking about this makes me also realize we probably need to think a
> > > bit longer about audit container ID conflicts between orchestrators.
> > > Right now we just take the value that is given to us by the
> > > orchestrator, but if we want to allow multiple container orchestrators
> > > to work without some form of cooperation in userspace (I think we have
> > > to assume the orchestrators will not talk to each other) we likely
> > > need to have some way to block reuse of an audit container ID.  We
> > > would either need to prevent the orchestrator from explicitly setting
> > > an audit container ID to a currently in use value, or instead generate
> > > the audit container ID in the kernel upon an event triggered by the
> > > orchestrator (e.g. a write to a /proc file).  I suspect we should
> > > start looking at the idr code, I think we will need to make use of it.
> >
> > To address this, I'd suggest that it is enforced to only allow the
> > setting of descendants and to maintain a master list of audit container
> > identifiers (with a hash table if necessary later) that includes the
> > container owner.
> 
> We're discussing the audit container ID management policy elsewhere in
> this thread so I won't comment on that here, but I did want to say
> that we will likely need something better than a simple list of audit
> container IDs from the start.  It's common for systems to have
> thousands of containers now (or multiple thousands), which tells me
> that a list is a poor choice.  You mentioned a hash table, so I would
> suggest starting with that over the list for the initial patchset.

I saw that as an internal incremental improvement that did not affect
the API, so I wanted to keep things a bit simpler (as you've requested
in the past) to get this going, and add that enhancement later.

I'll start working on it now.  The hash table would simply point to
lists anyways unless you can recommend a better approach.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* [PULL] virtio, vhost: fixes, features, performance
From: Michael S. Tsirkin @ 2019-07-16 15:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kvm, virtualization, netdev, linux-kernel, aarcange,
	bharat.bhushan, bhelgaas, linux-arm-kernel, linux-mm,
	linux-parisc, davem, eric.auger, gustavo, hch, ihor.matushchak,
	James.Bottomley, jasowang, jean-philippe.brucker, jglisse, mst,
	natechancellor

The following changes since commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2:

  vhost: scsi: add weight support (2019-05-27 11:08:23 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 5e663f0410fa2f355042209154029842ba1abd43:

  virtio-mmio: add error check for platform_get_irq (2019-07-11 16:22:29 -0400)

----------------------------------------------------------------
virtio, vhost: fixes, features, performance

new iommu device
vhost guest memory access using vmap (just meta-data for now)
minor fixes

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Note: due to code driver changes the driver-core tree, the following
patch is needed when merging tree with commit 92ce7e83b4e5
("driver_find_device: Unify the match function with
class_find_device()") in the driver-core tree:

From: Nathan Chancellor <natechancellor@gmail.com>
Subject: [PATCH] iommu/virtio: Constify data parameter in viommu_match_node

After commit 92ce7e83b4e5 ("driver_find_device: Unify the match
function with class_find_device()") in the driver-core tree.

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---
 drivers/iommu/virtio-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 4620dd221ffd..433f4d2ee956 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -839,7 +839,7 @@ static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
 static struct iommu_ops viommu_ops;
 static struct virtio_driver virtio_iommu_drv;

-static int viommu_match_node(struct device *dev, void *data)
+static int viommu_match_node(struct device *dev, const void *data)
 {
 	return dev->parent->fwnode == data;
 }

----------------------------------------------------------------
Gustavo A. R. Silva (1):
      scsi: virtio_scsi: Use struct_size() helper

Ihor Matushchak (1):
      virtio-mmio: add error check for platform_get_irq

Jason Wang (6):
      vhost: generalize adding used elem
      vhost: fine grain userspace memory accessors
      vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
      vhost: introduce helpers to get the size of metadata area
      vhost: factor out setting vring addr and num
      vhost: access vq metadata through kernel virtual address

Jean-Philippe Brucker (7):
      dt-bindings: virtio-mmio: Add IOMMU description
      dt-bindings: virtio: Add virtio-pci-iommu node
      of: Allow the iommu-map property to omit untranslated devices
      PCI: OF: Initialize dev->fwnode appropriately
      iommu: Add virtio-iommu driver
      iommu/virtio: Add probe request
      iommu/virtio: Add event queue

Michael S. Tsirkin (1):
      vhost: fix clang build warning

 Documentation/devicetree/bindings/virtio/iommu.txt |   66 ++
 Documentation/devicetree/bindings/virtio/mmio.txt  |   30 +
 MAINTAINERS                                        |    7 +
 drivers/iommu/Kconfig                              |   11 +
 drivers/iommu/Makefile                             |    1 +
 drivers/iommu/virtio-iommu.c                       | 1158 ++++++++++++++++++++
 drivers/of/base.c                                  |   10 +-
 drivers/pci/of.c                                   |    8 +
 drivers/scsi/virtio_scsi.c                         |    2 +-
 drivers/vhost/net.c                                |    4 +-
 drivers/vhost/vhost.c                              |  850 +++++++++++---
 drivers/vhost/vhost.h                              |   43 +-
 drivers/virtio/virtio_mmio.c                       |    7 +-
 include/uapi/linux/virtio_ids.h                    |    1 +
 include/uapi/linux/virtio_iommu.h                  |  161 +++
 15 files changed, 2228 insertions(+), 131 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

^ permalink raw reply related

* Re: [PATCH net] net: fix use-after-free in __netif_receive_skb_core
From: Sabrina Dubroca @ 2019-07-16 15:26 UTC (permalink / raw)
  To: Edward Cree; +Cc: netdev, Andreas Steinmetz
In-Reply-To: <8166b82f-8430-1441-32e7-540c1829754e@solarflare.com>

2019-07-12, 16:29:48 +0100, Edward Cree wrote:
> On 10/07/2019 23:47, Sabrina Dubroca wrote:
> > 2019-07-10, 16:07:43 +0100, Edward Cree wrote:
> >> On 10/07/2019 14:52, Sabrina Dubroca wrote:
> >>> -static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
> >>> +static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
> >>>  				    struct packet_type **ppt_prev)
> >>>  {
> >>>  	struct packet_type *ptype, *pt_prev;
> >>>  	rx_handler_func_t *rx_handler;
> >>> +	struct sk_buff *skb = *pskb;
> >> Would it not be simpler just to change all users of skb to *pskb?
> >> Then you avoid having to keep doing "*pskb = skb;" whenever skb changes
> >>  (with concomitant risk of bugs if one gets missed).
> > Yes, that would be less risky. I wrote a version of the patch that did
> > exactly that, but found it really too ugly (both the patch and the
> > resulting code).
> If you've still got that version (or can dig it out of your reflog), can
>  you post it so we can see just how ugly it turns out?

No, I didn't even commit it. Rewrote it now, it's rewriting over 1/4
of the lines. Considering that the patch would need to go in stable, I
don't think that's appropriate.

(This has been only compiled, not actually tested)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b2610e3..5fb2a15d42e1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4799,7 +4799,7 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 	return 0;
 }
 
-static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
+static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 				    struct packet_type **ppt_prev)
 {
 	struct packet_type *ptype, *pt_prev;
@@ -4809,21 +4809,21 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 	int ret = NET_RX_DROP;
 	__be16 type;
 
-	net_timestamp_check(!netdev_tstamp_prequeue, skb);
+	net_timestamp_check(!netdev_tstamp_prequeue, *pskb);
 
-	trace_netif_receive_skb(skb);
+	trace_netif_receive_skb(*pskb);
 
-	orig_dev = skb->dev;
+	orig_dev = (*pskb)->dev;
 
-	skb_reset_network_header(skb);
-	if (!skb_transport_header_was_set(skb))
-		skb_reset_transport_header(skb);
-	skb_reset_mac_len(skb);
+	skb_reset_network_header(*pskb);
+	if (!skb_transport_header_was_set(*pskb))
+		skb_reset_transport_header(*pskb);
+	skb_reset_mac_len(*pskb);
 
 	pt_prev = NULL;
 
 another_round:
-	skb->skb_iif = skb->dev->ifindex;
+	(*pskb)->skb_iif = (*pskb)->dev->ifindex;
 
 	__this_cpu_inc(softnet_data.processed);
 
@@ -4831,22 +4831,22 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		int ret2;
 
 		preempt_disable();
-		ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+		ret2 = do_xdp_generic(rcu_dereference((*pskb)->dev->xdp_prog), *pskb);
 		preempt_enable();
 
 		if (ret2 != XDP_PASS)
 			return NET_RX_DROP;
-		skb_reset_mac_len(skb);
+		skb_reset_mac_len(*pskb);
 	}
 
-	if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
-	    skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
-		skb = skb_vlan_untag(skb);
-		if (unlikely(!skb))
+	if ((*pskb)->protocol == cpu_to_be16(ETH_P_8021Q) ||
+	    (*pskb)->protocol == cpu_to_be16(ETH_P_8021AD)) {
+		*pskb = skb_vlan_untag(*pskb);
+		if (unlikely(!*pskb))
 			goto out;
 	}
 
-	if (skb_skip_tc_classify(skb))
+	if (skb_skip_tc_classify(*pskb))
 		goto skip_classify;
 
 	if (pfmemalloc)
@@ -4854,50 +4854,50 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
 		if (pt_prev)
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(*pskb, pt_prev, orig_dev);
 		pt_prev = ptype;
 	}
 
-	list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
+	list_for_each_entry_rcu(ptype, &(*pskb)->dev->ptype_all, list) {
 		if (pt_prev)
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(*pskb, pt_prev, orig_dev);
 		pt_prev = ptype;
 	}
 
 skip_taps:
 #ifdef CONFIG_NET_INGRESS
 	if (static_branch_unlikely(&ingress_needed_key)) {
-		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev);
-		if (!skb)
+		*pskb = sch_handle_ingress(*pskb, &pt_prev, &ret, orig_dev);
+		if (!*pskb)
 			goto out;
 
-		if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
+		if (nf_ingress(*pskb, &pt_prev, &ret, orig_dev) < 0)
 			goto out;
 	}
 #endif
-	skb_reset_tc(skb);
+	skb_reset_tc(*pskb);
 skip_classify:
-	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
+	if (pfmemalloc && !skb_pfmemalloc_protocol(*pskb))
 		goto drop;
 
-	if (skb_vlan_tag_present(skb)) {
+	if (skb_vlan_tag_present(*pskb)) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(*pskb, pt_prev, orig_dev);
 			pt_prev = NULL;
 		}
-		if (vlan_do_receive(&skb))
+		if (vlan_do_receive(pskb))
 			goto another_round;
-		else if (unlikely(!skb))
+		else if (unlikely(!*pskb))
 			goto out;
 	}
 
-	rx_handler = rcu_dereference(skb->dev->rx_handler);
+	rx_handler = rcu_dereference((*pskb)->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(*pskb, pt_prev, orig_dev);
 			pt_prev = NULL;
 		}
-		switch (rx_handler(&skb)) {
+		switch (rx_handler(pskb)) {
 		case RX_HANDLER_CONSUMED:
 			ret = NET_RX_SUCCESS;
 			goto out;
@@ -4912,29 +4912,29 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		}
 	}
 
-	if (unlikely(skb_vlan_tag_present(skb))) {
+	if (unlikely(skb_vlan_tag_present(*pskb))) {
 check_vlan_id:
-		if (skb_vlan_tag_get_id(skb)) {
+		if (skb_vlan_tag_get_id(*pskb)) {
 			/* Vlan id is non 0 and vlan_do_receive() above couldn't
 			 * find vlan device.
 			 */
-			skb->pkt_type = PACKET_OTHERHOST;
-		} else if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
-			   skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
+			(*pskb)->pkt_type = PACKET_OTHERHOST;
+		} else if ((*pskb)->protocol == cpu_to_be16(ETH_P_8021Q) ||
+			   (*pskb)->protocol == cpu_to_be16(ETH_P_8021AD)) {
 			/* Outer header is 802.1P with vlan 0, inner header is
 			 * 802.1Q or 802.1AD and vlan_do_receive() above could
 			 * not find vlan dev for vlan id 0.
 			 */
-			__vlan_hwaccel_clear_tag(skb);
-			skb = skb_vlan_untag(skb);
-			if (unlikely(!skb))
+			__vlan_hwaccel_clear_tag(*pskb);
+			*pskb = skb_vlan_untag(*pskb);
+			if (unlikely(!*pskb))
 				goto out;
-			if (vlan_do_receive(&skb))
+			if (vlan_do_receive(pskb))
 				/* After stripping off 802.1P header with vlan 0
 				 * vlan dev is found for inner header.
 				 */
 				goto another_round;
-			else if (unlikely(!skb))
+			else if (unlikely(!*pskb))
 				goto out;
 			else
 				/* We have stripped outer 802.1P vlan 0 header.
@@ -4944,40 +4944,40 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 				goto check_vlan_id;
 		}
 		/* Note: we might in the future use prio bits
-		 * and set skb->priority like in vlan_do_receive()
+		 * and set (*pskb)->priority like in vlan_do_receive()
 		 * For the time being, just ignore Priority Code Point
 		 */
-		__vlan_hwaccel_clear_tag(skb);
+		__vlan_hwaccel_clear_tag(*pskb);
 	}
 
-	type = skb->protocol;
+	type = (*pskb)->protocol;
 
 	/* deliver only exact match when indicated */
 	if (likely(!deliver_exact)) {
-		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
+		deliver_ptype_list_skb(*pskb, &pt_prev, orig_dev, type,
 				       &ptype_base[ntohs(type) &
 						   PTYPE_HASH_MASK]);
 	}
 
-	deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
+	deliver_ptype_list_skb(*pskb, &pt_prev, orig_dev, type,
 			       &orig_dev->ptype_specific);
 
-	if (unlikely(skb->dev != orig_dev)) {
-		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
-				       &skb->dev->ptype_specific);
+	if (unlikely((*pskb)->dev != orig_dev)) {
+		deliver_ptype_list_skb(*pskb, &pt_prev, orig_dev, type,
+				       &(*pskb)->dev->ptype_specific);
 	}
 
 	if (pt_prev) {
-		if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
+		if (unlikely(skb_orphan_frags_rx(*pskb, GFP_ATOMIC)))
 			goto drop;
 		*ppt_prev = pt_prev;
 	} else {
 drop:
 		if (!deliver_exact)
-			atomic_long_inc(&skb->dev->rx_dropped);
+			atomic_long_inc(&(*pskb)->dev->rx_dropped);
 		else
-			atomic_long_inc(&skb->dev->rx_nohandler);
-		kfree_skb(skb);
+			atomic_long_inc(&(*pskb)->dev->rx_nohandler);
+		kfree_skb(*pskb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
 		 */
@@ -4994,7 +4994,7 @@ static int __netif_receive_skb_one_core(struct sk_buff *skb, bool pfmemalloc)
 	struct packet_type *pt_prev = NULL;
 	int ret;
 
-	ret = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
+	ret = __netif_receive_skb_core(&skb, pfmemalloc, &pt_prev);
 	if (pt_prev)
 		ret = INDIRECT_CALL_INET(pt_prev->func, ipv6_rcv, ip_rcv, skb,
 					 skb->dev, pt_prev, orig_dev);
@@ -5072,7 +5072,7 @@ static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemallo
 		struct packet_type *pt_prev = NULL;
 
 		skb_list_del_init(skb);
-		__netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
+		__netif_receive_skb_core(&skb, pfmemalloc, &pt_prev);
 		if (!pt_prev)
 			continue;
 		if (pt_curr != pt_prev || od_curr != orig_dev) {


> Here's a thought, how about switching round the return-vs-pass-by-pointer
>  and writing:
> 
> static struct sk_buff *__netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
>                                                 struct packet_type **ppt_prev, int *ret)
> ?
> (Although, you might want to rename 'ret' in that case.)
> 
> Does that make things any less ugly?

That seems more reasonable (also only compiled so far):

diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b2610e3..e09b6a14851c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4799,8 +4799,8 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 	return 0;
 }
 
-static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
-				    struct packet_type **ppt_prev)
+static struct sk_buff *__netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
+						struct packet_type **ppt_prev, int *retp)
 {
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
@@ -4834,8 +4834,10 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
 		preempt_enable();
 
-		if (ret2 != XDP_PASS)
-			return NET_RX_DROP;
+		if (ret2 != XDP_PASS) {
+			*retp = NET_RX_DROP;
+			return NULL;
+		}
 		skb_reset_mac_len(skb);
 	}
 
@@ -4985,7 +4987,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 	}
 
 out:
-	return ret;
+	*retp = ret;
+	return skb;
 }
 
 static int __netif_receive_skb_one_core(struct sk_buff *skb, bool pfmemalloc)
@@ -4994,7 +4997,7 @@ static int __netif_receive_skb_one_core(struct sk_buff *skb, bool pfmemalloc)
 	struct packet_type *pt_prev = NULL;
 	int ret;
 
-	ret = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
+	skb = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev, &ret);
 	if (pt_prev)
 		ret = INDIRECT_CALL_INET(pt_prev->func, ipv6_rcv, ip_rcv, skb,
 					 skb->dev, pt_prev, orig_dev);
@@ -5070,9 +5073,10 @@ static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemallo
 	list_for_each_entry_safe(skb, next, head, list) {
 		struct net_device *orig_dev = skb->dev;
 		struct packet_type *pt_prev = NULL;
+		int ret;
 
 		skb_list_del_init(skb);
-		__netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
+		skb = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev, &ret);
 		if (!pt_prev)
 			continue;
 		if (pt_curr != pt_prev || od_curr != orig_dev) {


-- 
Sabrina

^ permalink raw reply related

* RE: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Prout, Andrew - LLSC - MITLL @ 2019-07-16 15:13 UTC (permalink / raw)
  To: Eric Dumazet, Christoph Paasch
  Cc: David S . Miller, netdev, Greg Kroah-Hartman, Jonathan Looney,
	Neal Cardwell, Tyler Hicks, Yuchung Cheng, Bruce Curtis,
	Jonathan Lemon, Dustin Marquess
In-Reply-To: <adec774ed16540c6b627c2f607f3e216@ll.mit.edu>

On 7/11/2019 11:15 PM, Prout, Andrew - LLSC - MITLL wrote: 
> I in no way intended to imply that I had confirmed the small SO_SNDBUF was a prerequisite to our problem. With my synthetic test, I was looking for a concise reproducer and trying things to > stress the system.

I've looked into this some more: I am now able to reproduce it consistently (on 4.14.132) with my synthetic test seemingly regardless of the SO_SNDBUF size (tested 128KiB, 512KiB, 1MiB and 10MiB). The key change to make it reproducible was to use sendfile() instead of send() - this is what samba was doing in our real failure case. Looking at the code, it appears that sendfile() calls into the zerocopy functions which only check SNDBUF for memory they allocate (the skb structure) - they'll happily "overfill" the send buffer with zerocopy'd pages.

In my tests I can see the send buffer used reported by netstat getting to much larger values than the SO_SNDBUF value I set (with a 10MiB SO_SNDBUF, it got up to ~52MiB on a stalled connection). This of course easily causes the CVE-2019-11478 fix to false alarm and the TCP connection to stall.

^ permalink raw reply

* Re: [PATCH] skbuff: fix compilation warnings in skb_dump()
From: Willem de Bruijn @ 2019-07-16 15:04 UTC (permalink / raw)
  To: Qian Cai
  Cc: David Miller, Willem de Bruijn, clang-built-linux,
	Network Development, LKML
In-Reply-To: <1563288840-1913-1-git-send-email-cai@lca.pw>

On Tue, Jul 16, 2019 at 4:56 PM Qian Cai <cai@lca.pw> wrote:
>
> The commit 6413139dfc64 ("skbuff: increase verbosity when dumping skb
> data") introduced a few compilation warnings.
>
> net/core/skbuff.c:766:32: warning: format specifies type 'unsigned
> short' but the argument has type 'unsigned int' [-Wformat]
>                        level, sk->sk_family, sk->sk_type,
> sk->sk_protocol);
>                                              ^~~~~~~~~~~
> net/core/skbuff.c:766:45: warning: format specifies type 'unsigned
> short' but the argument has type 'unsigned int' [-Wformat]
>                        level, sk->sk_family, sk->sk_type,
> sk->sk_protocol);
> ^~~~~~~~~~~~~~~

Ah, I looked at sk_family (skc_family), which is type unsigned short.

But sk_type and sk_protocol are defined as

 unsigned int            sk_padding : 1,
                                sk_kern_sock : 1,
                                sk_no_check_tx : 1,
                                sk_no_check_rx : 1,
                                sk_userlocks : 4,
                                sk_protocol  : 8,
                                sk_type      : 16;

So %u is indeed needed instead of %hu.

> Fix them by using the proper types, and also fix some checkpatch
> warnings by using pr_info().
>
> WARNING: printk() should include KERN_<LEVEL> facility level
> +               printk("%ssk family=%hu type=%u proto=%u\n",

Converting printk to pr_info lowers all levels to KERN_INFO.

skb_dump takes an explicit parameter level to be able to log at
KERN_ERR or KERN_WARNING

I would like to avoid those checkpatch warnings, but this is not the
right approach.

> Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")

Thanks. For a v2, please mark the target branch, as [PATCH net v2].

^ permalink raw reply

* Re: [PATCH bpf v2] selftests/bpf: skip nmi test when perf hw events are disabled
From: Andrii Nakryiko @ 2019-07-16 14:57 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens, Y Song
In-Reply-To: <20190716105634.21827-1-iii@linux.ibm.com>

On Tue, Jul 16, 2019 at 3:56 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Some setups (e.g. virtual machines) might run with hardware perf events
> disabled. If this is the case, skip the test_send_signal_nmi test.
>
> Add a separate test involving a software perf event. This allows testing
> the perf event path regardless of hardware perf event support.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---

LGTM!

Acked-by: Andrii Nakryiko <andriin@fb.com>

>
> v1->v2: Skip the test instead of using a software event.
> Add a separate test with a software event.
>
>  .../selftests/bpf/prog_tests/send_signal.c    | 33 ++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index 67cea1686305..54218ee3c004 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -173,6 +173,18 @@ static int test_send_signal_tracepoint(void)
>         return test_send_signal_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint");
>  }
>

[...]

^ permalink raw reply

* [PATCH] skbuff: fix compilation warnings in skb_dump()
From: Qian Cai @ 2019-07-16 14:54 UTC (permalink / raw)
  To: davem; +Cc: willemb, clang-built-linux, netdev, linux-kernel, Qian Cai

The commit 6413139dfc64 ("skbuff: increase verbosity when dumping skb
data") introduced a few compilation warnings.

net/core/skbuff.c:766:32: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
                       level, sk->sk_family, sk->sk_type,
sk->sk_protocol);
                                             ^~~~~~~~~~~
net/core/skbuff.c:766:45: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
                       level, sk->sk_family, sk->sk_type,
sk->sk_protocol);
^~~~~~~~~~~~~~~

Fix them by using the proper types, and also fix some checkpatch
warnings by using pr_info().

WARNING: printk() should include KERN_<LEVEL> facility level
+		printk("%ssk family=%hu type=%u proto=%u\n",

Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 net/core/skbuff.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6f1e31f674a3..fa1e78f7bb96 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -740,30 +740,30 @@ void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt)
 	has_mac = skb_mac_header_was_set(skb);
 	has_trans = skb_transport_header_was_set(skb);
 
-	printk("%sskb len=%u headroom=%u headlen=%u tailroom=%u\n"
-	       "mac=(%d,%d) net=(%d,%d) trans=%d\n"
-	       "shinfo(txflags=%u nr_frags=%u gso(size=%hu type=%u segs=%hu))\n"
-	       "csum(0x%x ip_summed=%u complete_sw=%u valid=%u level=%u)\n"
-	       "hash(0x%x sw=%u l4=%u) proto=0x%04x pkttype=%u iif=%d\n",
-	       level, skb->len, headroom, skb_headlen(skb), tailroom,
-	       has_mac ? skb->mac_header : -1,
-	       has_mac ? skb_mac_header_len(skb) : -1,
-	       skb->network_header,
-	       has_trans ? skb_network_header_len(skb) : -1,
-	       has_trans ? skb->transport_header : -1,
-	       sh->tx_flags, sh->nr_frags,
-	       sh->gso_size, sh->gso_type, sh->gso_segs,
-	       skb->csum, skb->ip_summed, skb->csum_complete_sw,
-	       skb->csum_valid, skb->csum_level,
-	       skb->hash, skb->sw_hash, skb->l4_hash,
-	       ntohs(skb->protocol), skb->pkt_type, skb->skb_iif);
+	pr_info("%sskb len=%u headroom=%u headlen=%u tailroom=%u\n"
+		"mac=(%d,%d) net=(%d,%d) trans=%d\n"
+		"shinfo(txflags=%u nr_frags=%u gso(size=%hu type=%u segs=%hu))\n"
+		"csum(0x%x ip_summed=%u complete_sw=%u valid=%u level=%u)\n"
+		"hash(0x%x sw=%u l4=%u) proto=0x%04x pkttype=%u iif=%d\n",
+		level, skb->len, headroom, skb_headlen(skb), tailroom,
+		has_mac ? skb->mac_header : -1,
+		has_mac ? skb_mac_header_len(skb) : -1,
+		skb->network_header,
+		has_trans ? skb_network_header_len(skb) : -1,
+		has_trans ? skb->transport_header : -1,
+		sh->tx_flags, sh->nr_frags,
+		sh->gso_size, sh->gso_type, sh->gso_segs,
+		skb->csum, skb->ip_summed, skb->csum_complete_sw,
+		skb->csum_valid, skb->csum_level,
+		skb->hash, skb->sw_hash, skb->l4_hash,
+		ntohs(skb->protocol), skb->pkt_type, skb->skb_iif);
 
 	if (dev)
-		printk("%sdev name=%s feat=0x%pNF\n",
-		       level, dev->name, &dev->features);
+		pr_info("%sdev name=%s feat=0x%pNF\n",
+			level, dev->name, &dev->features);
 	if (sk)
-		printk("%ssk family=%hu type=%hu proto=%hu\n",
-		       level, sk->sk_family, sk->sk_type, sk->sk_protocol);
+		pr_info("%ssk family=%hu type=%u proto=%u\n",
+			level, sk->sk_family, sk->sk_type, sk->sk_protocol);
 
 	if (full_pkt && headroom)
 		print_hex_dump(level, "skb headroom: ", DUMP_PREFIX_OFFSET,
@@ -801,7 +801,7 @@ void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt)
 	}
 
 	if (full_pkt && skb_has_frag_list(skb)) {
-		printk("skb fraglist:\n");
+		pr_info("skb fraglist:\n");
 		skb_walk_frags(skb, list_skb)
 			skb_dump(level, list_skb, true);
 	}
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v2] libertas_tf: Use correct channel range in lbtf_geo_init
From: YueHaibing @ 2019-07-16 14:42 UTC (permalink / raw)
  To: kvalo, davem, lkundrak, derosier
  Cc: linux-kernel, netdev, linux-wireless, YueHaibing

It seems we should use 'range' instead of 'priv->range'
in lbtf_geo_init(), because 'range' is the corret one
related to current regioncode.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 691cdb49388b ("libertas_tf: command helper functions for libertas_tf")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
v2: fix compile error
---
 drivers/net/wireless/marvell/libertas_tf/cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/libertas_tf/cmd.c b/drivers/net/wireless/marvell/libertas_tf/cmd.c
index 1eacca0..a333172 100644
--- a/drivers/net/wireless/marvell/libertas_tf/cmd.c
+++ b/drivers/net/wireless/marvell/libertas_tf/cmd.c
@@ -65,7 +65,7 @@ static void lbtf_geo_init(struct lbtf_private *priv)
 			break;
 		}
 
-	for (ch = priv->range.start; ch < priv->range.end; ch++)
+	for (ch = range->start; ch < range->end; ch++)
 		priv->channels[CHAN_TO_IDX(ch)].flags = 0;
 }
 
-- 
2.7.4



^ permalink raw reply related

* Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Xiaoming Ni @ 2019-07-16 14:07 UTC (permalink / raw)
  To: Vasily Averin, gregkh@linuxfoundation.org
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, Huangjianhui (Alex), Dailei,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <e4753c70-de7c-063a-dc49-0edc7520ccd2@virtuozzo.com>



On 2019/7/16 18:20, Vasily Averin wrote:
> On 7/16/19 5:00 AM, Xiaoming Ni wrote:
>> On 2019/7/15 13:38, Vasily Averin wrote:
>>> On 7/14/19 5:45 AM, Xiaoming Ni wrote:
>>>> On 2019/7/12 22:07, gregkh@linuxfoundation.org wrote:
>>>>> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
>>>>>> On 2019/7/11 21:57, Vasily Averin wrote:
>>>>>>> On 7/11/19 4:55 AM, Nixiaoming wrote:
>>>>>>>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>>>>>>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>>>>>>>>>> Registering the same notifier to a hook repeatedly can cause the hook
>>>>>>>>>> list to form a ring or lose other members of the list.
>>>>>>>>>
>>>>>>>>> I think is not enough to _prevent_ 2nd register attempt,
>>>>>>>>> it's enough to detect just attempt and generate warning to mark host in bad state.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Duplicate registration is prevented in my patch, not just "mark host in bad state"
>>>>>>>>
>>>>>>>> Duplicate registration is checked and exited in notifier_chain_cond_register()
>>>>>>>>
>>>>>>>> Duplicate registration was checked in notifier_chain_register() but only 
>>>>>>>> the alarm was triggered without exiting. added by commit 831246570d34692e 
>>>>>>>> ("kernel/notifier.c: double register detection")
>>>>>>>>
>>>>>>>> My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
>>>>>>>>  which triggers an alarm and exits when a duplicate registration is detected.
>>>>>>>>
>>>>>>>>> Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>>>>>>>>> and it can lead to host crash in any time: 
>>>>>>>>> you can unregister notifier on first attempt it can be too early, it can be still in use.
>>>>>>>>> on the other hand you can never call 2nd unregister at all.
>>>>>>>>
>>>>>>>> Since the member was not added to the linked list at the time of the second registration, 
>>>>>>>> no linked list ring was formed. 
>>>>>>>> The member is released on the first unregistration and -ENOENT on the second unregistration.
>>>>>>>> After patching, the fault has been alleviated
>>>>>>>
>>>>>>> You are wrong here.
>>>>>>> 2nd notifier's registration is a pure bug, this should never happen.
>>>>>>> If you know the way to reproduce this situation -- you need to fix it. 
>>>>>>>
>>>>>>> 2nd registration can happen in 2 cases:
>>>>>>> 1) missed rollback, when someone forget to call unregister after successfull registration, 
>>>>>>> and then tried to call register again. It can lead to crash for example when according module will be unloaded.
>>>>>>> 2) some subsystem is registered twice, for example from  different namespaces.
>>>>>>> in this case unregister called during sybsystem cleanup in first namespace will incorrectly remove notifier used 
>>>>>>> in second namespace, it also can lead to unexpacted behaviour.
>>>>>>>
>>>>>> So in these two cases, is it more reasonable to trigger BUG() directly when checking for duplicate registration ?
>>>>>> But why does current notifier_chain_register() just trigger WARN() without exiting ?
>>>>>> notifier_chain_cond_register() direct exit without triggering WARN() ?
>>>>>
>>>>> It should recover from this, if it can be detected.  The main point is
>>>>> that not all apis have to be this "robust" when used within the kernel
>>>>> as we do allow for the callers to know what they are doing :)
>>>>>
>>>> In the notifier_chain_register(), the condition ( (*nl) == n) is the same registration of the same hook.
>>>>  We can intercept this situation and avoid forming a linked list ring to make the API more rob
>>>
>>> Once again -- yes, you CAN prevent list corruption, but you CANNOT recover the host and return it back to safe state.
>>> If double register event was detected -- it means you have bug in kernel.
>>>
>>> Yes, you can add BUG here and crash the host immediately, but I prefer to use warning in such situations.
>>>
>>>>> If this does not cause any additional problems or slow downs, it's
>>>>> probably fine to add.
>>>>>
>>>> Notifier_chain_register() is not a system hotspot function.
>>>> At the same time, there is already a WARN_ONCE judgment. There is no new judgment in the new patch.
>>>> It only changes the processing under the condition of (*nl) == n, which will not cause performance problems.
>>>> At the same time, avoiding the formation of a link ring can make the system more robust.
>>>
>>> I disagree, 
>>> yes, node will have correct list, but anyway node will work wrong and can crash the host in any time.
>>
>> Sorry, my description is not accurate.
>>
>> My patch feature does not prevent users from repeatedly registering hooks.
>> But avoiding the chain ring caused by the user repeatedly registering the hook
>>
>> There are no modules for duplicate registration hooks in the current system.
>> But considering that not all modules are in the kernel source tree,
>> In order to improve the robustness of the kernel API, we should avoid the linked list ring caused by repeated registration.
>> Or in order to improve the efficiency of problem location, when the duplicate registration is checked, the system crashes directly.
> 
> Detect of duplicate registration means an unrecoverable error,
> from this point of view it makes sense to replace WARN_ONCE by BUG_ON.
>  
>> On the other hand, the difference between notifier_chain_register() and notifier_chain_cond_register() for duplicate registrations is confusing:
>> Blocking the formation of the linked list ring in notifier_chain_cond_register()
>> There is no interception of the linked list ring in notifier_chain_register(), just an alarm.
>> Give me the illusion: Isn't notifier_chain_register() allowed to create a linked list ring?
> 
> I'm not sure that I understood your question correctly but will try to answer.
> As far as I see all callers of notifier_chain_cond_register checks return value, expect possible failure and handle it somehow.
> On the other hand callers of notifier_chain_register() in many cases do not check return value and always expect success.
> The goal of original WARN_ONCE -- to detect possible misuse of notifiers and it seems for me it correctly handles this task.
>
Notifier_chain_cond_register() has only one return value: 0
At the same time, it is only called by blocking_notifier_chain_cond_register().
In the function comment of blocking_notifier_chain_cond_register there is " Currently always returns zero."
Therefore, the user cannot check whether the hook has duplicate registration or other errors by checking the return value.

If the interceptor list ring is added to notifier_chain_register(), notifier_chain_register()
 And notifier_chain_cond_register() becomes redundant code, we can delete one of them

> 
> .
> 


^ permalink raw reply

* Re: IPv6 L2TP issues related to 93531c67
From: Paul Donohue @ 2019-07-16 13:56 UTC (permalink / raw)
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev
In-Reply-To: <d6db74f5-5add-7500-1b7a-fa62302a455f@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

On Mon, Jul 15, 2019 at 12:55:48PM -0600, David Ahern wrote:
> As an FYI, gmail thinks your emails are spam.
Ugh.  Thanks for letting me know.  I'll look into it.

> On 7/15/19 10:18 AM, Paul Donohue wrote:
> > Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.
> That commit can not be reverted. It is a foundational piece for a lot of
> other changes. Did you mean the commit before it works and this commit
> fails?
Sorry, yes, I meant the commit before it works, and this one fails.  I did not try reverting this commit on a more recent kernel.

> > It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed.  Could someone take a look and point me in the right direction for further troubleshooting?
> Let's get a complete example that demonstrates the problem, and I can go
> from there. Can you take the attached script and update it so that it
> reflects the problem you are reporting? That script works on latest
> kernel as well as 4.14.133. It uses network namespaces for 2 hosts with
> a router between them.
> 
> Also, check the return of the fib lookups using:
>     perf record -e fib6:* -a
>     <run test, ctrl-c on the record>
>     perf script
> 
> Checkout the fib lookup parameters and result. Do they look correct to
> you for your setup?

Unfortunately, I have a fairly complicated setup, so it took me a while to figure out which pieces were relevant ... But I think I've finally got it.  The missing piece was IPsec.

After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel.  Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again.  In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.

I've attached a modified version of the script which demonstrates this issue.

Thank you!
-Paul

[-- Attachment #2: l2tp.sh --]
[-- Type: application/x-sh, Size: 4977 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox