Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: Vishwanathapura, Niranjana @ 2016-12-16  2:30 UTC (permalink / raw)
  To: ira.weiny
  Cc: Jason Gunthorpe, Doug Ledford, Leon Romanovsky, Jeff Kirsher,
	David S. Miller, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161216012404.GD3785-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>

On Thu, Dec 15, 2016 at 08:24:05PM -0500, ira.weiny wrote:
>On Thu, Dec 15, 2016 at 11:48:37AM -0700, Jason Gunthorpe wrote:
>> On Thu, Dec 15, 2016 at 01:19:18PM -0500, Doug Ledford wrote:
>> > On 12/15/2016 12:07 PM, Jason Gunthorpe wrote:
>> > > On Thu, Dec 15, 2016 at 11:28:06AM -0500, Doug Ledford wrote:
>> > >
>> > >> 1) Since your intent is to make this work with multiple versions of the
>> > >> hfi drivers, I disagree with Jason that just because there is only one
>> > >> driver today that we should keep it simple.  Design it right from the
>> > >> beginning of multi driver is your intent is, IMO, a better way to go.
>> > >> You'll work out the bugs in the initial implementation and when it comes
>> > >> time to add the second driver, things will go much more smoothly.
>> > >
>> > > If that is your position then this should be a straight up IB ULP that
>> > > works with any IB hardware.
>> >
>> > Yes, see my comments in point #3 of my previous email...
>>
>> Well, I'm not opposed to the vnic idea - Mellanox had (has?) a similar
>> IB driver. There are lots of good reasons to strictly maintain the
>> ethernet presentation.
>
>Agreed.  I'm pretty worried about the idea of putting VNIC into IPoIB.  It
>seems like a force fit at best.
>

Just to add what Jason, Ira already mentioned,
1) This isn't much common code between hfi_vnic and ipoib.
Besides we expect both ipoib and hfi_vnic to function parallely.
Registering with the network stack is also different.
hfi_vnic exchanges encapsulation information via IB MAD interface from OPA
EM which is not the case with ipoib.
We needed minimal set of interfaces (defined in include/rdma/opa_hfi.h in this 
path series) that represents HW.

2) The design is very different. There are no path record queries, QPs etc in 
hfi_vnic.

3) hfi_vnic also does the encapsulation with fabric (OPA) header, so bottom 
driver only puts it on the wire.
Whereas in ipoib, bottom ib device driver does the encapsulation for ipoib.

4) hfi_vnic do not need ib work request/completion structures.
hfi_vnic supports multiple TX/RX queues.

>>
>> There is much more going on here than just changing the LLADDR,
>> essentially everything MAD focused is different compared to ipoib, and
>> it looks like the required datastructures are different too. This is
>> more of a map a mac to a OPA_LRH approach with SA mediated discovery,
>> by my eye.
>>
>> The main share is the 'skb send' part, we've talked about hoisting
>> that out of ipoib in the past anyhow. A generic verb along those lines
>> would probably allow the sdma optimization for hfi for both this new
>> ulp and ipoib without creating such an ugly HFI1 specific interface.
>
>I'm not sure what you mean about "skb send" being used by ipoib.  Right now
>IPoIB already supplies a "generic skb send" for _Verbs_ in ipoib_send.
>
>I don't know what other devices would do to implement ipoib_send?  To me, it
>seems like the abstraction for IPoIB is at the proper layer now.
>
>For OPA, the hfi driver supports both IPoIB and VNIC.  So expecting IPoIB and
>VNIC to use a generic "skb send" in ib_device is going to make hfi1 do a lot of
>work to determine which ULP is calling it or make the interface kind of ugly.
>Either way I don't see how this is better than a separate set of functions.
>
>IMO the cleanest way to "clean up the ugly HFI1 interface" is to just  put the
>VNIC operations into ib_device similar to the iWarp specific structure
>"iw_cm_verbs" which is there today.
>
>If a device supports the VNIC operations then it can set the pointer and if not
>it will be NULL.  VNIC will look for that pointer for the support it needs.  If
>in the future other devices need modifications to that interface we can modify
>it then.
>
>Ira

Yes, I agree. The interface defined in include/rdma/opa_hfi.h in this patch 
series is pretty simple and generic interface that represents the HW.
If we include this file and put the hfi_vnic_ctrl_ops directly in ib_device 
structure, then it will simplify lot of stuff. We don't need to abstract
out hfi_ibdev and define any ib device capability flag for VNIC support.

>
>>
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PULL] virtio, vhost: new device, fixes, speedups
From: Linus Torvalds @ 2016-12-16  2:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: arend.vanspriel, KVM list, Network Development,
	Linux Kernel Mailing List, virtualization
In-Reply-To: <20161216010507-mutt-send-email-mst@kernel.org>

On Thu, Dec 15, 2016 at 3:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

Pulled, but I wonder...

>  Documentation/translations/zh_CN/sparse.txt        |   7 +-
>  arch/arm/plat-samsung/include/plat/gpio-cfg.h      |   2 +-
>  drivers/crypto/virtio/virtio_crypto_common.h       | 128 +++++
[...]

what are you generating these diffstats with? Because they are pretty bogus..

The end result is correct:

>  86 files changed, 2106 insertions(+), 280 deletions(-)

but the file order in the diffstat is completely random, which makes
it very hard to compare with what I get. It also makes it hard to see
what you changed, because it's not alphabetical like it should be
(strictly speaking the git pathname ordering isnt' really
alphabetical, since the '/' sorts as the NUL character, but close
enough).

I can't see the logic to the re-ordering of the lines, so I'm
intrigued how you even generated it.

            Linus

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: kbuild test robot @ 2016-12-16  2:14 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, Netdev, kernel-hardening, LKML, linux-crypto,
	David Laight, Ted Tso, Hannes Frederic Sowa, Linus Torvalds,
	Eric Biggers, Tom Herbert, George Spelvin, Vegard Nossum, ak,
	davem, luto, Jason A. Donenfeld, Jean-Philippe Aumasson,
	Daniel J . Bernstein
In-Reply-To: <20161215203003.31989-2-Jason@zx2c4.com>

[-- Attachment #1: Type: text/plain, Size: 1530 bytes --]

Hi Jason,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9 next-20161215]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/siphash-add-cryptographically-secure-PRF/20161216-092837
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   lib/siphash.c: In function 'siphash_unaligned':
>> lib/siphash.c:123:15: error: 'bytes' undeclared (first use in this function)
     case 1: b |= bytes[0];
                  ^~~~~
   lib/siphash.c:123:15: note: each undeclared identifier is reported only once for each function it appears in

vim +/bytes +123 lib/siphash.c

   117		case 7: b |= ((u64)end[6]) << 48;
   118		case 6: b |= ((u64)end[5]) << 40;
   119		case 5: b |= ((u64)end[4]) << 32;
   120		case 4: b |= get_unaligned_le32(end); break;
   121		case 3: b |= ((u64)end[2]) << 16;
   122		case 2: b |= get_unaligned_le16(end); break;
 > 123		case 1: b |= bytes[0];
   124		}
   125	#endif
   126		v3 ^= b;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 45664 bytes --]

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] phy: add phy fixup unregister functions
From: Dongpo Li @ 2016-12-16  2:05 UTC (permalink / raw)
  To: Woojung.Huh, davem, f.fainelli; +Cc: andrew, netdev, UNGLinuxDriver
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D40981BCB@CHN-SV-EXMX02.mchp-main.com>


On 2016/12/14 23:34, Woojung.Huh@microchip.com wrote:
>> I just want to commit the unregister patch and found this patch. Good job!
>> But I consider this patch may miss something.
>> If one SoC has 2 MAC ports and each port uses the different network driver,
>> the 2 drivers may register fixup for the same PHY chip with different
>> "run" function because the PHY chip works in different mode.
>> In such a case, this patch doesn't consider "run" function and may cause
>> problem.
>> When removing the driver which register fixup at last, it will remove another
>> driver's fixup.
>> Should this condition be considered and fixed?
> Good point.
> Current phy fixup is independent LIST from phydev structure,
> and, fixup runs in two places of phy_device_register() and phy_init_hw().
> It's not clear that it needs two separate fixup, but it may be good idea to
> pass phy fixup when calling phy_attach() or phy_attach_direct() and
> put it under phydev structure.
> So, fixup can be called at phy_init_hw() per phy device and remove
> When phy detached.
> Welcome any comments.

I rethink this problem and find that the "fixup->bus_id" may be a flag to
distinguish different PHY device.
In such condition, the driver should call "phy_register_fixup/phy_unregister_fixup" directly
instead of "*_for_uid" interface.


    Regards,
    Dongpo

.

^ permalink raw reply

* Re: wl1251 & mac address & calibration data
From: Luis R. Rodriguez @ 2016-12-16  2:03 UTC (permalink / raw)
  To: Arend Van Spriel, Tom Gundersen, Daniel Wagner, Johannes Berg,
	Ming Lei, Mimi Zohar, Bjorn Andersson, Rafał Miłecki
  Cc: Pali Rohár, Kalle Valo, Sebastian Reichel, Pavel Machek,
	Michal Kazior, Ivaylo Dimitrov, Aaro Koskinen, Tony Lindgren,
	linux-wireless, Network Development, linux-kernel@vger.kernel.org,
	David Woodhouse, Takashi Iwai, Josh Boyer, Dmitry Torokhov
In-Reply-To: <76365770-f5ba-9565-3fca-710518f64d81@broadcom.com>

On Thu, Dec 15, 2016 at 2:12 PM, Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
> On 15-12-2016 16:33, Pali Rohár wrote:
>> On Thu Dec 15 09:18:44 2016 Kalle Valo <kvalo@codeaurora.org> wrote:
>>> (Adding Luis because he has been working on request_firmware() lately)
>>>
>>> Pali Rohár <pali.rohar@gmail.com> writes:
>>>
>>>>>> So no, there is no argument against... request_firmware() in
>>>>>> fallback mode with userspace helper is by design blocking and
>>>>>> waiting for userspace. But waiting for some change in DTS in
>>>>>> kernel is just nonsense.
>>>>>
>>>>> I would just mark the wlan device with status = "disabled" and
>>>>> enable it in the overlay together with adding the NVS & MAC info.
>>>>
>>>> So if you think that this solution make sense, we can wait what net
>>>> wireless maintainers say about it...
>>>>
>>>> For me it looks like that solution can be:
>>>>
>>>> extending request_firmware() to use only userspace helper
>>>
>>> I haven't followed the discussion very closely but this is my preference
>>> what drivers should do:
>>>
>>> 1) First the driver should do try to get the calibration data and mac
>>>        address from the device tree.
>>>
>>
>> Ok, but there is no (dynamic, device specific) data in DTS for N900. So 1) is noop.
>
> Uhm. What do you mean? You can propose a patch to the DT bindings [1] to
> get it in there and create your N900 DTB or am I missing something here.
> Are there hardware restrictions that do not allow you to boot with your
> own DTB.
>
>>> 2) If they are not in DT the driver should retrieve the calibration data
>>>        with request_firmware(). BUT with an option for user space to
>>>        implement that with a helper script so that the data can be created
>>>        dynamically, which I believe openwrt does with ath10k calibration
>>>        data right now.
>>
>> Currently there is flag for request_firmware() that it should fallback to user helper if direct VFS access not find needed firmware.
>>
>> But this flag is not suitable as /lib/firmware already provides default (not device specific) calibration data.
>>
>> So I would suggest to add another flag/function which will primary use user helper.
> I recall Luis saying that user-mode helper (fallback) should be
> discouraged, because there is no assurance that there is a user-mode
> helper so you might just be pissing in the wind.

There's tons of issues with the current status quo of the so called
"firmware usermode helper". To start off with its a complete misnomer,
the kernel's usermode helper infrastructure is implemented in
lib/kmod.c and it provides an API to help call out to userspace some
helper for you. That's the real kernel usermode helper. In so far as
the firmware_class.c driver is concerned -- it only makes use of the
kernel user mode helper as an option if you have
CONFIG_UEVENT_HELPER_PATH set to help address kobject uevents. Most
distributions do not use this, and back in the day systemd udev, and
prior to that hotplug used to process firmware kobject uevents.
systemd udev is long gone. Gone. This kobject uevents really are a
"fallback mechanism" for firmware only -- if cached firmware, built-in
firmware, or direct filesystem lookup fails. These each are their own
beast. Finally kobject uevents are only one of the fallback
mechanisms, "custom fallback mechanisms" are the other option and its
what you and others seem to describe to want for these sorts of custom
things.

There are issues with the existing request_firmware() API in that
everyone and their mother keeps extending it with stupid small API
extensions to do yet-another-tweak, and then having to go and change
tons of drivers. Or a new API call added for just one custom knob.
Naturally this is all plain dumb, so yet-another-API call or new
argument is not going to help us. We don't have "flags" persi in this
API, that was one issue with the design of the API, but there are much
more. The entire change in mechanism of "firmware fallback mechanism"
depending on which kernel config options you have is another. Its a
big web of gum put together. All this needs to end.

I've recently proposed a new patch to first help with understanding
each of the individual items I've listed above with as much new
information as is possible. I myself need to re-read it to just keep
tabs on what the hell is going on at times. After this I have a new
API proposal which I've been working on for a long time now which adds
an extensible API with only 2 calls: sync, async, and lets us modify
attributes through a parameters struct. This is what we should use in
the future for further extensions.

For the new API a solution for "fallback mechanisms" should be clean
though and I am looking to stay as far as possible from the existing
mess. A solution to help both the old API and new API is possible for
the "fallback mechanism" though -- but for that I can only refer you
at this point to some of Daniel Wagner and Tom Gunderson's firmwared
deamon prospect. It should help pave the way for a clean solution and
help address other stupid issues.

I'll note -- the "custom fallback mechanism", part of the old API is
just a terrible idea for many reasons. I've documented issues in my
documentation revamp, I suggest to read that, refer to this thread:

https://lkml.kernel.org/r/20161213030828.17820-4-mcgrof@kernel.org

> The idea was to have a
> dedicated API call that explicitly does the request towards user-space.

In so far as this driver example that was mentioned in this thread my
recommendation is to use the default existing MAC address /
calibration data on /lib/firmware/ and then use another request to
help override for the custom thing. The only issue of course is that
the timeout for the custom thing is 60 seconds, but if your platforms
are controlled -- just reduce this to 1 second or so given that udev
is long gone and it would seem you'd only use a custom fw request to
depend on. You could also wrap this thing into a kconfig option for
now, and have the filename specified, if empty then no custom request
is sent. If set then you have it request it.

Please note the other patches in my series for the custom fallback
though. We have only 2 users upstream -- and from recent discussions
it would seem it might be possible to address what you need with
firmwared in a clean way without this custom old fallback crap thing.
Johannes at least has a similar requirement and is convinced a
"custom" thing can be done without even adding an extra custom kobject
uevent attribute as from what I recall he hinted, drivers have no
business to be involved in this. If you do end up using the custom
fallback mechanism be sure to add super crystal clear documentation
for it (see my other patches for the declarer for this documentation).
Since we have only 2 drivers using this custom thing its hard to get
folks to commit to writing the docs, write it now and be sure you
think of the future as well.

Oh also the "custom firmware fallback" was also broken on recent
kernels for many kernel revisions, it just did not work until recently
a fix which went in, so your users wills need this fix cherry picked.
Hey I tell you, the custom fw thing was a terrible incarnation. Only 2
drivers use it. There are good reasons to not like it (be sure to read
the suspend issue I documented).

> By the way, are we talking here about wl1251 device or driver as you
> also mentioned wl12xx? I did not read the entire thread.

  Luis

^ permalink raw reply

* Re: [PATCH perf/core REBASE 2/5] samples/bpf: Switch over to libbpf
From: Joe Stringer @ 2016-12-16  1:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: LKML, netdev, Wang Nan, ast, Daniel Borkmann,
	Arnaldo Carvalho de Melo
In-Reply-To: <CAPWQB7GhHZOf4c5mu=__mNNyFFeq+jOZMawweMBg=sNiPpt-zg@mail.gmail.com>

On 15 December 2016 at 14:00, Joe Stringer <joe@ovn.org> wrote:
> On 15 December 2016 at 10:34, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>> Em Thu, Dec 15, 2016 at 03:29:18PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Thu, Dec 15, 2016 at 12:50:22PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> > Em Wed, Dec 14, 2016 at 02:43:39PM -0800, Joe Stringer escreveu:
>>> > > Now that libbpf under tools/lib/bpf/* is synced with the version from
>>> > > samples/bpf, we can get rid most of the libbpf library here.
>>> > >
>>> > > Signed-off-by: Joe Stringer <joe@ovn.org>
>>> > > Cc: Alexei Starovoitov <ast@fb.com>
>>> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> > > Cc: Wang Nan <wangnan0@huawei.com>
>>> > > Link: http://lkml.kernel.org/r/20161209024620.31660-6-joe@ovn.org
>>> > > [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
>>>
>>> So, the above comment no longer applied to this adjusted patch from you,
>>> as you removed one hunk too much, that, after applied, gets samples/bpf/
>>> to build successfully:
>>>
>>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>>> index add514e2984a..81b0ef2f7994 100644
>>> --- a/samples/bpf/Makefile
>>> +++ b/samples/bpf/Makefile
>>> @@ -107,6 +107,7 @@ always += lwt_len_hist_kern.o
>>>  always += xdp_tx_iptunnel_kern.o
>>>
>>>  HOSTCFLAGS += -I$(objtree)/usr/include
>>> +HOSTCFLAGS += -I$(srctree)/tools/lib/
>>>  HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
>>>
>>>  HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
>>>
>>> ---------------------
>>>
>>> I added it, continuing...
>>
>> But then, when I tried to run offwaketime with it, it fails:
>>
>> [root@jouet bpf]# ./offwaketime  ls
>> bpf_load_program() err=22
>> BPF_LDX uses reserved fields
>> bpf_load_program() err=22
>> BPF_LDX uses reserved fields
>> [root@jouet bpf]#
>>
>> If I remove this patch and try again, it works:
>>
>> [root@jouet bpf]# ./offwaketime | head -4
>> swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 46
>> chrome;return_from_SYSCALL_64;do_syscall_64;exit_to_usermode_loop;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;do_syscall_64;return_from_SYSCALL_64;;Chrome_ChildIOT 1
>> firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 3
>> dockerd-current;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;futex_wake;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;dockerd-current 2
>> [root@jouet bpf]#
>>
>>
>> So, I'm stopping here so that I can push what I have to Ingo, then I'll get
>> back to this, hopefully by then you beat me and I have just to retest 8-)
>
> OK, thanks for the report. Looks like there was another difference
> between the two libbpfs - one used total program size for its
> load_program API; the actual kernel API uses instruction count. This
> incremental should do the trick:
>
> https://github.com/joestringer/linux/commit/6ff7726f20077bed66fb725f5189c13690154b6a

The full branch with this change (fast-forward from your tmp branch)
is available here:
https://github.com/joestringer/linux/tree/submit/libbpf_samples_v5

I tried running every selftest and BPF sample I could get my hands on;
there's one or two that I couldn't run, but seemed more to do with my
versions of TC/iproute and kernel config rather than libbpf changes.
Let me know if you see any further trouble.

^ permalink raw reply

* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: ira.weiny @ 2016-12-16  1:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Leon Romanovsky, Jeff Kirsher, David S. Miller,
	Vishwanathapura, Niranjana, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161215184837.GA16552-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Dec 15, 2016 at 11:48:37AM -0700, Jason Gunthorpe wrote:
> On Thu, Dec 15, 2016 at 01:19:18PM -0500, Doug Ledford wrote:
> > On 12/15/2016 12:07 PM, Jason Gunthorpe wrote:
> > > On Thu, Dec 15, 2016 at 11:28:06AM -0500, Doug Ledford wrote:
> > > 
> > >> 1) Since your intent is to make this work with multiple versions of the
> > >> hfi drivers, I disagree with Jason that just because there is only one
> > >> driver today that we should keep it simple.  Design it right from the
> > >> beginning of multi driver is your intent is, IMO, a better way to go.
> > >> You'll work out the bugs in the initial implementation and when it comes
> > >> time to add the second driver, things will go much more smoothly.
> > > 
> > > If that is your position then this should be a straight up IB ULP that
> > > works with any IB hardware.
> > 
> > Yes, see my comments in point #3 of my previous email...
> 
> Well, I'm not opposed to the vnic idea - Mellanox had (has?) a similar
> IB driver. There are lots of good reasons to strictly maintain the
> ethernet presentation.

Agreed.  I'm pretty worried about the idea of putting VNIC into IPoIB.  It
seems like a force fit at best.

> 
> There is much more going on here than just changing the LLADDR,
> essentially everything MAD focused is different compared to ipoib, and
> it looks like the required datastructures are different too. This is
> more of a map a mac to a OPA_LRH approach with SA mediated discovery,
> by my eye.
> 
> The main share is the 'skb send' part, we've talked about hoisting
> that out of ipoib in the past anyhow. A generic verb along those lines
> would probably allow the sdma optimization for hfi for both this new
> ulp and ipoib without creating such an ugly HFI1 specific interface.

I'm not sure what you mean about "skb send" being used by ipoib.  Right now
IPoIB already supplies a "generic skb send" for _Verbs_ in ipoib_send.

I don't know what other devices would do to implement ipoib_send?  To me, it
seems like the abstraction for IPoIB is at the proper layer now.

For OPA, the hfi driver supports both IPoIB and VNIC.  So expecting IPoIB and
VNIC to use a generic "skb send" in ib_device is going to make hfi1 do a lot of
work to determine which ULP is calling it or make the interface kind of ugly.
Either way I don't see how this is better than a separate set of functions.

IMO the cleanest way to "clean up the ugly HFI1 interface" is to just  put the
VNIC operations into ib_device similar to the iWarp specific structure
"iw_cm_verbs" which is there today.

If a device supports the VNIC operations then it can set the pointer and if not
it will be NULL.  VNIC will look for that pointer for the support it needs.  If
in the future other devices need modifications to that interface we can modify
it then.

Ira

> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Mark Greer @ 2016-12-16  1:18 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	justin-R+k406RtEhcAvxtiuMwx3w, Jaret Cantu
In-Reply-To: <1481841044-4314-3-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Dec 15, 2016 at 05:30:44PM -0500, Geoff Lansberry wrote:
> From: Jaret Cantu <jaret.cantu-jEh4hwF5bVhBDgjK7y7TUQ@public.gmane.org>
> 
> Repeated polling attempts cause a NULL dereference error to occur.
> This is because the curent state of the trf7970a is reading but
> a request has been made to send a command.
> 
> The solution is to properly kill the waiting reading (workqueue)
> before failing on the send.

Maybe its just me but I find this description a little hard to grok.
Mind reworking it?

The patch itself looks fine.

Thanks,

Mark
--
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Mark Greer @ 2016-12-16  1:13 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	justin-R+k406RtEhcAvxtiuMwx3w
In-Reply-To: <1481841044-4314-2-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

Missing commit description.

> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 2d2a077..b4c37ab 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c

> @@ -1048,6 +1049,11 @@ static int trf7970a_init(struct trf7970a *trf)
>  	if (ret)
>  		goto err_out;
>  
> +	ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
> +			trf->io_ctrl|TRF7970A_REG_IO_CTRL_VRS(0x1));

s/l|T/l | T/

Otherwise, looks good.

Mark
--
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Mark Greer @ 2016-12-16  1:06 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	justin-R+k406RtEhcAvxtiuMwx3w
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hi Geoff.

On Thu, Dec 15, 2016 at 05:30:42PM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

Please add an informative commit description to all of your commits.
No matter how trivial this patch may seem to you now, it may not be
to others (or to you in a few years).

> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 26c9dbb..2d2a077 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c

> @@ -1181,27 +1180,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a *trf, int tech)
>  	switch (tech) {
>  	case NFC_DIGITAL_RF_TECH_106A:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |

nit: s/0xF8/0xf8/ please (for consistency with the rest of the file.).

Otherwise, it looks good.

Thanks,

Mark
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Hannes Frederic Sowa @ 2016-12-16  0:07 UTC (permalink / raw)
  To: Josef Bacik, Tom Herbert
  Cc: Craig Gallek, Eric Dumazet, Linux Kernel Network Developers
In-Reply-To: <1481828016.24490.5@smtp.office365.com>

Hi Josef,

On 15.12.2016 19:53, Josef Bacik wrote:
> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@gmail.com>
>> wrote:
>>>  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com>
>>> wrote:
>>>>  I think there may be some suspicious code in inet_csk_get_port. At
>>>>  tb_found there is:
>>>>
>>>>                  if (((tb->fastreuse > 0 && reuse) ||
>>>>                       (tb->fastreuseport > 0 &&
>>>>                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>                        sk->sk_reuseport && uid_eq(tb->fastuid,
>>>> uid))) &&
>>>>                      smallest_size == -1)
>>>>                          goto success;
>>>>                  if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk,
>>>> tb, true)) {
>>>>                          if ((reuse ||
>>>>                               (tb->fastreuseport > 0 &&
>>>>                                sk->sk_reuseport &&
>>>>                               
>>>> !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>                                uid_eq(tb->fastuid, uid))) &&
>>>>                              smallest_size != -1 && --attempts >= 0) {
>>>>                                  spin_unlock_bh(&head->lock);
>>>>                                  goto again;
>>>>                          }
>>>>                          goto fail_unlock;
>>>>                  }
>>>>
>>>>  AFAICT there is redundancy in these two conditionals.  The same clause
>>>>  is being checked in both: (tb->fastreuseport > 0 &&
>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>>  first conditional should be hit, goto done,  and the second will never
>>>>  evaluate that part to true-- unless the sk is changed (do we need
>>>>  READ_ONCE for sk->sk_reuseport_cb?).
>>>  That's an interesting point... It looks like this function also
>>>  changed in 4.6 from using a single local_bh_disable() at the beginning
>>>  with several spin_lock(&head->lock) to exclusively
>>>  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
>>>  disable variant was preventing the timers in your stack trace from
>>>  running interleaved with this function before?
>>
>> Could be, although dropping the lock shouldn't be able to affect the
>> search state. TBH, I'm a little lost in reading function, the
>> SO_REUSEPORT handling is pretty complicated. For instance,
>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>> function and also in every call to inet_csk_bind_conflict. I wonder if
>> we can simply this under the assumption that SO_REUSEPORT is only
>> allowed if the port number (snum) is explicitly specified.
> 
> Ok first I have data for you Hannes, here's the time distributions
> before during and after the lockup (with all the debugging in place the
> box eventually recovers).  I've attached it as a text file since it is
> long.

Thanks a lot!

> Second is I was thinking about why we would spend so much time doing the
> ->owners list, and obviously it's because of the massive amount of
> timewait sockets on the owners list.  I wrote the following dumb patch
> and tested it and the problem has disappeared completely.  Now I don't
> know if this is right at all, but I thought it was weird we weren't
> copying the soreuseport option from the original socket onto the twsk. 
> Is there are reason we aren't doing this currently?  Does this help
> explain what is happening?  Thanks,

The patch is interesting and a good clue, but I am immediately a bit
concerned that we don't copy/tag the socket with the uid also to keep
the security properties for SO_REUSEPORT. I have to think a bit more
about this.

We have seen hangs during connect. I am afraid this patch wouldn't help
there while also guaranteeing uniqueness.

Thanks a lot!

^ permalink raw reply

* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Hannes Frederic Sowa @ 2016-12-16  0:03 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
	Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
	Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <CAHmME9rF0pKxfc3p9ThD3LZU2+ZNPeR-4=9MVEE3AJQKBMNA+w@mail.gmail.com>

On 16.12.2016 00:43, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> Good news.
> 
> On Thu, Dec 15, 2016 at 10:45 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>>> How's that sound?
>>
>> I am still very much concerned about the API.
> 
> Thanks for pushing me and putting up with my daftness... the constant
> folding works absolutely perfectly. I've run several tests. When gcc
> knows that a struct is aligned (say, via __aligned(8)), then it erases
> the branch and makes a direct jump to the aligned code. When it's
> uncertain, it evaluates at runtime. So, now there is a single
> siphash() function that chooses the best one automatically. Behind the
> scene there's siphash_aligned and siphash_unaligned, but nobody needs
> to call these directly. (Should I rename these to have a double
> underscore prefix?) On platforms that have
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, of course all of this
> disappears and everything goes directly to the aligned version.
> 
> So, I think this assuages your concerns entirely. A single API entry
> point that does the right thing.
> 
> Whew! Good thinking, and thanks again for the suggestion.

Awesome, thanks for trying this out. This basically resolves my concern
API-wise so far.

Hannes out. ;)

^ permalink raw reply

* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-15 23:47 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
	Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
	Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <e574d97f-e0b4-f5e2-6d60-88d3ce185249@stressinduktion.org>

On Thu, Dec 15, 2016 at 10:45 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> By the way, if you target net-next, it is currently closed. So no need
> to hurry.

Honestly I have no idea what I'm targeting. The hash function touches
lib/. The secure_seq stuff touches net/. The rng stuff touches
random.c. Shall this be for net-next? For lib-next (doesn't exist)?
For tytso-next? Since a lot of feedback has come from netdev people, I
suspect net-next is the correct answer. In that case, I'll ask Ted for
his sign-off to touch random.c, and then we'll get this queued up in
net-next. Please correct me if this doesn't actually resemble how
things work around here...

Jason

^ permalink raw reply

* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-15 23:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
	Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
	Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <e574d97f-e0b4-f5e2-6d60-88d3ce185249@stressinduktion.org>

Hi Hannes,

Good news.

On Thu, Dec 15, 2016 at 10:45 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
>> How's that sound?
>
> I am still very much concerned about the API.

Thanks for pushing me and putting up with my daftness... the constant
folding works absolutely perfectly. I've run several tests. When gcc
knows that a struct is aligned (say, via __aligned(8)), then it erases
the branch and makes a direct jump to the aligned code. When it's
uncertain, it evaluates at runtime. So, now there is a single
siphash() function that chooses the best one automatically. Behind the
scene there's siphash_aligned and siphash_unaligned, but nobody needs
to call these directly. (Should I rename these to have a double
underscore prefix?) On platforms that have
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, of course all of this
disappears and everything goes directly to the aligned version.

So, I think this assuages your concerns entirely. A single API entry
point that does the right thing.

Whew! Good thinking, and thanks again for the suggestion.

Jason

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-15 23:28 UTC (permalink / raw)
  To: ak, davem, David.Laight, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
	vegard.nossum
  Cc: djb
In-Reply-To: <CAGiyFdfmiCMyHvAg=5sGh8KjBBrF0Wb4Qf=JLzJqUAx4yFSS3Q@mail.gmail.com>

> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

I was thinking if the key could be pushed to 80 bits, that would be nice,
but honestly 64 bits is fine.  This is DoS protection, and while it's
possible to brute-force a 64 bit secret, there are more effective (DDoS)
attacks possible for the same cost.

(I'd suggest a name of "HalfSipHash" to convey the reduced security
effectively.)

> Regarding output size, are 64 bits sufficient?

As a replacement for jhash, 32 bits are sufficient.  It's for
indexing an in-memory hash table on a 32-bit machine.


(When you're done thinking about this, as a matter of personal interest
I'd love a hash expert's opinion on
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a18da7a9c7886f1c7307f8d3f23f24318583f03
and
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8387ff2577eb9ed245df9a39947f66976c6bcd02
which is a non-cryptographic hash function of novel design that's
inspired by SipHash.)

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Craig Gallek @ 2016-12-15 23:25 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Josef Bacik, Hannes Frederic Sowa, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <CALx6S34dmOWiEtOFQMc1FGhqvLqxM5JkD6q5MvaZz0cGZmNM=A@mail.gmail.com>

On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik <jbacik@fb.com> wrote:
>> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>
>>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@gmail.com>
>>> wrote:
>>>>
>>>>  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com>
>>>> wrote:
>>>>>
>>>>>  I think there may be some suspicious code in inet_csk_get_port. At
>>>>>  tb_found there is:
>>>>>
>>>>>                  if (((tb->fastreuse > 0 && reuse) ||
>>>>>                       (tb->fastreuseport > 0 &&
>>>>>                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>>                        sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
>>>>>                      smallest_size == -1)
>>>>>                          goto success;
>>>>>                  if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb,
>>>>> true)) {
>>>>>                          if ((reuse ||
>>>>>                               (tb->fastreuseport > 0 &&
>>>>>                                sk->sk_reuseport &&
>>>>>                                !rcu_access_pointer(sk->sk_reuseport_cb)
>>>>> &&
>>>>>                                uid_eq(tb->fastuid, uid))) &&
>>>>>                              smallest_size != -1 && --attempts >= 0) {
>>>>>                                  spin_unlock_bh(&head->lock);
>>>>>                                  goto again;
>>>>>                          }
>>>>>                          goto fail_unlock;
>>>>>                  }
>>>>>
>>>>>  AFAICT there is redundancy in these two conditionals.  The same clause
>>>>>  is being checked in both: (tb->fastreuseport > 0 &&
>>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>>  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>>>  first conditional should be hit, goto done,  and the second will never
>>>>>  evaluate that part to true-- unless the sk is changed (do we need
>>>>>  READ_ONCE for sk->sk_reuseport_cb?).
>>>>
>>>>  That's an interesting point... It looks like this function also
>>>>  changed in 4.6 from using a single local_bh_disable() at the beginning
>>>>  with several spin_lock(&head->lock) to exclusively
>>>>  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
>>>>  disable variant was preventing the timers in your stack trace from
>>>>  running interleaved with this function before?
>>>
>>>
>>> Could be, although dropping the lock shouldn't be able to affect the
>>> search state. TBH, I'm a little lost in reading function, the
>>> SO_REUSEPORT handling is pretty complicated. For instance,
>>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>>> function and also in every call to inet_csk_bind_conflict. I wonder if
>>> we can simply this under the assumption that SO_REUSEPORT is only
>>> allowed if the port number (snum) is explicitly specified.
>>
>>
>> Ok first I have data for you Hannes, here's the time distributions before
>> during and after the lockup (with all the debugging in place the box
>> eventually recovers).  I've attached it as a text file since it is long.
>>
>> Second is I was thinking about why we would spend so much time doing the
>> ->owners list, and obviously it's because of the massive amount of timewait
>> sockets on the owners list.  I wrote the following dumb patch and tested it
>> and the problem has disappeared completely.  Now I don't know if this is
>> right at all, but I thought it was weird we weren't copying the soreuseport
>> option from the original socket onto the twsk.  Is there are reason we
>> aren't doing this currently?  Does this help explain what is happening?
>> Thanks,
>>
> I think that would explain it. We would be walking long lists of TW
> sockets in inet_bind_bucket_for_each(tb, &head->chain). This should
> break, although now I'm wondering if there's other ways we can get
> into this situation. reuseport ensures that we can have long lists of
> sockets in a single bucket, TW sockets can make that list really long.
What if the time-wait timer implementation was changed to do more
opportunistic removals?  In this case, you seem to have a coordinated
timer event causing many independent locking events on the bucket in
question.  If one of those firing events realized it could handle all
of them, you could greatly reduce the contention.  The fact that they
all hash to the same bucket may make this even easier...

> Tom
>
>> Josef

^ permalink raw reply

* Re: [net-next PATCH v6 0/5] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-15 23:17 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, netdev, alexei.starovoitov, john.r.fastabend, brouer,
	tgraf, davem
In-Reply-To: <20161215200712.23639.53043.stgit@john-Precision-Tower-5810>

On Thu, Dec 15, 2016 at 12:12:04PM -0800, John Fastabend wrote:
> This implements virtio_net for the mergeable buffers and big_packet
> modes. I tested this with vhost_net running on qemu and did not see
> any issues. For testing num_buf > 1 I added a hack to vhost driver
> to only but 100 bytes per buffer.
> 
> There are some restrictions for XDP to be enabled and work well
> (see patch 3) for more details.
> 
>   1. GUEST_TSO{4|6} must be off
>   2. MTU must be less than PAGE_SIZE
>   3. queues must be available to dedicate to XDP
>   4. num_bufs received in mergeable buffers must be 1
>   5. big_packet mode must have all data on single page
> 
> To test this I used pktgen in the hypervisor and ran the XDP sample
> programs xdp1 and xdp2 from ./samples/bpf in the host. The default
> mode that is used with these patches with Linux guest and QEMU/Linux
> hypervisor is the mergeable buffers mode. I tested this mode for 2+
> days running xdp2 without issues. Additionally I did a series of
> driver unload/load tests to check the allocate/release paths.
> 
> To test the big_packets path I applied the following simple patch against
> the virtio driver forcing big_packets mode,
> 
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2242,7 +2242,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>                 vi->big_packets = true;
>  
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> -               vi->mergeable_rx_bufs = true;
> +               vi->mergeable_rx_bufs = false;
>  
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
>             virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> 
> I then repeated the tests with xdp1 and xdp2. After letting them run
> for a few hours I called it good enough.
> 
> Testing the unexpected case where virtio receives a packet across
> multiple buffers required patching the hypervisor vhost driver to
> convince it to send these unexpected packets. Then I used ping with
> the -s option to trigger the case with multiple buffers. This mode
> is not expected to be used but as MST pointed out per spec it is
> not strictly speaking illegal to generate multi-buffer packets so we
> need someway to handle these. The following patch can be used to
> generate multiple buffers,
> 
> 
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue
> *vq, u64
> 
>                 _iov = iov + ret;
>                 size = node->size - addr + node->start;
> -               _iov->iov_len = min((u64)len - s, size);
> +               printk("%s: build 100 length headers!\n", __func__);
> +               _iov->iov_len = min((u64)len - s, (u64)100);//size);
>                 _iov->iov_base = (void __user *)(unsigned long)
>                         (node->userspace_addr + addr - node->start);
>                 s += size;
> 
> The qemu command I most frequently used for testing (although I did test
> various other combinations of devices) is the following,
> 
>  ./x86_64-softmmu/qemu-system-x86_64              \
>     -hda /var/lib/libvirt/images/Fedora-test0.img \
>     -m 4096  -enable-kvm -smp 2                   \
>     -netdev tap,id=hn0,queues=4,vhost=on          \
>     -device virtio-net-pci,netdev=hn0,mq=on,vectors=9,guest_tso4=off,guest_tso6=off \
>     -serial stdio
> 
> The options 'guest_tso4=off,guest_tso6=off' are required because we
> do not support LRO with XDP at the moment.
> 
> Please review any comments/feedback welcome as always.
> 
> Thanks,
> John
> 
> ---

OK, I think we can queue this for -next.

It's fairly limited in the kind of hardware supported, we can and
probably should extend it further with time.

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> John Fastabend (5):
>       net: xdp: add invalid buffer warning
>       virtio_net: Add XDP support
>       virtio_net: add dedicated XDP transmit queues
>       virtio_net: add XDP_TX support
>       virtio_net: xdp, add slowpath case for non contiguous buffers
> 
> 
>  drivers/net/virtio_net.c |  365 +++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/filter.h   |    1 
>  net/core/filter.c        |    6 +
>  3 files changed, 365 insertions(+), 7 deletions(-)
> 
> --
> Signature

^ permalink raw reply

* [PULL] virtio, vhost: new device, fixes, speedups
From: Michael S. Tsirkin @ 2016-12-15 23:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: mark.rutland, kvm, mst, virtualization, hch, stefan, krzk, felipe,
	tklauser, liuyuan, arend.vanspriel, marcel, mkl, kvalo,
	omarapazanadi, gregkh, linux-kernel, stable, netdev, torvalds

The following changes since commit a57cb1c1d7974c62a5c80f7869e35b492ace12cd:

  Merge branch 'akpm' (patches from Andrew) (2016-12-14 17:25:18 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 6bdf1e0efb04a1716373646cb6f35b73addca492:

  Makefile: drop -D__CHECK_ENDIAN__ from cflags (2016-12-16 00:13:43 +0200)

----------------------------------------------------------------
virtio, vhost: new device, fixes, speedups

This includes the new virtio crypto device, and fixes all over the
place.  In particular enabling endian-ness checks for sparse builds
found some bugs which this fixes.  And it appears that everyone is in
agreement that disabling endian-ness sparse checks shouldn't be
necessary any longer.

So this enables them for everyone, and drops __CHECK_ENDIAN__
and __bitwise__ APIs.

IRQ handling in virtio has been refactored somewhat, the
larger switch to IRQ_SHARED will have to wait as
it proved too aggressive.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Christoph Hellwig (4):
      virtio_pci: use pci_alloc_irq_vectors
      virtio_pci: remove the call to vp_free_vectors in vp_request_msix_vectors
      virtio_pci: merge vp_free_vectors into vp_del_vqs
      virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants

Felipe Franciosi (1):
      virtio_ring: fix description of virtqueue_get_buf

Gao feng (1):
      vsock: lookup and setup guest_cid inside vhost_vsock_lock

Gonglei (3):
      virtio_pci_modern: fix complaint by sparse
      virtio_ring: fix complaint by sparse
      crypto: add virtio-crypto driver

Jason Wang (2):
      vhost: cache used event for better performance
      vhost: remove unused feature bit

Mark Rutland (3):
      tools/virtio: fix READ_ONCE()
      vringh: kill off ACCESS_ONCE()
      tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h

Michael S. Tsirkin (18):
      virtio_console: drop unused config fields
      drm/virtio: fix endianness in primary_plane_update
      drm/virtio: fix lock context imbalance
      drm/virtio: annotate virtio_gpu_queue_ctrl_buffer_locked
      vhost: make interval tree static inline
      vhost: add missing __user annotations
      vsock/virtio: add a missing __le annotation
      vsock/virtio: mark an internal function static
      vsock/virtio: fix src/dst cid format
      virtio: clean up handling of request_irq failure
      linux/types.h: enable endian checks for all sparse builds
      tools: enable endian checks for all sparse builds
      Documentation/sparse: drop __bitwise__
      checkpatch: replace __bitwise__ with __bitwise
      linux: drop __bitwise__ everywhere
      Documentation/sparse: drop __CHECK_ENDIAN__
      fs/logfs: drop __CHECK_ENDIAN__
      Makefile: drop -D__CHECK_ENDIAN__ from cflags

Tobias Klauser (1):
      vhost/scsi: Remove unused but set variable

Yuan Liu (1):
      virtio_mmio: Set dev.release() to avoid warning

 Documentation/translations/zh_CN/sparse.txt        |   7 +-
 arch/arm/plat-samsung/include/plat/gpio-cfg.h      |   2 +-
 drivers/crypto/virtio/virtio_crypto_common.h       | 128 +++++
 drivers/md/dm-cache-block-types.h                  |   6 +-
 drivers/net/ethernet/sun/sunhme.h                  |   2 +-
 drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h   |   4 +-
 drivers/vhost/vhost.h                              |   3 +
 drivers/virtio/virtio_pci_common.h                 |   1 -
 fs/logfs/logfs.h                                   |   4 +-
 include/linux/mmzone.h                             |   2 +-
 include/linux/serial_core.h                        |   4 +-
 include/linux/types.h                              |   4 +-
 include/scsi/iscsi_proto.h                         |   2 +-
 include/target/target_core_base.h                  |   2 +-
 include/uapi/linux/types.h                         |   4 -
 include/uapi/linux/vhost.h                         |   2 -
 include/uapi/linux/virtio_crypto.h                 | 450 +++++++++++++++++
 include/uapi/linux/virtio_ids.h                    |   1 +
 include/uapi/linux/virtio_types.h                  |   6 +-
 net/ieee802154/6lowpan/6lowpan_i.h                 |   2 +-
 net/mac80211/ieee80211_i.h                         |   4 +-
 tools/include/linux/types.h                        |   4 -
 tools/virtio/linux/compiler.h                      |   2 +-
 tools/virtio/linux/uaccess.h                       |   9 +-
 drivers/char/virtio_console.c                      |  14 +-
 drivers/crypto/virtio/virtio_crypto_algs.c         | 540 +++++++++++++++++++++
 drivers/crypto/virtio/virtio_crypto_core.c         | 476 ++++++++++++++++++
 drivers/crypto/virtio/virtio_crypto_mgr.c          | 264 ++++++++++
 drivers/gpu/drm/virtio/virtgpu_plane.c             |   4 +-
 drivers/gpu/drm/virtio/virtgpu_vq.c                |   6 +-
 drivers/vhost/scsi.c                               |   2 -
 drivers/vhost/vhost.c                              |  40 +-
 drivers/vhost/vringh.c                             |   5 +-
 drivers/vhost/vsock.c                              |  25 +-
 drivers/virtio/virtio_mmio.c                       |   2 +
 drivers/virtio/virtio_pci_common.c                 | 201 ++++----
 drivers/virtio/virtio_pci_modern.c                 |   8 +-
 drivers/virtio/virtio_ring.c                       |   6 +-
 net/vmw_vsock/virtio_transport.c                   |   2 +-
 net/vmw_vsock/virtio_transport_common.c            |  17 +-
 Documentation/dev-tools/sparse.rst                 |  14 +-
 MAINTAINERS                                        |   9 +
 drivers/bluetooth/Makefile                         |   2 -
 drivers/crypto/Kconfig                             |   2 +
 drivers/crypto/Makefile                            |   1 +
 drivers/crypto/virtio/Kconfig                      |  10 +
 drivers/crypto/virtio/Makefile                     |   5 +
 drivers/net/can/Makefile                           |   1 -
 drivers/net/ethernet/altera/Makefile               |   1 -
 drivers/net/ethernet/atheros/alx/Makefile          |   1 -
 drivers/net/ethernet/freescale/Makefile            |   2 -
 drivers/net/wireless/ath/Makefile                  |   2 -
 drivers/net/wireless/ath/wil6210/Makefile          |   2 -
 .../wireless/broadcom/brcm80211/brcmfmac/Makefile  |   2 -
 .../wireless/broadcom/brcm80211/brcmsmac/Makefile  |   1 -
 drivers/net/wireless/intel/iwlegacy/Makefile       |   2 -
 drivers/net/wireless/intel/iwlwifi/Makefile        |   2 +-
 drivers/net/wireless/intel/iwlwifi/dvm/Makefile    |   2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/Makefile    |   2 +-
 drivers/net/wireless/intersil/orinoco/Makefile     |   3 -
 drivers/net/wireless/mediatek/mt7601u/Makefile     |   2 -
 drivers/net/wireless/realtek/rtlwifi/Makefile      |   2 -
 .../wireless/realtek/rtlwifi/btcoexist/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8188ee/Makefile    |   2 -
 .../net/wireless/realtek/rtlwifi/rtl8192c/Makefile |   2 -
 .../wireless/realtek/rtlwifi/rtl8192ce/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8192cu/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8192de/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8192ee/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8192se/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8723ae/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8723be/Makefile    |   2 -
 .../wireless/realtek/rtlwifi/rtl8723com/Makefile   |   2 -
 .../wireless/realtek/rtlwifi/rtl8821ae/Makefile    |   2 -
 drivers/net/wireless/ti/wl1251/Makefile            |   2 -
 drivers/net/wireless/ti/wlcore/Makefile            |   2 -
 drivers/staging/rtl8188eu/Makefile                 |   2 +-
 drivers/staging/rtl8192e/Makefile                  |   2 -
 drivers/staging/rtl8192e/rtl8192e/Makefile         |   2 -
 include/uapi/linux/Kbuild                          |   1 +
 net/bluetooth/Makefile                             |   2 -
 net/ieee802154/Makefile                            |   2 -
 net/mac80211/Makefile                              |   2 +-
 net/mac802154/Makefile                             |   2 -
 net/wireless/Makefile                              |   2 -
 scripts/checkpatch.pl                              |   4 +-
 86 files changed, 2106 insertions(+), 280 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
 create mode 100644 include/uapi/linux/virtio_crypto.h
 create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_core.c
 create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
 create mode 100644 drivers/crypto/virtio/Kconfig
 create mode 100644 drivers/crypto/virtio/Makefile

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jean-Philippe Aumasson @ 2016-12-15 23:00 UTC (permalink / raw)
  To: George Spelvin, ak, davem, David.Laight, ebiggers3, hannes, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	torvalds, tytso, vegard.nossum
  Cc: djb
In-Reply-To: <20161215224224.21447.qmail@ns.sciencehorizons.net>

[-- Attachment #1: Type: text/plain, Size: 4109 bytes --]

If a halved version of SipHash can bring significant performance boost
(with 32b words instead of 64b words) with an acceptable security level
(64-bit enough?) then we may design such a version.

Regarding output size, are 64 bits sufficient?
On Thu, 15 Dec 2016 at 23:42, George Spelvin <linux@sciencehorizons.net>
wrote:

> > While SipHash is extremely fast for a cryptographically secure function,
> > it is likely a tiny bit slower than the insecure jhash, and so
> replacements
> > will be evaluated on a case-by-case basis based on whether or not the
> > difference in speed is negligible and whether or not the current jhash
> usage
> > poses a real security risk.
>
> To quantify that, jhash is 27 instructions per 12 bytes of input, with a
> dependency path length of 13 instructions.  (24/12 in __jash_mix, plus
> 3/1 for adding the input to the state.) The final add + __jhash_final
> is 24 instructions with a path length of 15, which is close enough for
> this handwaving.  Call it 18n instructions and 8n cycles for 8n bytes.
>
> SipHash (on a 64-bit machine) is 14 instructions with a dependency path
> length of 4 *per round*.  Two rounds per 8 bytes, plus plus two adds
> and one cycle per input word, plus four rounds to finish makes 30n+46
> instructions and 9n+16 cycles for 8n bytes.
>
> So *if* you have a 64-bit 4-way superscalar machine, it's not that much
> slower once it gets going, but the four-round finalization is quite
> noticeable for short inputs.
>
> For typical kernel input lengths "within a factor of 2" is
> probably more accurate than "a tiny bit".
>
> You lose a factor of 2 if you machine is 2-way or non-superscalar,
> and a second factor of 2 if it's a 32-bit machine.
>
> I mention this because there are a lot of home routers and other netwoek
> appliances running Linux on 32-bit ARM and MIPS processors.  For those,
> it's a factor of *eight*, which is a lot more than "a tiny bit".
>
> The real killer is if you don't have enough registers; SipHash performs
> horribly on i386 because it uses more state than i386 has registers.
>
> (If i386 performance is desired, you might ask Jean-Philippe for some
> rotate constants for a 32-bit variant with 64 bits of key.  Note that
> SipHash's security proof requires that key length + input length is
> strictly less than the state size, so for a 4x32-bit variant, while
> you could stretch the key length a little, you'd have a hard limit at
> 95 bits.)
>
>
> A second point, the final XOR in SipHash is either a (very minor) design
> mistake, or an opportunity for optimization, depending on how you look
> at it.  Look at the end of the function:
>
> >+      SIPROUND;
> >+      SIPROUND;
> >+      return (v0 ^ v1) ^ (v2 ^ v3);
>
> Expanding that out, you get:
> +       v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
> +       v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
> +       v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
> +       v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
> +       return v0 ^ v1 ^ v2 ^ v3;
>
> Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
> two lines earlier, so the value of v0 doesn't matter after its XOR into
> v1 on line one.
>
> The final SIPROUND and return can then be optimized to
>
> +       v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
> +       v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
> +       v3 = rol64(v3, 21);
> +       v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
> +       return v1 ^ v2 ^ v3;
>
> A 32-bit implementation could further tweak the 4 instructions of
>         v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;
>
> gcc 6.2.1 -O3 compiles it to basically:
>         v1.low ^= v2.low;
>         v1.high ^= v2.high;
>         v1.low ^= v2.high;
>         v1.high ^= v2.low;
> but it could be written as:
>         v2.low ^= v2.high;
>         v1.low ^= v2.low;
>         v1.high ^= v2.low;
>
> Alternatively, if it's for private use only (key not shared with other
> systems), a slightly stronger variant would "return v1 ^ v3;".
> (The final swap of v2 is dead code, but a compiler can spot that easily.)
>

[-- Attachment #2: Type: text/html, Size: 6336 bytes --]

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-15 22:42 UTC (permalink / raw)
  To: ak, davem, David.Laight, ebiggers3, hannes, Jason,
	kernel-hardening, linux-crypto, linux-kernel, linux, luto, netdev,
	tom, torvalds, tytso, vegard.nossum
  Cc: djb, jeanphilippe.aumasson
In-Reply-To: <20161215203003.31989-2-Jason@zx2c4.com>

> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash usage
> poses a real security risk.

To quantify that, jhash is 27 instructions per 12 bytes of input, with a
dependency path length of 13 instructions.  (24/12 in __jash_mix, plus
3/1 for adding the input to the state.) The final add + __jhash_final
is 24 instructions with a path length of 15, which is close enough for
this handwaving.  Call it 18n instructions and 8n cycles for 8n bytes.

SipHash (on a 64-bit machine) is 14 instructions with a dependency path
length of 4 *per round*.  Two rounds per 8 bytes, plus plus two adds
and one cycle per input word, plus four rounds to finish makes 30n+46
instructions and 9n+16 cycles for 8n bytes.

So *if* you have a 64-bit 4-way superscalar machine, it's not that much
slower once it gets going, but the four-round finalization is quite
noticeable for short inputs.

For typical kernel input lengths "within a factor of 2" is
probably more accurate than "a tiny bit".

You lose a factor of 2 if you machine is 2-way or non-superscalar,
and a second factor of 2 if it's a 32-bit machine.

I mention this because there are a lot of home routers and other netwoek
appliances running Linux on 32-bit ARM and MIPS processors.  For those,
it's a factor of *eight*, which is a lot more than "a tiny bit".

The real killer is if you don't have enough registers; SipHash performs
horribly on i386 because it uses more state than i386 has registers.

(If i386 performance is desired, you might ask Jean-Philippe for some
rotate constants for a 32-bit variant with 64 bits of key.  Note that
SipHash's security proof requires that key length + input length is
strictly less than the state size, so for a 4x32-bit variant, while
you could stretch the key length a little, you'd have a hard limit at
95 bits.)

A second point, the final XOR in SipHash is either a (very minor) design
mistake, or an opportunity for optimization, depending on how you look
at it.  Look at the end of the function:

>+	SIPROUND;
>+	SIPROUND;
>+	return (v0 ^ v1) ^ (v2 ^ v3);

Expanding that out, you get:
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+	return v0 ^ v1 ^ v2 ^ v3;

Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
two lines earlier, so the value of v0 doesn't matter after its XOR into
v1 on line one.

The final SIPROUND and return can then be optimized to

+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+	v3 = rol64(v3, 21);
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+	return v1 ^ v2 ^ v3;

A 32-bit implementation could further tweak the 4 instructions of
	v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;

gcc 6.2.1 -O3 compiles it to basically:
	v1.low ^= v2.low;
	v1.high ^= v2.high;
	v1.low ^= v2.high;
	v1.high ^= v2.low;
but it could be written as:
	v2.low ^= v2.high;
	v1.low ^= v2.low;
	v1.high ^= v2.low;

Alternatively, if it's for private use only (key not shared with other
systems), a slightly stronger variant would "return v1 ^ v3;".
(The final swap of v2 is dead code, but a compiler can spot that easily.)

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Tom Herbert @ 2016-12-15 22:39 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Craig Gallek, Hannes Frederic Sowa, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <1481828016.24490.5@smtp.office365.com>

On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik <jbacik@fb.com> wrote:
> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> wrote:
>>
>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@gmail.com>
>> wrote:
>>>
>>>  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com>
>>> wrote:
>>>>
>>>>  I think there may be some suspicious code in inet_csk_get_port. At
>>>>  tb_found there is:
>>>>
>>>>                  if (((tb->fastreuse > 0 && reuse) ||
>>>>                       (tb->fastreuseport > 0 &&
>>>>                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>                        sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
>>>>                      smallest_size == -1)
>>>>                          goto success;
>>>>                  if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb,
>>>> true)) {
>>>>                          if ((reuse ||
>>>>                               (tb->fastreuseport > 0 &&
>>>>                                sk->sk_reuseport &&
>>>>                                !rcu_access_pointer(sk->sk_reuseport_cb)
>>>> &&
>>>>                                uid_eq(tb->fastuid, uid))) &&
>>>>                              smallest_size != -1 && --attempts >= 0) {
>>>>                                  spin_unlock_bh(&head->lock);
>>>>                                  goto again;
>>>>                          }
>>>>                          goto fail_unlock;
>>>>                  }
>>>>
>>>>  AFAICT there is redundancy in these two conditionals.  The same clause
>>>>  is being checked in both: (tb->fastreuseport > 0 &&
>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>>  first conditional should be hit, goto done,  and the second will never
>>>>  evaluate that part to true-- unless the sk is changed (do we need
>>>>  READ_ONCE for sk->sk_reuseport_cb?).
>>>
>>>  That's an interesting point... It looks like this function also
>>>  changed in 4.6 from using a single local_bh_disable() at the beginning
>>>  with several spin_lock(&head->lock) to exclusively
>>>  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
>>>  disable variant was preventing the timers in your stack trace from
>>>  running interleaved with this function before?
>>
>>
>> Could be, although dropping the lock shouldn't be able to affect the
>> search state. TBH, I'm a little lost in reading function, the
>> SO_REUSEPORT handling is pretty complicated. For instance,
>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>> function and also in every call to inet_csk_bind_conflict. I wonder if
>> we can simply this under the assumption that SO_REUSEPORT is only
>> allowed if the port number (snum) is explicitly specified.
>
>
> Ok first I have data for you Hannes, here's the time distributions before
> during and after the lockup (with all the debugging in place the box
> eventually recovers).  I've attached it as a text file since it is long.
>
> Second is I was thinking about why we would spend so much time doing the
> ->owners list, and obviously it's because of the massive amount of timewait
> sockets on the owners list.  I wrote the following dumb patch and tested it
> and the problem has disappeared completely.  Now I don't know if this is
> right at all, but I thought it was weird we weren't copying the soreuseport
> option from the original socket onto the twsk.  Is there are reason we
> aren't doing this currently?  Does this help explain what is happening?
> Thanks,
>
I think that would explain it. We would be walking long lists of TW
sockets in inet_bind_bucket_for_each(tb, &head->chain). This should
break, although now I'm wondering if there's other ways we can get
into this situation. reuseport ensures that we can have long lists of
sockets in a single bucket, TW sockets can make that list really long.

Tom

> Josef

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-15 22:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <f3fc8035-fb6e-fb06-d9b8-bfc96dc4e1d2@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

On 15.12.2016 22:32, Lino Sanfilippo wrote:

> Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the
> tx path properly) and the HW is still active on the tx path while the tx buffers are
> freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped.
> Did you try to stop the phy fist in stmmac_tx_err_work(), too?
> 
> Regards,
> Lino
> 

And this is the "sledgehammer" approach: Do a complete shutdown and restart
of the hardware in case of tx error (against net-next and only compile tested).


[-- Attachment #2: 0001-Sledgehammer.patch --]
[-- Type: text/x-patch, Size: 4140 bytes --]

>From 0eda87ce6cbc2fb6d25653f30121f30f89332199 Mon Sep 17 00:00:00 2001
From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Thu, 15 Dec 2016 23:18:15 +0100
Subject: [PATCH] Sledgehammer

---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 70 ++++++++++-------------
 2 files changed, 31 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index eab04ae..9c240d7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -131,6 +131,7 @@ struct stmmac_priv {
 	u32 rx_tail_addr;
 	u32 tx_tail_addr;
 	u32 mss;
+	struct work_struct tx_err_work;
 
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *dbgfs_dir;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..7546574 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1403,37 +1403,6 @@ static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv)
 }
 
 /**
- * stmmac_tx_err - to manage the tx error
- * @priv: driver private structure
- * Description: it cleans the descriptors and restarts the transmission
- * in case of transmission errors.
- */
-static void stmmac_tx_err(struct stmmac_priv *priv)
-{
-	int i;
-	netif_stop_queue(priv->dev);
-
-	priv->hw->dma->stop_tx(priv->ioaddr);
-	dma_free_tx_skbufs(priv);
-	for (i = 0; i < DMA_TX_SIZE; i++)
-		if (priv->extend_desc)
-			priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic,
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
-		else
-			priv->hw->desc->init_tx_desc(&priv->dma_tx[i],
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
-	priv->dirty_tx = 0;
-	priv->cur_tx = 0;
-	netdev_reset_queue(priv->dev);
-	priv->hw->dma->start_tx(priv->ioaddr);
-
-	priv->dev->stats.tx_errors++;
-	netif_wake_queue(priv->dev);
-}
-
-/**
  * stmmac_dma_interrupt - DMA ISR
  * @priv: driver private structure
  * Description: this is the DMA ISR. It is called by the main ISR.
@@ -1466,7 +1435,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
 			priv->xstats.threshold = tc;
 		}
 	} else if (unlikely(status == tx_hard_error))
-		stmmac_tx_err(priv);
+		schedule_work(&priv->tx_err_work);
 }
 
 /**
@@ -1870,13 +1839,7 @@ static int stmmac_open(struct net_device *dev)
 	return ret;
 }
 
-/**
- *  stmmac_release - close entry point of the driver
- *  @dev : device pointer.
- *  Description:
- *  This is the stop entry point of the driver.
- */
-static int stmmac_release(struct net_device *dev)
+static int stmmac_do_release(struct net_device *dev)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 
@@ -1919,10 +1882,36 @@ static int stmmac_release(struct net_device *dev)
 #endif
 
 	stmmac_release_ptp(priv);
+}
+
+/**
+ *  stmmac_release - close entry point of the driver
+ *  @dev : device pointer.
+ *  Description:
+ *  This is the stop entry point of the driver.
+ */
+static int stmmac_release(struct net_device *dev)
+{
+	struct stmmac_priv *priv = netdev_priv(dev);
+
+	cancel_work_sync(&priv->tx_err_work);
+
+	stmmac_do_release(dev);
 
 	return 0;
 }
 
+static void stmmac_tx_err_work(struct work_struct *work)
+{
+	struct stmmac_priv *priv = container_of(work, struct stmmac_priv,
+						tx_err_work);
+	/* restart netdev */
+	rtnl_lock();
+	stmmac_release(priv->dev);
+	stmmac_open(priv->dev);
+	rtnl_unlock();
+}
+
 /**
  *  stmmac_tso_allocator - close entry point of the driver
  *  @priv: driver private structure
@@ -2688,7 +2677,7 @@ static void stmmac_tx_timeout(struct net_device *dev)
 	struct stmmac_priv *priv = netdev_priv(dev);
 
 	/* Clear Tx resources and restart transmitting again */
-	stmmac_tx_err(priv);
+	schedule_work(&priv->tx_err_work);
 }
 
 /**
@@ -3338,6 +3327,7 @@ int stmmac_dvr_probe(struct device *device,
 	netif_napi_add(ndev, &priv->napi, stmmac_poll, 64);
 
 	spin_lock_init(&priv->lock);
+	INIT_WORK(&priv->tx_err_work, stmmac_tx_err_work);
 
 	ret = register_netdev(ndev);
 	if (ret) {
-- 
1.9.1


^ permalink raw reply related

* [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
  To: linux-wireless
  Cc: lauro.venancio, aloisio.almeida, sameo, robh+dt, mark.rutland,
	netdev, devicetree, linux-kernel, mgreer, justin, Jaret Cantu,
	Geoff Lansberry
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry@gmail.com>

From: Jaret Cantu <jaret.cantu@timesys.com>

Repeated polling attempts cause a NULL dereference error to occur.
This is because the curent state of the trf7970a is reading but
a request has been made to send a command.

The solution is to properly kill the waiting reading (workqueue)
before failing on the send.
---
 drivers/nfc/trf7970a.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index b4c37ab..f96a321 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1493,6 +1493,10 @@ static int trf7970a_send_cmd(struct nfc_digital_dev *ddev,
 			(trf->state != TRF7970A_ST_IDLE_RX_BLOCKED)) {
 		dev_err(trf->dev, "%s - Bogus state: %d\n", __func__,
 				trf->state);
+		if (trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA ||
+		    trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA_CONT)
+			trf->ignore_timeout =
+				!cancel_delayed_work(&trf->timeout_work);
 		ret = -EIO;
 		goto out_err;
 	}
-- 
Signed-off-by: Geoff Lansberry <geoff@kuvee.com>

^ permalink raw reply related

* [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
  To: linux-wireless-u79uwXL29TY76Z2rM5mHXA
  Cc: lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mgreer-luAo+O/VEmrlveNOaEYElw, justin-R+k406RtEhcAvxtiuMwx3w,
	Geoff Lansberry
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

---
 Documentation/devicetree/bindings/net/nfc/trf7970a.txt |  2 ++
 drivers/nfc/trf7970a.c                                 | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 9dda879..208f045 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,7 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set to 1.8V
 - crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
 
 
@@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 		irq-status-read-quirk;
 		en2-rf-quirk;
 		t5t-rmb-extra-byte-quirk;
+		vdd_io_1v8;
 		crystal_27mhz;
 		status = "okay";
 	};
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 2d2a077..b4c37ab 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -441,6 +441,7 @@ struct trf7970a {
 	u8				iso_ctrl_tech;
 	u8				modulator_sys_clk_ctrl;
 	u8				special_fcn_reg1;
+	u8				io_ctrl;
 	unsigned int			guard_time;
 	int				technology;
 	int				framing;
@@ -1048,6 +1049,11 @@ static int trf7970a_init(struct trf7970a *trf)
 	if (ret)
 		goto err_out;
 
+	ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
+			trf->io_ctrl|TRF7970A_REG_IO_CTRL_VRS(0x1));
+	if (ret)
+		goto err_out;
+
 	ret = trf7970a_write(trf, TRF7970A_NFC_TARGET_LEVEL, 0);
 	if (ret)
 		goto err_out;
@@ -1764,7 +1770,7 @@ static int _trf7970a_tg_listen(struct nfc_digital_dev *ddev, u16 timeout,
 		goto out_err;
 
 	ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
-			TRF7970A_REG_IO_CTRL_VRS(0x1));
+			trf->io_ctrl|TRF7970A_REG_IO_CTRL_VRS(0x1));
 	if (ret)
 		goto out_err;
 
@@ -2058,6 +2064,11 @@ static int trf7970a_probe(struct spi_device *spi)
 		return ret;
 	}
 
+	if (of_property_read_bool(np, "vdd_io_1v8")) {
+		trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
+		dev_dbg(trf->dev, "trf7970a config vdd_io_1v8\n");
+	}
+
 	if (of_property_read_bool(np, "crystal_27mhz")) {
 		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_27MHZ;
 		dev_dbg(trf->dev, "trf7970a configure crystal_27mhz\n");
-- 
Signed-off-by: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
  To: linux-wireless
  Cc: lauro.venancio, aloisio.almeida, sameo, robh+dt, mark.rutland,
	netdev, devicetree, linux-kernel, mgreer, justin, Geoff Lansberry

From: Geoff Lansberry <geoff@kuvee.com>

---
 .../devicetree/bindings/net/nfc/trf7970a.txt       |  3 ++
 drivers/nfc/trf7970a.c                             | 42 ++++++++++++++++------
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..9dda879 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,8 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
+
 
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 
@@ -43,6 +45,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 		irq-status-read-quirk;
 		en2-rf-quirk;
 		t5t-rmb-extra-byte-quirk;
+		crystal_27mhz;
 		status = "okay";
 	};
 };
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 26c9dbb..2d2a077 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1056,12 +1056,11 @@ static int trf7970a_init(struct trf7970a *trf)
 
 	trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
 
-	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
+	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
+			trf->modulator_sys_clk_ctrl);
 	if (ret)
 		goto err_out;
 
-	trf->modulator_sys_clk_ctrl = 0;
-
 	ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
 			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
 			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
@@ -1181,27 +1180,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a *trf, int tech)
 	switch (tech) {
 	case NFC_DIGITAL_RF_TECH_106A:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_OOK;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
 		break;
 	case NFC_DIGITAL_RF_TECH_106B:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
 		break;
 	case NFC_DIGITAL_RF_TECH_212F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
 		break;
 	case NFC_DIGITAL_RF_TECH_424F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
 		break;
 	case NFC_DIGITAL_RF_TECH_ISO15693:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_OOK;
 		trf->guard_time = TRF7970A_GUARD_TIME_15693;
 		break;
 	default:
@@ -1571,17 +1580,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a *trf, int tech)
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
 			TRF7970A_ISO_CTRL_NFC_CE |
 			TRF7970A_ISO_CTRL_NFC_CE_14443A;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_OOK;
 		break;
 	case NFC_DIGITAL_RF_TECH_212F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
 			TRF7970A_ISO_CTRL_NFC_NFCF_212;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		break;
 	case NFC_DIGITAL_RF_TECH_424F:
 		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
 			TRF7970A_ISO_CTRL_NFC_NFCF_424;
-		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+		trf->modulator_sys_clk_ctrl =
+			(trf->modulator_sys_clk_ctrl & 0xF8) |
+			TRF7970A_MODULATOR_DEPTH_ASK10;
 		break;
 	default:
 		dev_dbg(trf->dev, "Unsupported rf technology: %d\n", tech);
@@ -2043,6 +2058,11 @@ static int trf7970a_probe(struct spi_device *spi)
 		return ret;
 	}
 
+	if (of_property_read_bool(np, "crystal_27mhz")) {
+		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_27MHZ;
+		dev_dbg(trf->dev, "trf7970a configure crystal_27mhz\n");
+	}
+
 	if (of_property_read_bool(np, "en2-rf-quirk"))
 		trf->quirks |= TRF7970A_QUIRK_EN2_MUST_STAY_LOW;
 
-- 
Signed-off-by: Geoff Lansberry <geoff@kuvee.com>

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox