* Re: [kernel-hardening] Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-15 21:11 UTC (permalink / raw)
To: kernel-hardening
Cc: Hannes Frederic Sowa, David Laight, Netdev,
Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <20161215210933.GY3207@twins.programming.kicks-ass.net>
On Thu, Dec 15, 2016 at 10:09 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 15, 2016 at 07:50:36PM +0100, Jason A. Donenfeld wrote:
>> There's no 32-bit platform
>> that will trap on a 64-bit unaligned access because there's no such
>> thing as a 64-bit access there. In short, we're fine.
>
> ARMv7 LPAE is a 32bit architecture that has 64bit load/stores IIRC.
>
> x86 has cmpxchg8b that can do 64bit things and very much wants the u64
> aligned.
>
> Also, IIRC we have a few platforms where u64 doesn't carry 8 byte
> alignment, m68k or something like that, but yes, you likely don't care.
Indeed, I stand corrected. But in any case, the use of __aligned(8) in
the patchset ensures that things are fixed and that we don't have this
issue.
^ permalink raw reply
* Re: [kernel-hardening] Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Linus Torvalds @ 2016-12-15 21:14 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: kernel-hardening@lists.openwall.com, Hannes Frederic Sowa,
David Laight, Netdev, Jean-Philippe Aumasson, LKML,
Linux Crypto Mailing List, Daniel J . Bernstein, Eric Biggers
In-Reply-To: <CAHmME9p+xEBTKz+Wfy5Ypav4HU7H+rnA-0hLgd1sMmthDzOmvw@mail.gmail.com>
On Thu, Dec 15, 2016 at 1:11 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Indeed, I stand corrected. But in any case, the use of __aligned(8) in
> the patchset ensures that things are fixed and that we don't have this
> issue.
I think you can/should just use the natural alignment for "u64".
For architectures that need 8-byte alignment, u64 will already be
properly aligned. For architectures (like x86-32) that only need
4-byte alignment, you get it.
Linus
^ permalink raw reply
* Re: Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-15 21:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: kernel-hardening@lists.openwall.com, Hannes Frederic Sowa,
David Laight, Netdev, Jean-Philippe Aumasson, LKML,
Linux Crypto Mailing List, Daniel J . Bernstein, Eric Biggers
On Thu, Dec 15, 2016 at 10:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I think you can/should just use the natural alignment for "u64".
>
> For architectures that need 8-byte alignment, u64 will already be
> properly aligned. For architectures (like x86-32) that only need
> 4-byte alignment, you get it.
I should have added mention of that with my previous email. For the
parameters that are always a multiple of u64 -- namely, the key -- I
now do that in v5 of the patchset. So this is already done.
^ permalink raw reply
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Hannes Frederic Sowa @ 2016-12-15 21:17 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <CAHmME9rDCb=2rojJba13Uew9V9qAbxv1qcJGHwEAKoahxyE9QA@mail.gmail.com>
On 15.12.2016 21:43, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> ARM64 and x86-64 have memory operations that are not vector operations
>> that operate on 128 bit memory.
>
> Fair enough. imull I guess.
>
>> How do you know that the compiler for some architecture will not chose a
>> more optimized instruction to load a 64 bit memory value into two 32 bit
>> registers if you tell the compiler it is 8 byte aligned but it actually
>> isn't? I don't know the answer but telling the compiler some data is 8
>> byte aligned while it isn't really pretty much seems like a call for
>> trouble.
>
> If a compiler is in the business of using special 64-bit instructions
> on 64-bit aligned data, then it is also the job of the compiler to
> align structs to 64-bits when passed __aligned(8), which is what we've
> done in this code. If the compiler were to hypothetically choose to
> ignore that and internally convert it to a __aligned(4), then it would
> only be able to do so with the knowledge that it will never use 64-bit
> aligned data instructions. But so far as I can tell, gcc always
> respects __aligned(8), which is why I use it in this patchset.
>
> I think there might have been confusion here, because perhaps someone
> was hoping that since in6_addr is 128-bits, that the __aligned
> attribute would not be required and that the struct would just
> automatically be aligned to at least 8 bytes. But in fact, as I
> mentioned, in6_addr is actually composed of u32[4] and not u64[2], so
> it will only be aligned to 4 bytes, making the __aligned(8) necessary.
>
> I think for the purposes of this patchset, this is a solved problem.
> There's the unaligned version of the function if you don't know about
> the data, and there's the aligned version if you're using
> __aligned(SIPHASH_ALIGNMENT) on your data. Plain and simple.
And I was exactly questioning this.
static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
const struct in6_addr *daddr)
{
net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
(__force u32)id, ip6_frags.rnd);
}
This function had a hash DoS (and kind of still has), but it has been
mitigated by explicit checks, I hope.
So you start looking for all the pointers where ipv6 addresses could
come from and find some globally defined struct where I would need to
put the aligned(SIPHASH_ALIGNMENT) into to make this work on 32 bit
code? Otherwise just the unaligned version is safe on 32 bit code.
Who knows this? It isn't even obvious by looking at the header!
I would be interested if the compiler can actually constant-fold the
address of the stack allocation with an simple if () or some
__builtin_constant_p fiddeling, so we don't have this constant review
overhead to which function we pass which data. This would also make
this whole discussion mood.
Bye,
Hannes
^ permalink raw reply
* [net-next PATCH v6 4/5] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-15 20:14 UTC (permalink / raw)
To: mst
Cc: daniel, netdev, alexei.starovoitov, john.r.fastabend, brouer,
tgraf, davem
In-Reply-To: <20161215200712.23639.53043.stgit@john-Precision-Tower-5810>
This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action the virtio_net XDP
implementation will transmit the packet on a TX queue that aligns
with the current CPU that the XDP packet was processed on.
Before sending the packet the header is zeroed. Also XDP is expected
to handle checksum correctly so no checksum offload support is
provided.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
drivers/net/virtio_net.c | 100 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 93 insertions(+), 7 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 992ec5f..1f8300b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -330,12 +330,58 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
return skb;
}
+static void virtnet_xdp_xmit(struct virtnet_info *vi,
+ struct receive_queue *rq,
+ struct send_queue *sq,
+ struct xdp_buff *xdp)
+{
+ struct page *page = virt_to_head_page(xdp->data);
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
+ unsigned int num_sg, len;
+ void *xdp_sent;
+ int err;
+
+ /* Free up any pending old buffers before queueing new ones. */
+ while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+ struct page *sent_page = virt_to_head_page(xdp_sent);
+
+ if (vi->mergeable_rx_bufs)
+ put_page(sent_page);
+ else
+ give_pages(rq, sent_page);
+ }
+
+ /* Zero header and leave csum up to XDP layers */
+ hdr = xdp->data;
+ memset(hdr, 0, vi->hdr_len);
+
+ num_sg = 1;
+ sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
+ err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
+ xdp->data, GFP_ATOMIC);
+ if (unlikely(err)) {
+ if (vi->mergeable_rx_bufs)
+ put_page(page);
+ else
+ give_pages(rq, page);
+ return; // On error abort to avoid unnecessary kick
+ } else if (!vi->mergeable_rx_bufs) {
+ /* If not mergeable bufs must be big packets so cleanup pages */
+ give_pages(rq, (struct page *)page->private);
+ page->private = 0;
+ }
+
+ virtqueue_kick(sq->vq);
+}
+
static u32 do_xdp_prog(struct virtnet_info *vi,
+ struct receive_queue *rq,
struct bpf_prog *xdp_prog,
struct page *page, int offset, int len)
{
int hdr_padded_len;
struct xdp_buff xdp;
+ unsigned int qp;
u32 act;
u8 *buf;
@@ -353,9 +399,15 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
switch (act) {
case XDP_PASS:
return XDP_PASS;
+ case XDP_TX:
+ qp = vi->curr_queue_pairs -
+ vi->xdp_queue_pairs +
+ smp_processor_id();
+ xdp.data = buf + (vi->mergeable_rx_bufs ? 0 : 4);
+ virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp);
+ return XDP_TX;
default:
bpf_warn_invalid_xdp_action(act);
- case XDP_TX:
case XDP_ABORTED:
case XDP_DROP:
return XDP_DROP;
@@ -390,9 +442,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
goto err_xdp;
- act = do_xdp_prog(vi, xdp_prog, page, 0, len);
- if (act == XDP_DROP)
+ act = do_xdp_prog(vi, rq, xdp_prog, page, 0, len);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ rcu_read_unlock();
+ goto xdp_xmit;
+ case XDP_DROP:
+ default:
goto err_xdp;
+ }
}
rcu_read_unlock();
@@ -407,6 +467,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
err:
dev->stats.rx_dropped++;
give_pages(rq, page);
+xdp_xmit:
return NULL;
}
@@ -425,6 +486,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
struct bpf_prog *xdp_prog;
unsigned int truesize;
+ head_skb = NULL;
+
rcu_read_lock();
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
@@ -448,9 +511,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
goto err_xdp;
- act = do_xdp_prog(vi, xdp_prog, page, offset, len);
- if (act == XDP_DROP)
+ act = do_xdp_prog(vi, rq, xdp_prog, page, offset, len);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ rcu_read_unlock();
+ goto xdp_xmit;
+ case XDP_DROP:
+ default:
goto err_xdp;
+ }
}
rcu_read_unlock();
@@ -528,6 +599,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
err_buf:
dev->stats.rx_dropped++;
dev_kfree_skb(head_skb);
+xdp_xmit:
return NULL;
}
@@ -1713,6 +1785,16 @@ static void free_receive_page_frags(struct virtnet_info *vi)
put_page(vi->rq[i].alloc_frag.page);
}
+static bool is_xdp_queue(struct virtnet_info *vi, int q)
+{
+ if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
+ return false;
+ else if (q < vi->curr_queue_pairs)
+ return true;
+ else
+ return false;
+}
+
static void free_unused_bufs(struct virtnet_info *vi)
{
void *buf;
@@ -1720,8 +1802,12 @@ static void free_unused_bufs(struct virtnet_info *vi)
for (i = 0; i < vi->max_queue_pairs; i++) {
struct virtqueue *vq = vi->sq[i].vq;
- while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
- dev_kfree_skb(buf);
+ while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+ if (!is_xdp_queue(vi, i))
+ dev_kfree_skb(buf);
+ else
+ put_page(virt_to_head_page(buf));
+ }
}
for (i = 0; i < vi->max_queue_pairs; i++) {
^ permalink raw reply related
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-15 21:25 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
Daniel J . Bernstein, Linus Torvalds, Eric Biggers
On Thu, Dec 15, 2016 at 10:17 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> And I was exactly questioning this.
>
> static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
> const struct in6_addr *daddr)
> {
> net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
> return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
> (__force u32)id, ip6_frags.rnd);
> }
For this example, the replacement is the function entitled siphash_4u32:
static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
const struct in6_addr *daddr)
{
net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
return siphash_4u32(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
(__force u32)id, 0, ip6_frags.rnd);
}
And then you make ip6_frags.rnd be of type siphash_key_t. Then
everything is taken care of and works beautifully. Please see v5 of
this patchset.
> I would be interested if the compiler can actually constant-fold the
> address of the stack allocation with an simple if () or some
> __builtin_constant_p fiddeling, so we don't have this constant review
> overhead to which function we pass which data. This would also make
> this whole discussion moot.
I'll play with it to see if the compiler is capable of doing that.
Does anybody know off hand if it is or if there are other examples of
the compiler doing that?
In any case, for all current replacement of jhash_1word, jhash_2words,
jhash_3words, there's the siphash_2u32 or siphash_4u32 functions. This
covers the majority of cases.
For replacements of md5_transform, either the data is small and can
fit in siphash_Nu{32,64}, or it can be put into a struct explicitly
aligned on the stack.
For the remaining use of jhash_nwords, either siphash() can be used or
siphash_unaligned() can be used if the source is of unknown alignment.
Both functions have their alignment requirements (or lack thereof)
documented in a docbook comment.
I'll look into the constant folding to see if it actually works. If it
does, I'll use it. If not, I believe the current solution works.
How's that sound?
Jason
^ permalink raw reply
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-15 21:32 UTC (permalink / raw)
To: Pavel Machek
Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <20161215210324.GA13878@amd>
On 15.12.2016 22:03, Pavel Machek wrote:
>
> I actually did experiment with adding locking there, too, and no, no
> luck. It seems stmmac_tx_err() is more broken than just locking.
>
Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the
tx path properly) and the HW is still active on the tx path while the tx buffers are
freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped.
Did you try to stop the phy fist in stmmac_tx_err_work(), too?
Regards,
Lino
^ permalink raw reply
* [PATCH 0/2] GTP tunneling fixes for net
From: Pablo Neira Ayuso @ 2016-12-15 21:35 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, laforge
Hi David,
The following patchset contains two GTP tunneling fixes for your net
tree, they are:
1) Offset to IPv4 header in gtp_check_src_ms_ipv4() is incorrect, thus
this function always succeeds and therefore this defeats this sanity
check. This allows packets that have no PDP to go though, patch from
Lionel Gauthier.
2) According to Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
Release 13, always set GTPv1 reserved bit to zero. This may cause
interoperability problems, patch from Harald Welte.
Please, apply, thanks a lot!
Harald Welte (1):
gtp: Fix initialization of Flags octet in GTPv1 header
Lionel Gauthier (1):
gtp: gtp_check_src_ms_ipv4() always return success
drivers/net/gtp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--
2.1.4
^ permalink raw reply
* [PATCH 1/2] gtp: gtp_check_src_ms_ipv4() always return success
From: Pablo Neira Ayuso @ 2016-12-15 21:35 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, laforge
In-Reply-To: <1481837753-10317-1-git-send-email-pablo@netfilter.org>
From: Lionel Gauthier <Lionel.Gauthier@eurecom.fr>
gtp_check_src_ms_ipv4() did not find the PDP context matching with the
UE IP address because the memory location is not right, but the result
is inverted by the Boolean "not" operator. So whatever is the PDP
context, any call to this function is successful.
Signed-off-by: Lionel Gauthier <Lionel.Gauthier@eurecom.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
drivers/net/gtp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 98f10c216521..6031d499f2be 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -158,9 +158,9 @@ static bool gtp_check_src_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx,
if (!pskb_may_pull(skb, hdrlen + sizeof(struct iphdr)))
return false;
- iph = (struct iphdr *)(skb->data + hdrlen + sizeof(struct iphdr));
+ iph = (struct iphdr *)(skb->data + hdrlen);
- return iph->saddr != pctx->ms_addr_ip4.s_addr;
+ return iph->saddr == pctx->ms_addr_ip4.s_addr;
}
/* Check if the inner IP source address in this packet is assigned to any
--
2.1.4
^ permalink raw reply related
* [PATCH 2/2] gtp: Fix initialization of Flags octet in GTPv1 header
From: Pablo Neira Ayuso @ 2016-12-15 21:35 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, laforge
In-Reply-To: <1481837753-10317-1-git-send-email-pablo@netfilter.org>
From: Harald Welte <laforge@gnumonks.org>
When generating a GTPv1 header in gtp1_push_header(), initialize the
'reserved' bit to zero. All 3GPP specifications for GTPv1 from Release
99 through Release 13 agree that a transmitter shall set this bit to
zero, see e.g. Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
Release 13, available from
http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.05.00_60/ts_129060v130500p.pdf
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
drivers/net/gtp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 6031d499f2be..8b6810bad54b 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -423,11 +423,11 @@ static inline void gtp1_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
/* Bits 8 7 6 5 4 3 2 1
* +--+--+--+--+--+--+--+--+
- * |version |PT| 1| E| S|PN|
+ * |version |PT| 0| E| S|PN|
* +--+--+--+--+--+--+--+--+
* 0 0 1 1 1 0 0 0
*/
- gtp1->flags = 0x38; /* v1, GTP-non-prime. */
+ gtp1->flags = 0x30; /* v1, GTP-non-prime. */
gtp1->type = GTP_TPDU;
gtp1->length = htons(payload_len);
gtp1->tid = htonl(pctx->u.v1.o_tei);
--
2.1.4
^ permalink raw reply related
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Hannes Frederic Sowa @ 2016-12-15 21:45 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: David Laight, Netdev, kernel-hardening@lists.openwall.com,
Jean-Philippe Aumasson, LKML, Linux Crypto Mailing List,
Daniel J . Bernstein, Linus Torvalds, Eric Biggers
In-Reply-To: <CAHmME9p9cf1W3vhbu=YTRY1Xt=fmE1sVqY1XPt5iQwxfCfQUOA@mail.gmail.com>
On 15.12.2016 22:25, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 10:17 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> And I was exactly questioning this.
>>
>> static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
>> const struct in6_addr *daddr)
>> {
>> net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
>> return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
>> (__force u32)id, ip6_frags.rnd);
>> }
>
> For this example, the replacement is the function entitled siphash_4u32:
>
> static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
> const struct in6_addr *daddr)
> {
> net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
> return siphash_4u32(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
> (__force u32)id, 0, ip6_frags.rnd);
> }
>
> And then you make ip6_frags.rnd be of type siphash_key_t. Then
> everything is taken care of and works beautifully. Please see v5 of
> this patchset.
Sorry to not be specific enough, the Hash-DoS is in ipv6_addr_hash.
Maybe it was a silly example to start with, sorry. But anyway, your
proposal wouldn't have prevented the hash DoS. I wanted to show how it
can be difficult to make sure that all pointers come from an appropriate
aligned memory region.
The idea would be to actually factor out the key in the data structure
and align it with __aligned(SIPHASH_ALIGNMENT), make sure the padding
bits are all equal zero to not cause any bugs and irregularities with
the corresponding equality function. This might need some serious review
when switching to siphash to actually make use of it and prevent
HashDoS. Or simply use the unaligned version always...
>> I would be interested if the compiler can actually constant-fold the
>> address of the stack allocation with an simple if () or some
>> __builtin_constant_p fiddeling, so we don't have this constant review
>> overhead to which function we pass which data. This would also make
>> this whole discussion moot.
>
> I'll play with it to see if the compiler is capable of doing that.
> Does anybody know off hand if it is or if there are other examples of
> the compiler doing that?
Not of the top of my head, but it should be easy to test.
> In any case, for all current replacement of jhash_1word, jhash_2words,
> jhash_3words, there's the siphash_2u32 or siphash_4u32 functions. This
> covers the majority of cases.
Agreed and this is also totally fine by me.
> For replacements of md5_transform, either the data is small and can
> fit in siphash_Nu{32,64}, or it can be put into a struct explicitly
> aligned on the stack.
> For the remaining use of jhash_nwords, either siphash() can be used or
> siphash_unaligned() can be used if the source is of unknown alignment.
> Both functions have their alignment requirements (or lack thereof)
> documented in a docbook comment.
I think the warning needs to be bigger, seriously. Most of the people
develop on 64 bit arch, where it will just work during testing and break
later on 32 bit. ;)
> I'll look into the constant folding to see if it actually works. If it
> does, I'll use it. If not, I believe the current solution works.
>
> How's that sound?
I am still very much concerned about the API.
By the way, if you target net-next, it is currently closed. So no need
to hurry.
Bye,
Hannes
^ permalink raw reply
* Re: [PATCH] net: sfc: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-15 21:50 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux-net-drivers, ecree, bkenward, netdev, linux-kernel
In-Reply-To: <741a20e3-5b7a-bfe8-a7af-81bacd736e16@redhat.com>
Hi Jarod,
On 12/15/16, Jarod Wilson <jarod@redhat.com> wrote:
> On 2016-12-14 6:12 PM, Philippe Reynes wrote:
>> The ethtool api {get|set}_settings is deprecated.
>> We move this driver to new api {get|set}_link_ksettings.
>>
>> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
>> ---
>> drivers/net/ethernet/sfc/ethtool.c | 35 ++++++++++++-------
>> drivers/net/ethernet/sfc/mcdi_port.c | 60
>> ++++++++++++++++++++------------
>> drivers/net/ethernet/sfc/net_driver.h | 12 +++---
>> 3 files changed, 65 insertions(+), 42 deletions(-)
>
> What about drivers/net/ethernet/sfc/falcon/ethtool.c? Coming in a
> separate patch?
Yes, I send only one driver per patch.
I'll send another patch for this driver.
Philippe
^ permalink raw reply
* Re: [PATCH perf/core REBASE 2/5] samples/bpf: Switch over to libbpf
From: Joe Stringer @ 2016-12-15 22:00 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: LKML, netdev, Wang Nan, ast, Daniel Borkmann,
Arnaldo Carvalho de Melo
In-Reply-To: <20161215183440.GH6866@kernel.org>
On 15 December 2016 at 10:34, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Em Thu, Dec 15, 2016 at 03:29:18PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Thu, Dec 15, 2016 at 12:50:22PM -0300, Arnaldo Carvalho de Melo escreveu:
>> > Em Wed, Dec 14, 2016 at 02:43:39PM -0800, Joe Stringer escreveu:
>> > > Now that libbpf under tools/lib/bpf/* is synced with the version from
>> > > samples/bpf, we can get rid most of the libbpf library here.
>> > >
>> > > Signed-off-by: Joe Stringer <joe@ovn.org>
>> > > Cc: Alexei Starovoitov <ast@fb.com>
>> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
>> > > Cc: Wang Nan <wangnan0@huawei.com>
>> > > Link: http://lkml.kernel.org/r/20161209024620.31660-6-joe@ovn.org
>> > > [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
>>
>> So, the above comment no longer applied to this adjusted patch from you,
>> as you removed one hunk too much, that, after applied, gets samples/bpf/
>> to build successfully:
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index add514e2984a..81b0ef2f7994 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -107,6 +107,7 @@ always += lwt_len_hist_kern.o
>> always += xdp_tx_iptunnel_kern.o
>>
>> HOSTCFLAGS += -I$(objtree)/usr/include
>> +HOSTCFLAGS += -I$(srctree)/tools/lib/
>> HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
>>
>> HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
>>
>> ---------------------
>>
>> I added it, continuing...
>
> But then, when I tried to run offwaketime with it, it fails:
>
> [root@jouet bpf]# ./offwaketime ls
> bpf_load_program() err=22
> BPF_LDX uses reserved fields
> bpf_load_program() err=22
> BPF_LDX uses reserved fields
> [root@jouet bpf]#
>
> If I remove this patch and try again, it works:
>
> [root@jouet bpf]# ./offwaketime | head -4
> swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 46
> chrome;return_from_SYSCALL_64;do_syscall_64;exit_to_usermode_loop;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;do_syscall_64;return_from_SYSCALL_64;;Chrome_ChildIOT 1
> firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 3
> dockerd-current;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;futex_wake;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;dockerd-current 2
> [root@jouet bpf]#
>
>
> So, I'm stopping here so that I can push what I have to Ingo, then I'll get
> back to this, hopefully by then you beat me and I have just to retest 8-)
OK, thanks for the report. Looks like there was another difference
between the two libbpfs - one used total program size for its
load_program API; the actual kernel API uses instruction count. This
incremental should do the trick:
https://github.com/joestringer/linux/commit/6ff7726f20077bed66fb725f5189c13690154b6a
^ permalink raw reply
* Re: [PATCH net] ibmveth: calculate gso_segs for large packets
From: Jonathan Maxwell @ 2016-12-15 21:49 UTC (permalink / raw)
To: Thomas Falcon
Cc: netdev, Brian King, marcelo.leitner, pradeeps, zdai, Eric Dumazet
In-Reply-To: <1481674509-14256-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
On Wed, Dec 14, 2016 at 11:15 AM, Thomas Falcon
<tlfalcon@linux.vnet.ibm.com> wrote:
> Include calculations to compute the number of segments
> that comprise an aggregated large packet.
>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
> ---
> drivers/net/ethernet/ibm/ibmveth.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index fbece63..a831f94 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1181,7 +1181,9 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
>
> static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
> {
> + struct tcphdr *tcph;
> int offset = 0;
> + int hdr_len;
>
> /* only TCP packets will be aggregated */
> if (skb->protocol == htons(ETH_P_IP)) {
> @@ -1208,14 +1210,20 @@ static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
> /* if mss is not set through Large Packet bit/mss in rx buffer,
> * expect that the mss will be written to the tcp header checksum.
> */
> + tcph = (struct tcphdr *)(skb->data + offset);
> if (lrg_pkt) {
> skb_shinfo(skb)->gso_size = mss;
> } else if (offset) {
> - struct tcphdr *tcph = (struct tcphdr *)(skb->data + offset);
> -
> skb_shinfo(skb)->gso_size = ntohs(tcph->check);
> tcph->check = 0;
> }
> +
> + if (skb_shinfo(skb)->gso_size) {
> + hdr_len = offset + tcph->doff * 4;
> + skb_shinfo(skb)->gso_segs =
> + DIV_ROUND_UP(skb->len - hdr_len,
> + skb_shinfo(skb)->gso_size);
> + }
> }
>
> static int ibmveth_poll(struct napi_struct *napi, int budget)
> --
> 1.8.3.1
>
^ permalink raw reply
* Re: [PATCH v2 2/2] net: ethernet: stmmac: remove private tx queue lock
From: Pavel Machek @ 2016-12-15 22:03 UTC (permalink / raw)
To: Lino Sanfilippo
Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue,
romieu, davem, linux-kernel, netdev
In-Reply-To: <626fc3ef-ba18-ed99-aea1-2f737425b199@gmx.de>
[-- Attachment #1: Type: text/plain, Size: 3235 bytes --]
Hi!
> >> The driver uses a private lock for synchronization of the xmit function and
> >> the xmit completion handler, but since the NETIF_F_LLTX flag is not set,
> >> the xmit function is also called with the xmit_lock held.
> >>
> >> On the other hand the completion handler uses the reverse locking order by
> >> first taking the private lock and (in case that the tx queue had been
> >> stopped) then the xmit_lock.
> >>
> >> Improve the locking by removing the private lock and using only the
> >> xmit_lock for synchronization instead.
> >
> > Do you have stmmac hardware to test on?
> >
>
> Unfortunately not (I mentioned that the patch I send was only compile tested in
> the first version but I think I forgot to do so in the last
> version).
:-(.
> > I believe something is very wrong with the locking there. In
> > particular... scheduling the stmmac_tx_timer() function to run often
> > should not do anything bad if locking is correct... but it breaks the
> > driver rather quickly. [Example patch below, needs applying to two
> > places in net-next.]
> >
>
> Do you get this result only after the private lock is removed? Or has this problem
> been there before? And how exactly does the failure look like?
I believe I was getting very similar fun even with the private lock. I
re-applied the private lock, and the result is the same.
Also.. locking does seems to work. I added checks to see if the
stmmac_tx_clean() and stmmac_xmit() run at the same time, and they
don't seem to. So my best guess at the moment is missing cache flush
or mb() somewhere.
Failure looks like this:
root@wagabuibui:~# mount /dev/mmcblk0p4 /mnt
o 1000000 > /proc/sys/net/core/wmeroot@wagabuibui:~# chroot /mnt
/bin/bash
root@wagabuibui:/# mount /proc000 100 30
root@wagabuibui:/# #echo 1000000 > /proc/sys/net/core/wmem_default
root@wagabuibui:/# cd /data/tmp/udpt
root@wagabuibui:/data/tmp/udpt# ifconfig eth0 10.0.0.170 up
[ 18.358072] socfpga-dwmac ff702000.ethernet eth0: IEEE 1588-2008
Advanced Timestamp supported
[ 18.366836] socfpga-dwmac ff702000.ethernet eth0: registered PTP
clock
root@wagabuibui:/data/tmp/udpt# ./udp-test raw 10.0.0.6 1234 1000 100
30
Sending 100 packets (1000b each) at an interval of 30ms, expected data
rate:3333333b/s (3373333b/s incl udp overhead)
[ 20.453538] socfpga-dwmac ff702000.ethernet eth0: Link is Up -
100Mbps/Full - flow control rx/tx
[ 20.581826] Link is Up - 100/Full
Sending UDP packet took >10ms: 5205162us
This would lead to a lost frame!
Sending UDP packet took >10ms: 40010us
This would lead to a lost frame!
Sending UDP packet took >10ms: 6366084us
This would lead to a lost frame!
Sending UDP packet took >10ms: 36971us
This would lead to a lost frame!
[ 42.084940] ------------[ cut here ]------------
[ 42.089577] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
dev_watchdog+0x254/0x26c
[ 42.097821] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0
timed out
[ 42.104935] Modules linked in:
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
To: linux-wireless
Cc: lauro.venancio, aloisio.almeida, sameo, robh+dt, mark.rutland,
netdev, devicetree, linux-kernel, mgreer, justin, Geoff Lansberry
From: Geoff Lansberry <geoff@kuvee.com>
---
.../devicetree/bindings/net/nfc/trf7970a.txt | 3 ++
drivers/nfc/trf7970a.c | 42 ++++++++++++++++------
2 files changed, 34 insertions(+), 11 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..9dda879 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,8 @@ Optional SoC Specific Properties:
- t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
where an extra byte is returned by Read Multiple Block commands issued
to Type 5 tags.
+- crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
+
Example (for ARM-based BeagleBone with TRF7970A on SPI1):
@@ -43,6 +45,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
irq-status-read-quirk;
en2-rf-quirk;
t5t-rmb-extra-byte-quirk;
+ crystal_27mhz;
status = "okay";
};
};
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 26c9dbb..2d2a077 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1056,12 +1056,11 @@ static int trf7970a_init(struct trf7970a *trf)
trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
- ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
+ ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
+ trf->modulator_sys_clk_ctrl);
if (ret)
goto err_out;
- trf->modulator_sys_clk_ctrl = 0;
-
ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
@@ -1181,27 +1180,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a *trf, int tech)
switch (tech) {
case NFC_DIGITAL_RF_TECH_106A:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
break;
case NFC_DIGITAL_RF_TECH_106B:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
break;
case NFC_DIGITAL_RF_TECH_212F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_424F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_ISO15693:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_15693;
break;
default:
@@ -1571,17 +1580,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a *trf, int tech)
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
TRF7970A_ISO_CTRL_NFC_CE |
TRF7970A_ISO_CTRL_NFC_CE_14443A;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_OOK;
break;
case NFC_DIGITAL_RF_TECH_212F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
TRF7970A_ISO_CTRL_NFC_NFCF_212;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_ASK10;
break;
case NFC_DIGITAL_RF_TECH_424F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
TRF7970A_ISO_CTRL_NFC_NFCF_424;
- trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+ trf->modulator_sys_clk_ctrl =
+ (trf->modulator_sys_clk_ctrl & 0xF8) |
+ TRF7970A_MODULATOR_DEPTH_ASK10;
break;
default:
dev_dbg(trf->dev, "Unsupported rf technology: %d\n", tech);
@@ -2043,6 +2058,11 @@ static int trf7970a_probe(struct spi_device *spi)
return ret;
}
+ if (of_property_read_bool(np, "crystal_27mhz")) {
+ trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_27MHZ;
+ dev_dbg(trf->dev, "trf7970a configure crystal_27mhz\n");
+ }
+
if (of_property_read_bool(np, "en2-rf-quirk"))
trf->quirks |= TRF7970A_QUIRK_EN2_MUST_STAY_LOW;
--
Signed-off-by: Geoff Lansberry <geoff@kuvee.com>
^ permalink raw reply related
* [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
To: linux-wireless-u79uwXL29TY76Z2rM5mHXA
Cc: lauro.venancio-430g2QfJUUCGglJvpFV4uA,
aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
mgreer-luAo+O/VEmrlveNOaEYElw, justin-R+k406RtEhcAvxtiuMwx3w,
Geoff Lansberry
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
---
Documentation/devicetree/bindings/net/nfc/trf7970a.txt | 2 ++
drivers/nfc/trf7970a.c | 13 ++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 9dda879..208f045 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,7 @@ Optional SoC Specific Properties:
- t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
where an extra byte is returned by Read Multiple Block commands issued
to Type 5 tags.
+- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set to 1.8V
- crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
@@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
irq-status-read-quirk;
en2-rf-quirk;
t5t-rmb-extra-byte-quirk;
+ vdd_io_1v8;
crystal_27mhz;
status = "okay";
};
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 2d2a077..b4c37ab 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -441,6 +441,7 @@ struct trf7970a {
u8 iso_ctrl_tech;
u8 modulator_sys_clk_ctrl;
u8 special_fcn_reg1;
+ u8 io_ctrl;
unsigned int guard_time;
int technology;
int framing;
@@ -1048,6 +1049,11 @@ static int trf7970a_init(struct trf7970a *trf)
if (ret)
goto err_out;
+ ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
+ trf->io_ctrl|TRF7970A_REG_IO_CTRL_VRS(0x1));
+ if (ret)
+ goto err_out;
+
ret = trf7970a_write(trf, TRF7970A_NFC_TARGET_LEVEL, 0);
if (ret)
goto err_out;
@@ -1764,7 +1770,7 @@ static int _trf7970a_tg_listen(struct nfc_digital_dev *ddev, u16 timeout,
goto out_err;
ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
- TRF7970A_REG_IO_CTRL_VRS(0x1));
+ trf->io_ctrl|TRF7970A_REG_IO_CTRL_VRS(0x1));
if (ret)
goto out_err;
@@ -2058,6 +2064,11 @@ static int trf7970a_probe(struct spi_device *spi)
return ret;
}
+ if (of_property_read_bool(np, "vdd_io_1v8")) {
+ trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
+ dev_dbg(trf->dev, "trf7970a config vdd_io_1v8\n");
+ }
+
if (of_property_read_bool(np, "crystal_27mhz")) {
trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_27MHZ;
dev_dbg(trf->dev, "trf7970a configure crystal_27mhz\n");
--
Signed-off-by: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Geoff Lansberry @ 2016-12-15 22:30 UTC (permalink / raw)
To: linux-wireless
Cc: lauro.venancio, aloisio.almeida, sameo, robh+dt, mark.rutland,
netdev, devicetree, linux-kernel, mgreer, justin, Jaret Cantu,
Geoff Lansberry
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry@gmail.com>
From: Jaret Cantu <jaret.cantu@timesys.com>
Repeated polling attempts cause a NULL dereference error to occur.
This is because the curent state of the trf7970a is reading but
a request has been made to send a command.
The solution is to properly kill the waiting reading (workqueue)
before failing on the send.
---
drivers/nfc/trf7970a.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index b4c37ab..f96a321 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1493,6 +1493,10 @@ static int trf7970a_send_cmd(struct nfc_digital_dev *ddev,
(trf->state != TRF7970A_ST_IDLE_RX_BLOCKED)) {
dev_err(trf->dev, "%s - Bogus state: %d\n", __func__,
trf->state);
+ if (trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA ||
+ trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA_CONT)
+ trf->ignore_timeout =
+ !cancel_delayed_work(&trf->timeout_work);
ret = -EIO;
goto out_err;
}
--
Signed-off-by: Geoff Lansberry <geoff@kuvee.com>
^ permalink raw reply related
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-15 22:33 UTC (permalink / raw)
To: Pavel Machek
Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <f3fc8035-fb6e-fb06-d9b8-bfc96dc4e1d2@gmx.de>
[-- Attachment #1: Type: text/plain, Size: 556 bytes --]
On 15.12.2016 22:32, Lino Sanfilippo wrote:
> Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the
> tx path properly) and the HW is still active on the tx path while the tx buffers are
> freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped.
> Did you try to stop the phy fist in stmmac_tx_err_work(), too?
>
> Regards,
> Lino
>
And this is the "sledgehammer" approach: Do a complete shutdown and restart
of the hardware in case of tx error (against net-next and only compile tested).
[-- Attachment #2: 0001-Sledgehammer.patch --]
[-- Type: text/x-patch, Size: 4140 bytes --]
>From 0eda87ce6cbc2fb6d25653f30121f30f89332199 Mon Sep 17 00:00:00 2001
From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Thu, 15 Dec 2016 23:18:15 +0100
Subject: [PATCH] Sledgehammer
---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 70 ++++++++++-------------
2 files changed, 31 insertions(+), 40 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index eab04ae..9c240d7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -131,6 +131,7 @@ struct stmmac_priv {
u32 rx_tail_addr;
u32 tx_tail_addr;
u32 mss;
+ struct work_struct tx_err_work;
#ifdef CONFIG_DEBUG_FS
struct dentry *dbgfs_dir;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..7546574 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1403,37 +1403,6 @@ static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv)
}
/**
- * stmmac_tx_err - to manage the tx error
- * @priv: driver private structure
- * Description: it cleans the descriptors and restarts the transmission
- * in case of transmission errors.
- */
-static void stmmac_tx_err(struct stmmac_priv *priv)
-{
- int i;
- netif_stop_queue(priv->dev);
-
- priv->hw->dma->stop_tx(priv->ioaddr);
- dma_free_tx_skbufs(priv);
- for (i = 0; i < DMA_TX_SIZE; i++)
- if (priv->extend_desc)
- priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic,
- priv->mode,
- (i == DMA_TX_SIZE - 1));
- else
- priv->hw->desc->init_tx_desc(&priv->dma_tx[i],
- priv->mode,
- (i == DMA_TX_SIZE - 1));
- priv->dirty_tx = 0;
- priv->cur_tx = 0;
- netdev_reset_queue(priv->dev);
- priv->hw->dma->start_tx(priv->ioaddr);
-
- priv->dev->stats.tx_errors++;
- netif_wake_queue(priv->dev);
-}
-
-/**
* stmmac_dma_interrupt - DMA ISR
* @priv: driver private structure
* Description: this is the DMA ISR. It is called by the main ISR.
@@ -1466,7 +1435,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
priv->xstats.threshold = tc;
}
} else if (unlikely(status == tx_hard_error))
- stmmac_tx_err(priv);
+ schedule_work(&priv->tx_err_work);
}
/**
@@ -1870,13 +1839,7 @@ static int stmmac_open(struct net_device *dev)
return ret;
}
-/**
- * stmmac_release - close entry point of the driver
- * @dev : device pointer.
- * Description:
- * This is the stop entry point of the driver.
- */
-static int stmmac_release(struct net_device *dev)
+static int stmmac_do_release(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
@@ -1919,10 +1882,36 @@ static int stmmac_release(struct net_device *dev)
#endif
stmmac_release_ptp(priv);
+}
+
+/**
+ * stmmac_release - close entry point of the driver
+ * @dev : device pointer.
+ * Description:
+ * This is the stop entry point of the driver.
+ */
+static int stmmac_release(struct net_device *dev)
+{
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ cancel_work_sync(&priv->tx_err_work);
+
+ stmmac_do_release(dev);
return 0;
}
+static void stmmac_tx_err_work(struct work_struct *work)
+{
+ struct stmmac_priv *priv = container_of(work, struct stmmac_priv,
+ tx_err_work);
+ /* restart netdev */
+ rtnl_lock();
+ stmmac_release(priv->dev);
+ stmmac_open(priv->dev);
+ rtnl_unlock();
+}
+
/**
* stmmac_tso_allocator - close entry point of the driver
* @priv: driver private structure
@@ -2688,7 +2677,7 @@ static void stmmac_tx_timeout(struct net_device *dev)
struct stmmac_priv *priv = netdev_priv(dev);
/* Clear Tx resources and restart transmitting again */
- stmmac_tx_err(priv);
+ schedule_work(&priv->tx_err_work);
}
/**
@@ -3338,6 +3327,7 @@ int stmmac_dvr_probe(struct device *device,
netif_napi_add(ndev, &priv->napi, stmmac_poll, 64);
spin_lock_init(&priv->lock);
+ INIT_WORK(&priv->tx_err_work, stmmac_tx_err_work);
ret = register_netdev(ndev);
if (ret) {
--
1.9.1
^ permalink raw reply related
* Re: Soft lockup in inet_put_port on 4.6
From: Tom Herbert @ 2016-12-15 22:39 UTC (permalink / raw)
To: Josef Bacik
Cc: Craig Gallek, Hannes Frederic Sowa, Eric Dumazet,
Linux Kernel Network Developers
In-Reply-To: <1481828016.24490.5@smtp.office365.com>
On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik <jbacik@fb.com> wrote:
> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> wrote:
>>
>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@gmail.com>
>> wrote:
>>>
>>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com>
>>> wrote:
>>>>
>>>> I think there may be some suspicious code in inet_csk_get_port. At
>>>> tb_found there is:
>>>>
>>>> if (((tb->fastreuse > 0 && reuse) ||
>>>> (tb->fastreuseport > 0 &&
>>>> !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>> sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
>>>> smallest_size == -1)
>>>> goto success;
>>>> if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb,
>>>> true)) {
>>>> if ((reuse ||
>>>> (tb->fastreuseport > 0 &&
>>>> sk->sk_reuseport &&
>>>> !rcu_access_pointer(sk->sk_reuseport_cb)
>>>> &&
>>>> uid_eq(tb->fastuid, uid))) &&
>>>> smallest_size != -1 && --attempts >= 0) {
>>>> spin_unlock_bh(&head->lock);
>>>> goto again;
>>>> }
>>>> goto fail_unlock;
>>>> }
>>>>
>>>> AFAICT there is redundancy in these two conditionals. The same clause
>>>> is being checked in both: (tb->fastreuseport > 0 &&
>>>> !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>> uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>> first conditional should be hit, goto done, and the second will never
>>>> evaluate that part to true-- unless the sk is changed (do we need
>>>> READ_ONCE for sk->sk_reuseport_cb?).
>>>
>>> That's an interesting point... It looks like this function also
>>> changed in 4.6 from using a single local_bh_disable() at the beginning
>>> with several spin_lock(&head->lock) to exclusively
>>> spin_lock_bh(&head->lock) at each locking point. Perhaps the full bh
>>> disable variant was preventing the timers in your stack trace from
>>> running interleaved with this function before?
>>
>>
>> Could be, although dropping the lock shouldn't be able to affect the
>> search state. TBH, I'm a little lost in reading function, the
>> SO_REUSEPORT handling is pretty complicated. For instance,
>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>> function and also in every call to inet_csk_bind_conflict. I wonder if
>> we can simply this under the assumption that SO_REUSEPORT is only
>> allowed if the port number (snum) is explicitly specified.
>
>
> Ok first I have data for you Hannes, here's the time distributions before
> during and after the lockup (with all the debugging in place the box
> eventually recovers). I've attached it as a text file since it is long.
>
> Second is I was thinking about why we would spend so much time doing the
> ->owners list, and obviously it's because of the massive amount of timewait
> sockets on the owners list. I wrote the following dumb patch and tested it
> and the problem has disappeared completely. Now I don't know if this is
> right at all, but I thought it was weird we weren't copying the soreuseport
> option from the original socket onto the twsk. Is there are reason we
> aren't doing this currently? Does this help explain what is happening?
> Thanks,
>
I think that would explain it. We would be walking long lists of TW
sockets in inet_bind_bucket_for_each(tb, &head->chain). This should
break, although now I'm wondering if there's other ways we can get
into this situation. reuseport ensures that we can have long lists of
sockets in a single bucket, TW sockets can make that list really long.
Tom
> Josef
^ permalink raw reply
* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-15 22:42 UTC (permalink / raw)
To: ak, davem, David.Laight, ebiggers3, hannes, Jason,
kernel-hardening, linux-crypto, linux-kernel, linux, luto, netdev,
tom, torvalds, tytso, vegard.nossum
Cc: djb, jeanphilippe.aumasson
In-Reply-To: <20161215203003.31989-2-Jason@zx2c4.com>
> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash usage
> poses a real security risk.
To quantify that, jhash is 27 instructions per 12 bytes of input, with a
dependency path length of 13 instructions. (24/12 in __jash_mix, plus
3/1 for adding the input to the state.) The final add + __jhash_final
is 24 instructions with a path length of 15, which is close enough for
this handwaving. Call it 18n instructions and 8n cycles for 8n bytes.
SipHash (on a 64-bit machine) is 14 instructions with a dependency path
length of 4 *per round*. Two rounds per 8 bytes, plus plus two adds
and one cycle per input word, plus four rounds to finish makes 30n+46
instructions and 9n+16 cycles for 8n bytes.
So *if* you have a 64-bit 4-way superscalar machine, it's not that much
slower once it gets going, but the four-round finalization is quite
noticeable for short inputs.
For typical kernel input lengths "within a factor of 2" is
probably more accurate than "a tiny bit".
You lose a factor of 2 if you machine is 2-way or non-superscalar,
and a second factor of 2 if it's a 32-bit machine.
I mention this because there are a lot of home routers and other netwoek
appliances running Linux on 32-bit ARM and MIPS processors. For those,
it's a factor of *eight*, which is a lot more than "a tiny bit".
The real killer is if you don't have enough registers; SipHash performs
horribly on i386 because it uses more state than i386 has registers.
(If i386 performance is desired, you might ask Jean-Philippe for some
rotate constants for a 32-bit variant with 64 bits of key. Note that
SipHash's security proof requires that key length + input length is
strictly less than the state size, so for a 4x32-bit variant, while
you could stretch the key length a little, you'd have a hard limit at
95 bits.)
A second point, the final XOR in SipHash is either a (very minor) design
mistake, or an opportunity for optimization, depending on how you look
at it. Look at the end of the function:
>+ SIPROUND;
>+ SIPROUND;
>+ return (v0 ^ v1) ^ (v2 ^ v3);
Expanding that out, you get:
+ v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
+ v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+ v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
+ v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+ return v0 ^ v1 ^ v2 ^ v3;
Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
two lines earlier, so the value of v0 doesn't matter after its XOR into
v1 on line one.
The final SIPROUND and return can then be optimized to
+ v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
+ v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+ v3 = rol64(v3, 21);
+ v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+ return v1 ^ v2 ^ v3;
A 32-bit implementation could further tweak the 4 instructions of
v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;
gcc 6.2.1 -O3 compiles it to basically:
v1.low ^= v2.low;
v1.high ^= v2.high;
v1.low ^= v2.high;
v1.high ^= v2.low;
but it could be written as:
v2.low ^= v2.high;
v1.low ^= v2.low;
v1.high ^= v2.low;
Alternatively, if it's for private use only (key not shared with other
systems), a slightly stronger variant would "return v1 ^ v3;".
(The final swap of v2 is dead code, but a compiler can spot that easily.)
^ permalink raw reply
* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jean-Philippe Aumasson @ 2016-12-15 23:00 UTC (permalink / raw)
To: George Spelvin, ak, davem, David.Laight, ebiggers3, hannes, Jason,
kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
torvalds, tytso, vegard.nossum
Cc: djb
In-Reply-To: <20161215224224.21447.qmail@ns.sciencehorizons.net>
[-- Attachment #1: Type: text/plain, Size: 4109 bytes --]
If a halved version of SipHash can bring significant performance boost
(with 32b words instead of 64b words) with an acceptable security level
(64-bit enough?) then we may design such a version.
Regarding output size, are 64 bits sufficient?
On Thu, 15 Dec 2016 at 23:42, George Spelvin <linux@sciencehorizons.net>
wrote:
> > While SipHash is extremely fast for a cryptographically secure function,
> > it is likely a tiny bit slower than the insecure jhash, and so
> replacements
> > will be evaluated on a case-by-case basis based on whether or not the
> > difference in speed is negligible and whether or not the current jhash
> usage
> > poses a real security risk.
>
> To quantify that, jhash is 27 instructions per 12 bytes of input, with a
> dependency path length of 13 instructions. (24/12 in __jash_mix, plus
> 3/1 for adding the input to the state.) The final add + __jhash_final
> is 24 instructions with a path length of 15, which is close enough for
> this handwaving. Call it 18n instructions and 8n cycles for 8n bytes.
>
> SipHash (on a 64-bit machine) is 14 instructions with a dependency path
> length of 4 *per round*. Two rounds per 8 bytes, plus plus two adds
> and one cycle per input word, plus four rounds to finish makes 30n+46
> instructions and 9n+16 cycles for 8n bytes.
>
> So *if* you have a 64-bit 4-way superscalar machine, it's not that much
> slower once it gets going, but the four-round finalization is quite
> noticeable for short inputs.
>
> For typical kernel input lengths "within a factor of 2" is
> probably more accurate than "a tiny bit".
>
> You lose a factor of 2 if you machine is 2-way or non-superscalar,
> and a second factor of 2 if it's a 32-bit machine.
>
> I mention this because there are a lot of home routers and other netwoek
> appliances running Linux on 32-bit ARM and MIPS processors. For those,
> it's a factor of *eight*, which is a lot more than "a tiny bit".
>
> The real killer is if you don't have enough registers; SipHash performs
> horribly on i386 because it uses more state than i386 has registers.
>
> (If i386 performance is desired, you might ask Jean-Philippe for some
> rotate constants for a 32-bit variant with 64 bits of key. Note that
> SipHash's security proof requires that key length + input length is
> strictly less than the state size, so for a 4x32-bit variant, while
> you could stretch the key length a little, you'd have a hard limit at
> 95 bits.)
>
>
> A second point, the final XOR in SipHash is either a (very minor) design
> mistake, or an opportunity for optimization, depending on how you look
> at it. Look at the end of the function:
>
> >+ SIPROUND;
> >+ SIPROUND;
> >+ return (v0 ^ v1) ^ (v2 ^ v3);
>
> Expanding that out, you get:
> + v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
> + v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
> + v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
> + v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
> + return v0 ^ v1 ^ v2 ^ v3;
>
> Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
> two lines earlier, so the value of v0 doesn't matter after its XOR into
> v1 on line one.
>
> The final SIPROUND and return can then be optimized to
>
> + v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
> + v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
> + v3 = rol64(v3, 21);
> + v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
> + return v1 ^ v2 ^ v3;
>
> A 32-bit implementation could further tweak the 4 instructions of
> v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;
>
> gcc 6.2.1 -O3 compiles it to basically:
> v1.low ^= v2.low;
> v1.high ^= v2.high;
> v1.low ^= v2.high;
> v1.high ^= v2.low;
> but it could be written as:
> v2.low ^= v2.high;
> v1.low ^= v2.low;
> v1.high ^= v2.low;
>
> Alternatively, if it's for private use only (key not shared with other
> systems), a slightly stronger variant would "return v1 ^ v3;".
> (The final swap of v2 is dead code, but a compiler can spot that easily.)
>
[-- Attachment #2: Type: text/html, Size: 6336 bytes --]
^ permalink raw reply
* [PULL] virtio, vhost: new device, fixes, speedups
From: Michael S. Tsirkin @ 2016-12-15 23:05 UTC (permalink / raw)
To: Linus Torvalds
Cc: mark.rutland, kvm, mst, virtualization, hch, stefan, krzk, felipe,
tklauser, liuyuan, arend.vanspriel, marcel, mkl, kvalo,
omarapazanadi, gregkh, linux-kernel, stable, netdev, torvalds
The following changes since commit a57cb1c1d7974c62a5c80f7869e35b492ace12cd:
Merge branch 'akpm' (patches from Andrew) (2016-12-14 17:25:18 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 6bdf1e0efb04a1716373646cb6f35b73addca492:
Makefile: drop -D__CHECK_ENDIAN__ from cflags (2016-12-16 00:13:43 +0200)
----------------------------------------------------------------
virtio, vhost: new device, fixes, speedups
This includes the new virtio crypto device, and fixes all over the
place. In particular enabling endian-ness checks for sparse builds
found some bugs which this fixes. And it appears that everyone is in
agreement that disabling endian-ness sparse checks shouldn't be
necessary any longer.
So this enables them for everyone, and drops __CHECK_ENDIAN__
and __bitwise__ APIs.
IRQ handling in virtio has been refactored somewhat, the
larger switch to IRQ_SHARED will have to wait as
it proved too aggressive.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Christoph Hellwig (4):
virtio_pci: use pci_alloc_irq_vectors
virtio_pci: remove the call to vp_free_vectors in vp_request_msix_vectors
virtio_pci: merge vp_free_vectors into vp_del_vqs
virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants
Felipe Franciosi (1):
virtio_ring: fix description of virtqueue_get_buf
Gao feng (1):
vsock: lookup and setup guest_cid inside vhost_vsock_lock
Gonglei (3):
virtio_pci_modern: fix complaint by sparse
virtio_ring: fix complaint by sparse
crypto: add virtio-crypto driver
Jason Wang (2):
vhost: cache used event for better performance
vhost: remove unused feature bit
Mark Rutland (3):
tools/virtio: fix READ_ONCE()
vringh: kill off ACCESS_ONCE()
tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h
Michael S. Tsirkin (18):
virtio_console: drop unused config fields
drm/virtio: fix endianness in primary_plane_update
drm/virtio: fix lock context imbalance
drm/virtio: annotate virtio_gpu_queue_ctrl_buffer_locked
vhost: make interval tree static inline
vhost: add missing __user annotations
vsock/virtio: add a missing __le annotation
vsock/virtio: mark an internal function static
vsock/virtio: fix src/dst cid format
virtio: clean up handling of request_irq failure
linux/types.h: enable endian checks for all sparse builds
tools: enable endian checks for all sparse builds
Documentation/sparse: drop __bitwise__
checkpatch: replace __bitwise__ with __bitwise
linux: drop __bitwise__ everywhere
Documentation/sparse: drop __CHECK_ENDIAN__
fs/logfs: drop __CHECK_ENDIAN__
Makefile: drop -D__CHECK_ENDIAN__ from cflags
Tobias Klauser (1):
vhost/scsi: Remove unused but set variable
Yuan Liu (1):
virtio_mmio: Set dev.release() to avoid warning
Documentation/translations/zh_CN/sparse.txt | 7 +-
arch/arm/plat-samsung/include/plat/gpio-cfg.h | 2 +-
drivers/crypto/virtio/virtio_crypto_common.h | 128 +++++
drivers/md/dm-cache-block-types.h | 6 +-
drivers/net/ethernet/sun/sunhme.h | 2 +-
drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h | 4 +-
drivers/vhost/vhost.h | 3 +
drivers/virtio/virtio_pci_common.h | 1 -
fs/logfs/logfs.h | 4 +-
include/linux/mmzone.h | 2 +-
include/linux/serial_core.h | 4 +-
include/linux/types.h | 4 +-
include/scsi/iscsi_proto.h | 2 +-
include/target/target_core_base.h | 2 +-
include/uapi/linux/types.h | 4 -
include/uapi/linux/vhost.h | 2 -
include/uapi/linux/virtio_crypto.h | 450 +++++++++++++++++
include/uapi/linux/virtio_ids.h | 1 +
include/uapi/linux/virtio_types.h | 6 +-
net/ieee802154/6lowpan/6lowpan_i.h | 2 +-
net/mac80211/ieee80211_i.h | 4 +-
tools/include/linux/types.h | 4 -
tools/virtio/linux/compiler.h | 2 +-
tools/virtio/linux/uaccess.h | 9 +-
drivers/char/virtio_console.c | 14 +-
drivers/crypto/virtio/virtio_crypto_algs.c | 540 +++++++++++++++++++++
drivers/crypto/virtio/virtio_crypto_core.c | 476 ++++++++++++++++++
drivers/crypto/virtio/virtio_crypto_mgr.c | 264 ++++++++++
drivers/gpu/drm/virtio/virtgpu_plane.c | 4 +-
drivers/gpu/drm/virtio/virtgpu_vq.c | 6 +-
drivers/vhost/scsi.c | 2 -
drivers/vhost/vhost.c | 40 +-
drivers/vhost/vringh.c | 5 +-
drivers/vhost/vsock.c | 25 +-
drivers/virtio/virtio_mmio.c | 2 +
drivers/virtio/virtio_pci_common.c | 201 ++++----
drivers/virtio/virtio_pci_modern.c | 8 +-
drivers/virtio/virtio_ring.c | 6 +-
net/vmw_vsock/virtio_transport.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 17 +-
Documentation/dev-tools/sparse.rst | 14 +-
MAINTAINERS | 9 +
drivers/bluetooth/Makefile | 2 -
drivers/crypto/Kconfig | 2 +
drivers/crypto/Makefile | 1 +
drivers/crypto/virtio/Kconfig | 10 +
drivers/crypto/virtio/Makefile | 5 +
drivers/net/can/Makefile | 1 -
drivers/net/ethernet/altera/Makefile | 1 -
drivers/net/ethernet/atheros/alx/Makefile | 1 -
drivers/net/ethernet/freescale/Makefile | 2 -
drivers/net/wireless/ath/Makefile | 2 -
drivers/net/wireless/ath/wil6210/Makefile | 2 -
.../wireless/broadcom/brcm80211/brcmfmac/Makefile | 2 -
.../wireless/broadcom/brcm80211/brcmsmac/Makefile | 1 -
drivers/net/wireless/intel/iwlegacy/Makefile | 2 -
drivers/net/wireless/intel/iwlwifi/Makefile | 2 +-
drivers/net/wireless/intel/iwlwifi/dvm/Makefile | 2 +-
drivers/net/wireless/intel/iwlwifi/mvm/Makefile | 2 +-
drivers/net/wireless/intersil/orinoco/Makefile | 3 -
drivers/net/wireless/mediatek/mt7601u/Makefile | 2 -
drivers/net/wireless/realtek/rtlwifi/Makefile | 2 -
.../wireless/realtek/rtlwifi/btcoexist/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8188ee/Makefile | 2 -
.../net/wireless/realtek/rtlwifi/rtl8192c/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8192ce/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8192cu/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8192de/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8192ee/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8192se/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8723ae/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8723be/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8723com/Makefile | 2 -
.../wireless/realtek/rtlwifi/rtl8821ae/Makefile | 2 -
drivers/net/wireless/ti/wl1251/Makefile | 2 -
drivers/net/wireless/ti/wlcore/Makefile | 2 -
drivers/staging/rtl8188eu/Makefile | 2 +-
drivers/staging/rtl8192e/Makefile | 2 -
drivers/staging/rtl8192e/rtl8192e/Makefile | 2 -
include/uapi/linux/Kbuild | 1 +
net/bluetooth/Makefile | 2 -
net/ieee802154/Makefile | 2 -
net/mac80211/Makefile | 2 +-
net/mac802154/Makefile | 2 -
net/wireless/Makefile | 2 -
scripts/checkpatch.pl | 4 +-
86 files changed, 2106 insertions(+), 280 deletions(-)
create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
create mode 100644 include/uapi/linux/virtio_crypto.h
create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
create mode 100644 drivers/crypto/virtio/virtio_crypto_core.c
create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
create mode 100644 drivers/crypto/virtio/Kconfig
create mode 100644 drivers/crypto/virtio/Makefile
^ permalink raw reply
* Re: [net-next PATCH v6 0/5] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-15 23:17 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, netdev, alexei.starovoitov, john.r.fastabend, brouer,
tgraf, davem
In-Reply-To: <20161215200712.23639.53043.stgit@john-Precision-Tower-5810>
On Thu, Dec 15, 2016 at 12:12:04PM -0800, John Fastabend wrote:
> This implements virtio_net for the mergeable buffers and big_packet
> modes. I tested this with vhost_net running on qemu and did not see
> any issues. For testing num_buf > 1 I added a hack to vhost driver
> to only but 100 bytes per buffer.
>
> There are some restrictions for XDP to be enabled and work well
> (see patch 3) for more details.
>
> 1. GUEST_TSO{4|6} must be off
> 2. MTU must be less than PAGE_SIZE
> 3. queues must be available to dedicate to XDP
> 4. num_bufs received in mergeable buffers must be 1
> 5. big_packet mode must have all data on single page
>
> To test this I used pktgen in the hypervisor and ran the XDP sample
> programs xdp1 and xdp2 from ./samples/bpf in the host. The default
> mode that is used with these patches with Linux guest and QEMU/Linux
> hypervisor is the mergeable buffers mode. I tested this mode for 2+
> days running xdp2 without issues. Additionally I did a series of
> driver unload/load tests to check the allocate/release paths.
>
> To test the big_packets path I applied the following simple patch against
> the virtio driver forcing big_packets mode,
>
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2242,7 +2242,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> vi->big_packets = true;
>
> if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> - vi->mergeable_rx_bufs = true;
> + vi->mergeable_rx_bufs = false;
>
> if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
> virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>
> I then repeated the tests with xdp1 and xdp2. After letting them run
> for a few hours I called it good enough.
>
> Testing the unexpected case where virtio receives a packet across
> multiple buffers required patching the hypervisor vhost driver to
> convince it to send these unexpected packets. Then I used ping with
> the -s option to trigger the case with multiple buffers. This mode
> is not expected to be used but as MST pointed out per spec it is
> not strictly speaking illegal to generate multi-buffer packets so we
> need someway to handle these. The following patch can be used to
> generate multiple buffers,
>
>
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue
> *vq, u64
>
> _iov = iov + ret;
> size = node->size - addr + node->start;
> - _iov->iov_len = min((u64)len - s, size);
> + printk("%s: build 100 length headers!\n", __func__);
> + _iov->iov_len = min((u64)len - s, (u64)100);//size);
> _iov->iov_base = (void __user *)(unsigned long)
> (node->userspace_addr + addr - node->start);
> s += size;
>
> The qemu command I most frequently used for testing (although I did test
> various other combinations of devices) is the following,
>
> ./x86_64-softmmu/qemu-system-x86_64 \
> -hda /var/lib/libvirt/images/Fedora-test0.img \
> -m 4096 -enable-kvm -smp 2 \
> -netdev tap,id=hn0,queues=4,vhost=on \
> -device virtio-net-pci,netdev=hn0,mq=on,vectors=9,guest_tso4=off,guest_tso6=off \
> -serial stdio
>
> The options 'guest_tso4=off,guest_tso6=off' are required because we
> do not support LRO with XDP at the moment.
>
> Please review any comments/feedback welcome as always.
>
> Thanks,
> John
>
> ---
OK, I think we can queue this for -next.
It's fairly limited in the kind of hardware supported, we can and
probably should extend it further with time.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> John Fastabend (5):
> net: xdp: add invalid buffer warning
> virtio_net: Add XDP support
> virtio_net: add dedicated XDP transmit queues
> virtio_net: add XDP_TX support
> virtio_net: xdp, add slowpath case for non contiguous buffers
>
>
> drivers/net/virtio_net.c | 365 +++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/filter.h | 1
> net/core/filter.c | 6 +
> 3 files changed, 365 insertions(+), 7 deletions(-)
>
> --
> Signature
^ permalink raw reply
* Re: Soft lockup in inet_put_port on 4.6
From: Craig Gallek @ 2016-12-15 23:25 UTC (permalink / raw)
To: Tom Herbert
Cc: Josef Bacik, Hannes Frederic Sowa, Eric Dumazet,
Linux Kernel Network Developers
In-Reply-To: <CALx6S34dmOWiEtOFQMc1FGhqvLqxM5JkD6q5MvaZz0cGZmNM=A@mail.gmail.com>
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik <jbacik@fb.com> wrote:
>> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>
>>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@gmail.com>
>>> wrote:
>>>>
>>>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com>
>>>> wrote:
>>>>>
>>>>> I think there may be some suspicious code in inet_csk_get_port. At
>>>>> tb_found there is:
>>>>>
>>>>> if (((tb->fastreuse > 0 && reuse) ||
>>>>> (tb->fastreuseport > 0 &&
>>>>> !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>> sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
>>>>> smallest_size == -1)
>>>>> goto success;
>>>>> if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb,
>>>>> true)) {
>>>>> if ((reuse ||
>>>>> (tb->fastreuseport > 0 &&
>>>>> sk->sk_reuseport &&
>>>>> !rcu_access_pointer(sk->sk_reuseport_cb)
>>>>> &&
>>>>> uid_eq(tb->fastuid, uid))) &&
>>>>> smallest_size != -1 && --attempts >= 0) {
>>>>> spin_unlock_bh(&head->lock);
>>>>> goto again;
>>>>> }
>>>>> goto fail_unlock;
>>>>> }
>>>>>
>>>>> AFAICT there is redundancy in these two conditionals. The same clause
>>>>> is being checked in both: (tb->fastreuseport > 0 &&
>>>>> !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>> uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>>> first conditional should be hit, goto done, and the second will never
>>>>> evaluate that part to true-- unless the sk is changed (do we need
>>>>> READ_ONCE for sk->sk_reuseport_cb?).
>>>>
>>>> That's an interesting point... It looks like this function also
>>>> changed in 4.6 from using a single local_bh_disable() at the beginning
>>>> with several spin_lock(&head->lock) to exclusively
>>>> spin_lock_bh(&head->lock) at each locking point. Perhaps the full bh
>>>> disable variant was preventing the timers in your stack trace from
>>>> running interleaved with this function before?
>>>
>>>
>>> Could be, although dropping the lock shouldn't be able to affect the
>>> search state. TBH, I'm a little lost in reading function, the
>>> SO_REUSEPORT handling is pretty complicated. For instance,
>>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>>> function and also in every call to inet_csk_bind_conflict. I wonder if
>>> we can simply this under the assumption that SO_REUSEPORT is only
>>> allowed if the port number (snum) is explicitly specified.
>>
>>
>> Ok first I have data for you Hannes, here's the time distributions before
>> during and after the lockup (with all the debugging in place the box
>> eventually recovers). I've attached it as a text file since it is long.
>>
>> Second is I was thinking about why we would spend so much time doing the
>> ->owners list, and obviously it's because of the massive amount of timewait
>> sockets on the owners list. I wrote the following dumb patch and tested it
>> and the problem has disappeared completely. Now I don't know if this is
>> right at all, but I thought it was weird we weren't copying the soreuseport
>> option from the original socket onto the twsk. Is there are reason we
>> aren't doing this currently? Does this help explain what is happening?
>> Thanks,
>>
> I think that would explain it. We would be walking long lists of TW
> sockets in inet_bind_bucket_for_each(tb, &head->chain). This should
> break, although now I'm wondering if there's other ways we can get
> into this situation. reuseport ensures that we can have long lists of
> sockets in a single bucket, TW sockets can make that list really long.
What if the time-wait timer implementation was changed to do more
opportunistic removals? In this case, you seem to have a coordinated
timer event causing many independent locking events on the bucket in
question. If one of those firing events realized it could handle all
of them, you could greatly reduce the contention. The fact that they
all hash to the same bucket may make this even easier...
> Tom
>
>> Josef
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox