* Re: [PATCH net-next] bridge: multicast to unicast
From: Felix Fietkau @ 2017-01-06 13:52 UTC (permalink / raw)
To: Johannes Berg, Linus Lüssing, netdev
Cc: David S . Miller, Stephen Hemminger, bridge, linux-kernel,
linux-wireless, Michael Braun
In-Reply-To: <1483706872.4089.8.camel@sipsolutions.net>
On 2017-01-06 13:47, Johannes Berg wrote:
> On Mon, 2017-01-02 at 20:32 +0100, Linus Lüssing wrote:
>> Implements an optional, per bridge port flag and feature to deliver
>> multicast packets to any host on the according port via unicast
>> individually. This is done by copying the packet per host and
>> changing the multicast destination MAC to a unicast one accordingly.
>
> How does this compare and/or relate to the multicast-to-unicast feature
> we were going to add to the wifi stack, particularly mac80211? Do we
> perhaps not need that feature at all, if bridging will have it?
>
> I suppose that the feature there could apply also to locally generated
> traffic when the AP interface isn't in a bridge, but I think I could
> live with requiring the AP to be put into a bridge to achieve a similar
> configuration?
>
> Additionally, on an unrelated note, this seems to apply generically to
> all kinds of frames, losing information by replacing the address.
> Shouldn't it have similar limitations as the wifi stack feature has
> then, like only applying to ARP, IPv4, IPv6 and not general protocols?
>
> Also, it should probably come with the same caveat as we documented for
> the wifi feature:
>
> Note that this may break certain expectations of the receiver,
> such as the ability to drop unicast IP packets received within
> multicast L2 frames, or the ability to not send ICMP destination
> unreachable messages for packets received in L2 multicast (which
> is required, but the receiver can't tell the difference if this
> new option is enabled.)
>
>
> I'll hold off sending my tree in until we see that we really need both
> features, or decide that we want the wifi feature *instead* of the
> bridge feature.
The bridge layer can use IGMP snooping to ensure that the multicast
stream is only transmitted to clients that are actually a member of the
group. Can the mac80211 feature do the same?
- Felix
^ permalink raw reply
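To illustrate the point about snooping, here is a minimal userspace sketch in C of multicast-to-unicast conversion restricted to known group members. It is not code from the bridge patch: the types `mc_member` and `frame`, the helper `mc_to_unicast()` and the constant `MAC_LEN` are hypothetical, and the membership table only stands in for whatever IGMP/MLD snooping would have learned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* length of an Ethernet MAC address in bytes */

/* Hypothetical snooping entry: one host that joined one group. */
struct mc_member {
	uint8_t group[MAC_LEN]; /* multicast destination MAC of the group */
	uint8_t host[MAC_LEN];  /* unicast MAC of the subscribed host */
};

/* Minimal frame model: only the destination MAC matters here. */
struct frame {
	uint8_t dst[MAC_LEN];
};

/*
 * For every member of the frame's group, emit one copy of the frame
 * whose destination is rewritten to that member's unicast MAC.
 * Returns the number of copies written to out[] (at most out_len).
 */
static size_t mc_to_unicast(const struct frame *in,
			    const struct mc_member *members, size_t n_members,
			    struct frame *out, size_t out_len)
{
	size_t i, n = 0;

	assert(in != NULL && out != NULL);

	for (i = 0; i < n_members && n < out_len; i++) {
		if (memcmp(members[i].group, in->dst, MAC_LEN) != 0)
			continue; /* this host did not join the group */
		out[n] = *in;
		memcpy(out[n].dst, members[i].host, MAC_LEN);
		n++;
	}
	return n;
}
```

A table that lists every associated station regardless of group would model conversion without snooping, where each client behind the device receives a copy.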
* Re: [PATCH net-next] bridge: multicast to unicast
From: Johannes Berg @ 2017-01-06 13:54 UTC (permalink / raw)
To: Felix Fietkau, Linus Lüssing, netdev
Cc: bridge, linux-wireless, linux-kernel, David S . Miller,
Michael Braun
In-Reply-To: <8836daaa-9638-4502-d079-fd428595f822@nbd.name>
> The bridge layer can use IGMP snooping to ensure that the multicast
> stream is only transmitted to clients that are actually a member of
> the group. Can the mac80211 feature do the same?
No, it'll convert the packet for all clients that are behind that
netdev. But that's an argument for dropping the mac80211 feature, which
hasn't been merged upstream yet, no?
johannes
^ permalink raw reply
* Re: [PATCH net-next] bridge: multicast to unicast
From: Felix Fietkau @ 2017-01-06 13:54 UTC (permalink / raw)
To: Johannes Berg, Linus Lüssing, netdev
Cc: David S . Miller, Stephen Hemminger, bridge, linux-kernel,
linux-wireless, Michael Braun
In-Reply-To: <1483710841.12677.1.camel@sipsolutions.net>
On 2017-01-06 14:54, Johannes Berg wrote:
>
>> The bridge layer can use IGMP snooping to ensure that the multicast
>> stream is only transmitted to clients that are actually a member of
>> the group. Can the mac80211 feature do the same?
>
> No, it'll convert the packet for all clients that are behind that
> netdev. But that's an argument for dropping the mac80211 feature, which
> hasn't been merged upstream yet, no?
Right.
- Felix
^ permalink raw reply
* Re: [PATCHv2 net-next 1/3] sctp: add stream arrays in asoc
From: Xin Long @ 2017-01-06 13:57 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem
In-Reply-To: <20170104133905.GA3781@localhost.localdomain>
On Wed, Jan 4, 2017 at 9:39 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, Jan 03, 2017 at 01:59:46PM +0800, Xin Long wrote:
>> This patch is to add streamout and streamin arrays in asoc, initialize
>> them in sctp_process_init and free them in sctp_association_free.
>>
>> Stream arrays are used to replace ssnmap to save more stream things in
>> the next patch.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>> include/net/sctp/structs.h | 18 ++++++++++++++++++
>> net/sctp/associola.c | 19 +++++++++++++++++++
>> net/sctp/sm_make_chunk.c | 17 ++++++++++++++++-
>> 3 files changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>> index 87d56cc..549f17d 100644
>> --- a/include/net/sctp/structs.h
>> +++ b/include/net/sctp/structs.h
>> @@ -1331,6 +1331,18 @@ struct sctp_inithdr_host {
>> __u32 initial_tsn;
>> };
>>
>> +struct sctp_stream_out {
>> + __u16 ssn;
>> + __u8 state;
>> +};
>> +
>> +struct sctp_stream_in {
>> + __u16 ssn;
>> +};
>> +
>> +#define SCTP_STREAM_CLOSED 0x00
>> +#define SCTP_STREAM_OPEN 0x01
>> +
>> /* SCTP_GET_ASSOC_STATS counters */
>> struct sctp_priv_assoc_stats {
>> /* Maximum observed rto in the association during subsequent
>> @@ -1879,6 +1891,12 @@ struct sctp_association {
>> temp:1, /* Is it a temporary association? */
>> prsctp_enable:1;
>>
>> + /* stream arrays */
>> + struct sctp_stream_out *streamout;
>> + struct sctp_stream_in *streamin;
>> + __u16 streamoutcnt;
>> + __u16 streamincnt;
>> +
>> struct sctp_priv_assoc_stats stats;
>>
>> int sent_cnt_removable;
>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>> index d3cc30c..290ec4d 100644
>> --- a/net/sctp/associola.c
>> +++ b/net/sctp/associola.c
>> @@ -361,6 +361,10 @@ void sctp_association_free(struct sctp_association *asoc)
>> /* Free ssnmap storage. */
>> sctp_ssnmap_free(asoc->ssnmap);
>>
>> + /* Free stream information. */
>> + kfree(asoc->streamout);
>> + kfree(asoc->streamin);
>> +
>> /* Clean up the bound address list. */
>> sctp_bind_addr_free(&asoc->base.bind_addr);
>>
>> @@ -1130,6 +1134,8 @@ void sctp_assoc_update(struct sctp_association *asoc,
>> * has been discarded and needs retransmission.
>> */
>> if (asoc->state >= SCTP_STATE_ESTABLISHED) {
>> + int i;
>> +
>> asoc->next_tsn = new->next_tsn;
>> asoc->ctsn_ack_point = new->ctsn_ack_point;
>> asoc->adv_peer_ack_point = new->adv_peer_ack_point;
>> @@ -1139,6 +1145,12 @@ void sctp_assoc_update(struct sctp_association *asoc,
>> */
>> sctp_ssnmap_clear(asoc->ssnmap);
>>
>> + for (i = 0; i < asoc->streamoutcnt; i++)
>> + asoc->streamout[i].ssn = 0;
>> +
>> + for (i = 0; i < asoc->streamincnt; i++)
>> + asoc->streamin[i].ssn = 0;
>> +
>> /* Flush the ULP reassembly and ordered queue.
>> * Any data there will now be stale and will
>> * cause problems.
>> @@ -1168,6 +1180,13 @@ void sctp_assoc_update(struct sctp_association *asoc,
>> new->ssnmap = NULL;
>> }
>>
>> + if (!asoc->streamin && !asoc->streamout) {
>> + asoc->streamout = new->streamout;
>> + asoc->streamin = new->streamin;
>> + new->streamout = NULL;
>> + new->streamin = NULL;
>> + }
>> +
>> if (!asoc->assoc_id) {
>> /* get a new association id since we don't have one
>> * yet.
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index 9e9690b..eeadeef 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -2442,13 +2442,28 @@ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk,
>> * association.
>> */
>> if (!asoc->temp) {
>> - int error;
>> + int error, i;
>> +
>> + asoc->streamoutcnt = asoc->c.sinit_num_ostreams;
>> + asoc->streamincnt = asoc->c.sinit_max_instreams;
>>
>> asoc->ssnmap = sctp_ssnmap_new(asoc->c.sinit_max_instreams,
>> asoc->c.sinit_num_ostreams, gfp);
>> if (!asoc->ssnmap)
>> goto clean_up;
>>
>> + asoc->streamout = kcalloc(asoc->streamoutcnt,
>> + sizeof(*asoc->streamout), gfp);
>> + if (!asoc->streamout)
>> + goto clean_up;
>> + for (i = 0; i < asoc->streamoutcnt; i++)
>> + asoc->streamout[i].state = SCTP_STREAM_OPEN;
>> +
>> + asoc->streamin = kcalloc(asoc->streamincnt,
>> + sizeof(*asoc->streamin), gfp);
>> + if (!asoc->streamin)
>> + goto clean_up;
>> +
>
> Xin, I understand the need to remove the 'ssnmap' term from the charts
> here as it will be, but let's try to put all the inner details of stream
> handling in a dedicated file.
>
> On the original patchset you were creating stream.c for RFC 6525 stuff.
> We probably can create it earlier and concentrate everything
> stream-related in there, so we keep it more contained. Thanks
Will improve and repost.
>
>
>> error = sctp_assoc_set_id(asoc, gfp);
>> if (error)
>> goto clean_up;
>> --
>> 2.1.0
^ permalink raw reply
* [PATCH net-next] cxgb4: Add port description for new cards.
From: Ganesh Goudar @ 2017-01-06 11:22 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, hariprasad, Ganesh Goudar
Add port descriptions for 25G and 100G cards, and also
change a few existing port descriptions to comply with the new
naming convention.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index cd5f437..8862fbd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -5382,22 +5382,28 @@ unsigned int t4_get_mps_bg_map(struct adapter *adap, int idx)
const char *t4_get_port_type_description(enum fw_port_type port_type)
{
static const char *const port_type_description[] = {
- "R XFI",
- "R XAUI",
- "T SGMII",
- "T XFI",
- "T XAUI",
+ "Fiber_XFI",
+ "Fiber_XAUI",
+ "BT_SGMII",
+ "BT_XFI",
+ "BT_XAUI",
"KX4",
"CX4",
"KX",
"KR",
- "R SFP+",
- "KR/KX",
- "KR/KX/KX4",
- "R QSFP_10G",
- "R QSA",
- "R QSFP",
- "R BP40_BA",
+ "SFP",
+ "BP_AP",
+ "BP4_AP",
+ "QSFP_10G",
+ "QSA",
+ "QSFP",
+ "BP40_BA",
+ "KR4_100G",
+ "CR4_QSFP",
+ "CR_QSFP",
+ "CR2_QSFP",
+ "SFP28",
+ "KR_SFP28",
};
if (port_type < ARRAY_SIZE(port_type_description))
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next] sctp: prepare asoc stream for stream reconf
From: Xin Long @ 2017-01-06 14:18 UTC (permalink / raw)
To: network dev, linux-sctp
Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem
SCTP stream reconf, described in RFC 6525, needs a structure in the
association to hold per-stream information, such as the stream state.
In the future, the SCTP stream scheduler will also need it to hold
per-stream scheduler parameters and queues.
This patch prepares the stream arrays in the association for stream
reconf. It defines sctp_stream, which contains the stream arrays and
replaces ssnmap.
Note that IN and OUT streams use different structures, because the
per-OUT-stream members will diverge more and more from the
per-IN-stream members.
v1->v2:
- put these patches into a smaller group.
v2->v3:
- define sctp_stream to contain stream arrays, and create stream.c
to put stream-related functions.
- merge the 3 patches into 1, because the new sctp_stream reuses the
name of the structure it replaces.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/sctp.h | 1 -
include/net/sctp/structs.h | 76 +++++++++++----------------
net/sctp/Makefile | 2 +-
net/sctp/associola.c | 13 +++--
net/sctp/objcnt.c | 2 -
net/sctp/sm_make_chunk.c | 10 ++--
net/sctp/sm_statefuns.c | 3 +-
net/sctp/ssnmap.c | 125 ---------------------------------------------
net/sctp/stream.c | 85 ++++++++++++++++++++++++++++++
net/sctp/ulpqueue.c | 36 ++++++-------
10 files changed, 147 insertions(+), 206 deletions(-)
delete mode 100644 net/sctp/ssnmap.c
create mode 100644 net/sctp/stream.c
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d8833a8..598d938 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -283,7 +283,6 @@ extern atomic_t sctp_dbg_objcnt_chunk;
extern atomic_t sctp_dbg_objcnt_bind_addr;
extern atomic_t sctp_dbg_objcnt_bind_bucket;
extern atomic_t sctp_dbg_objcnt_addr;
-extern atomic_t sctp_dbg_objcnt_ssnmap;
extern atomic_t sctp_dbg_objcnt_datamsg;
extern atomic_t sctp_dbg_objcnt_keys;
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 87d56cc..4741ec2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -82,7 +82,6 @@ struct sctp_outq;
struct sctp_bind_addr;
struct sctp_ulpq;
struct sctp_ep_common;
-struct sctp_ssnmap;
struct crypto_shash;
@@ -377,54 +376,22 @@ typedef struct sctp_sender_hb_info {
__u64 hb_nonce;
} __packed sctp_sender_hb_info_t;
-/*
- * RFC 2960 1.3.2 Sequenced Delivery within Streams
- *
- * The term "stream" is used in SCTP to refer to a sequence of user
- * messages that are to be delivered to the upper-layer protocol in
- * order with respect to other messages within the same stream. This is
- * in contrast to its usage in TCP, where it refers to a sequence of
- * bytes (in this document a byte is assumed to be eight bits).
- * ...
- *
- * This is the structure we use to track both our outbound and inbound
- * SSN, or Stream Sequence Numbers.
- */
-
-struct sctp_stream {
- __u16 *ssn;
- unsigned int len;
-};
-
-struct sctp_ssnmap {
- struct sctp_stream in;
- struct sctp_stream out;
-};
-
-struct sctp_ssnmap *sctp_ssnmap_new(__u16 in, __u16 out,
- gfp_t gfp);
-void sctp_ssnmap_free(struct sctp_ssnmap *map);
-void sctp_ssnmap_clear(struct sctp_ssnmap *map);
+struct sctp_stream *sctp_stream_new(__u16 incnt, __u16 outcnt, gfp_t gfp);
+void sctp_stream_free(struct sctp_stream *stream);
+void sctp_stream_clear(struct sctp_stream *stream);
/* What is the current SSN number for this stream? */
-static inline __u16 sctp_ssn_peek(struct sctp_stream *stream, __u16 id)
-{
- return stream->ssn[id];
-}
+#define sctp_ssn_peek(stream, type, sid) \
+ ((stream)->type[sid].ssn)
/* Return the next SSN number for this stream. */
-static inline __u16 sctp_ssn_next(struct sctp_stream *stream, __u16 id)
-{
- return stream->ssn[id]++;
-}
+#define sctp_ssn_next(stream, type, sid) \
+ ((stream)->type[sid].ssn++)
/* Skip over this ssn and all below. */
-static inline void sctp_ssn_skip(struct sctp_stream *stream, __u16 id,
- __u16 ssn)
-{
- stream->ssn[id] = ssn+1;
-}
-
+#define sctp_ssn_skip(stream, type, sid, ssn) \
+ ((stream)->type[sid].ssn = ssn + 1)
+
/*
* Pointers to address related SCTP functions.
* (i.e. things that depend on the address family.)
@@ -1331,6 +1298,25 @@ struct sctp_inithdr_host {
__u32 initial_tsn;
};
+struct sctp_stream_out {
+ __u16 ssn;
+ __u8 state;
+};
+
+struct sctp_stream_in {
+ __u16 ssn;
+};
+
+struct sctp_stream {
+ struct sctp_stream_out *out;
+ struct sctp_stream_in *in;
+ __u16 outcnt;
+ __u16 incnt;
+};
+
+#define SCTP_STREAM_CLOSED 0x00
+#define SCTP_STREAM_OPEN 0x01
+
/* SCTP_GET_ASSOC_STATS counters */
struct sctp_priv_assoc_stats {
/* Maximum observed rto in the association during subsequent
@@ -1746,8 +1732,8 @@ struct sctp_association {
/* Default receive parameters */
__u32 default_rcv_context;
- /* This tracks outbound ssn for a given stream. */
- struct sctp_ssnmap *ssnmap;
+ /* Stream arrays */
+ struct sctp_stream *stream;
/* All outbound chunks go through this structure. */
struct sctp_outq outqueue;
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 6c4f749..70f1b57 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -11,7 +11,7 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
transport.o chunk.o sm_make_chunk.o ulpevent.o \
inqueue.o outqueue.o ulpqueue.o \
tsnmap.o bind_addr.o socket.o primitive.o \
- output.o input.o debug.o ssnmap.o auth.o \
+ output.o input.o debug.o stream.o auth.o \
offload.o
sctp_probe-y := probe.o
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index d3cc30c..36294f7 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -358,8 +358,8 @@ void sctp_association_free(struct sctp_association *asoc)
sctp_tsnmap_free(&asoc->peer.tsn_map);
- /* Free ssnmap storage. */
- sctp_ssnmap_free(asoc->ssnmap);
+ /* Free stream information. */
+ sctp_stream_free(asoc->stream);
/* Clean up the bound address list. */
sctp_bind_addr_free(&asoc->base.bind_addr);
@@ -1137,7 +1137,7 @@ void sctp_assoc_update(struct sctp_association *asoc,
/* Reinitialize SSN for both local streams
* and peer's streams.
*/
- sctp_ssnmap_clear(asoc->ssnmap);
+ sctp_stream_clear(asoc->stream);
/* Flush the ULP reassembly and ordered queue.
* Any data there will now be stale and will
@@ -1162,10 +1162,9 @@ void sctp_assoc_update(struct sctp_association *asoc,
asoc->ctsn_ack_point = asoc->next_tsn - 1;
asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
- if (!asoc->ssnmap) {
- /* Move the ssnmap. */
- asoc->ssnmap = new->ssnmap;
- new->ssnmap = NULL;
+ if (!asoc->stream) {
+ asoc->stream = new->stream;
+ new->stream = NULL;
}
if (!asoc->assoc_id) {
diff --git a/net/sctp/objcnt.c b/net/sctp/objcnt.c
index 40e7fac..105ac33 100644
--- a/net/sctp/objcnt.c
+++ b/net/sctp/objcnt.c
@@ -51,7 +51,6 @@ SCTP_DBG_OBJCNT(bind_addr);
SCTP_DBG_OBJCNT(bind_bucket);
SCTP_DBG_OBJCNT(chunk);
SCTP_DBG_OBJCNT(addr);
-SCTP_DBG_OBJCNT(ssnmap);
SCTP_DBG_OBJCNT(datamsg);
SCTP_DBG_OBJCNT(keys);
@@ -67,7 +66,6 @@ static sctp_dbg_objcnt_entry_t sctp_dbg_objcnt[] = {
SCTP_DBG_OBJCNT_ENTRY(bind_addr),
SCTP_DBG_OBJCNT_ENTRY(bind_bucket),
SCTP_DBG_OBJCNT_ENTRY(addr),
- SCTP_DBG_OBJCNT_ENTRY(ssnmap),
SCTP_DBG_OBJCNT_ENTRY(datamsg),
SCTP_DBG_OBJCNT_ENTRY(keys),
};
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9e9690b..a15d824 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1536,7 +1536,7 @@ void sctp_chunk_assign_ssn(struct sctp_chunk *chunk)
/* All fragments will be on the same stream */
sid = ntohs(chunk->subh.data_hdr->stream);
- stream = &chunk->asoc->ssnmap->out;
+ stream = chunk->asoc->stream;
/* Now assign the sequence number to the entire message.
* All fragments must have the same stream sequence number.
@@ -1547,9 +1547,9 @@ void sctp_chunk_assign_ssn(struct sctp_chunk *chunk)
ssn = 0;
} else {
if (lchunk->chunk_hdr->flags & SCTP_DATA_LAST_FRAG)
- ssn = sctp_ssn_next(stream, sid);
+ ssn = sctp_ssn_next(stream, out, sid);
else
- ssn = sctp_ssn_peek(stream, sid);
+ ssn = sctp_ssn_peek(stream, out, sid);
}
lchunk->subh.data_hdr->ssn = htons(ssn);
@@ -2444,9 +2444,9 @@ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk,
if (!asoc->temp) {
int error;
- asoc->ssnmap = sctp_ssnmap_new(asoc->c.sinit_max_instreams,
+ asoc->stream = sctp_stream_new(asoc->c.sinit_max_instreams,
asoc->c.sinit_num_ostreams, gfp);
- if (!asoc->ssnmap)
+ if (!asoc->stream)
goto clean_up;
error = sctp_assoc_set_id(asoc, gfp);
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 3382ef2..0ceded3 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -6274,9 +6274,8 @@ static int sctp_eat_data(const struct sctp_association *asoc,
* and is invalid.
*/
ssn = ntohs(data_hdr->ssn);
- if (ordered && SSN_lt(ssn, sctp_ssn_peek(&asoc->ssnmap->in, sid))) {
+ if (ordered && SSN_lt(ssn, sctp_ssn_peek(asoc->stream, in, sid)))
return SCTP_IERROR_PROTO_VIOLATION;
- }
/* Send the data up to the user. Note: Schedule the
* SCTP_CMD_CHUNK_ULP cmd before the SCTP_CMD_GEN_SACK, as the SACK
diff --git a/net/sctp/ssnmap.c b/net/sctp/ssnmap.c
deleted file mode 100644
index b9c8521..0000000
--- a/net/sctp/ssnmap.c
+++ /dev/null
@@ -1,125 +0,0 @@
-/* SCTP kernel implementation
- * Copyright (c) 2003 International Business Machines, Corp.
- *
- * This file is part of the SCTP kernel implementation
- *
- * These functions manipulate sctp SSN tracker.
- *
- * This SCTP implementation is free software;
- * you can redistribute it and/or modify it under the terms of
- * the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * This SCTP implementation is distributed in the hope that it
- * will be useful, but WITHOUT ANY WARRANTY; without even the implied
- * ************************
- * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- * See the GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GNU CC; see the file COPYING. If not, see
- * <http://www.gnu.org/licenses/>.
- *
- * Please send any bug reports or fixes you make to the
- * email address(es):
- * lksctp developers <linux-sctp@vger.kernel.org>
- *
- * Written or modified by:
- * Jon Grimm <jgrimm@us.ibm.com>
- */
-
-#include <linux/types.h>
-#include <linux/slab.h>
-#include <net/sctp/sctp.h>
-#include <net/sctp/sm.h>
-
-static struct sctp_ssnmap *sctp_ssnmap_init(struct sctp_ssnmap *map, __u16 in,
- __u16 out);
-
-/* Storage size needed for map includes 2 headers and then the
- * specific needs of in or out streams.
- */
-static inline size_t sctp_ssnmap_size(__u16 in, __u16 out)
-{
- return sizeof(struct sctp_ssnmap) + (in + out) * sizeof(__u16);
-}
-
-
-/* Create a new sctp_ssnmap.
- * Allocate room to store at least 'len' contiguous TSNs.
- */
-struct sctp_ssnmap *sctp_ssnmap_new(__u16 in, __u16 out,
- gfp_t gfp)
-{
- struct sctp_ssnmap *retval;
- int size;
-
- size = sctp_ssnmap_size(in, out);
- if (size <= KMALLOC_MAX_SIZE)
- retval = kmalloc(size, gfp);
- else
- retval = (struct sctp_ssnmap *)
- __get_free_pages(gfp, get_order(size));
- if (!retval)
- goto fail;
-
- if (!sctp_ssnmap_init(retval, in, out))
- goto fail_map;
-
- SCTP_DBG_OBJCNT_INC(ssnmap);
-
- return retval;
-
-fail_map:
- if (size <= KMALLOC_MAX_SIZE)
- kfree(retval);
- else
- free_pages((unsigned long)retval, get_order(size));
-fail:
- return NULL;
-}
-
-
-/* Initialize a block of memory as a ssnmap. */
-static struct sctp_ssnmap *sctp_ssnmap_init(struct sctp_ssnmap *map, __u16 in,
- __u16 out)
-{
- memset(map, 0x00, sctp_ssnmap_size(in, out));
-
- /* Start 'in' stream just after the map header. */
- map->in.ssn = (__u16 *)&map[1];
- map->in.len = in;
-
- /* Start 'out' stream just after 'in'. */
- map->out.ssn = &map->in.ssn[in];
- map->out.len = out;
-
- return map;
-}
-
-/* Clear out the ssnmap streams. */
-void sctp_ssnmap_clear(struct sctp_ssnmap *map)
-{
- size_t size;
-
- size = (map->in.len + map->out.len) * sizeof(__u16);
- memset(map->in.ssn, 0x00, size);
-}
-
-/* Dispose of a ssnmap. */
-void sctp_ssnmap_free(struct sctp_ssnmap *map)
-{
- int size;
-
- if (unlikely(!map))
- return;
-
- size = sctp_ssnmap_size(map->in.len, map->out.len);
- if (size <= KMALLOC_MAX_SIZE)
- kfree(map);
- else
- free_pages((unsigned long)map, get_order(size));
-
- SCTP_DBG_OBJCNT_DEC(ssnmap);
-}
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
new file mode 100644
index 0000000..f86de43
--- /dev/null
+++ b/net/sctp/stream.c
@@ -0,0 +1,85 @@
+/* SCTP kernel implementation
+ * (C) Copyright IBM Corp. 2001, 2004
+ * Copyright (c) 1999-2000 Cisco, Inc.
+ * Copyright (c) 1999-2001 Motorola, Inc.
+ * Copyright (c) 2001 Intel Corp.
+ *
+ * This file is part of the SCTP kernel implementation
+ *
+ * These functions manipulate sctp stream arrays.
+ *
+ * This SCTP implementation is free software;
+ * you can redistribute it and/or modify it under the terms of
+ * the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This SCTP implementation is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; without even the implied
+ * ************************
+ * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GNU CC; see the file COPYING. If not, see
+ * <http://www.gnu.org/licenses/>.
+ *
+ * Please send any bug reports or fixes you make to the
+ * email address(es):
+ * lksctp developers <linux-sctp@vger.kernel.org>
+ *
+ * Written or modified by:
+ * Xin Long <lucien.xin@gmail.com>
+ */
+
+#include <net/sctp/sctp.h>
+
+struct sctp_stream *sctp_stream_new(__u16 incnt, __u16 outcnt, gfp_t gfp)
+{
+ struct sctp_stream *stream;
+ int i;
+
+ stream = kzalloc(sizeof(*stream), gfp);
+ if (!stream)
+ return NULL;
+
+ stream->outcnt = outcnt;
+ stream->out = kcalloc(stream->outcnt, sizeof(*stream->out), gfp);
+ if (!stream->out) {
+ kfree(stream);
+ return NULL;
+ }
+ for (i = 0; i < stream->outcnt; i++)
+ stream->out[i].state = SCTP_STREAM_OPEN;
+
+ stream->incnt = incnt;
+ stream->in = kcalloc(stream->incnt, sizeof(*stream->in), gfp);
+ if (!stream->in) {
+ kfree(stream->out);
+ kfree(stream);
+ return NULL;
+ }
+
+ return stream;
+}
+
+void sctp_stream_free(struct sctp_stream *stream)
+{
+ if (unlikely(!stream))
+ return;
+
+ kfree(stream->out);
+ kfree(stream->in);
+ kfree(stream);
+}
+
+void sctp_stream_clear(struct sctp_stream *stream)
+{
+ int i;
+
+ for (i = 0; i < stream->outcnt; i++)
+ stream->out[i].ssn = 0;
+
+ for (i = 0; i < stream->incnt; i++)
+ stream->in[i].ssn = 0;
+}
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 84d0fda..aa3624d 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -760,11 +760,11 @@ static void sctp_ulpq_retrieve_ordered(struct sctp_ulpq *ulpq,
struct sk_buff_head *event_list;
struct sk_buff *pos, *tmp;
struct sctp_ulpevent *cevent;
- struct sctp_stream *in;
+ struct sctp_stream *stream;
__u16 sid, csid, cssn;
sid = event->stream;
- in = &ulpq->asoc->ssnmap->in;
+ stream = ulpq->asoc->stream;
event_list = (struct sk_buff_head *) sctp_event2skb(event)->prev;
@@ -782,11 +782,11 @@ static void sctp_ulpq_retrieve_ordered(struct sctp_ulpq *ulpq,
if (csid < sid)
continue;
- if (cssn != sctp_ssn_peek(in, sid))
+ if (cssn != sctp_ssn_peek(stream, in, sid))
break;
- /* Found it, so mark in the ssnmap. */
- sctp_ssn_next(in, sid);
+ /* Found it, so mark in the stream. */
+ sctp_ssn_next(stream, in, sid);
__skb_unlink(pos, &ulpq->lobby);
@@ -849,7 +849,7 @@ static struct sctp_ulpevent *sctp_ulpq_order(struct sctp_ulpq *ulpq,
struct sctp_ulpevent *event)
{
__u16 sid, ssn;
- struct sctp_stream *in;
+ struct sctp_stream *stream;
/* Check if this message needs ordering. */
if (SCTP_DATA_UNORDERED & event->msg_flags)
@@ -858,10 +858,10 @@ static struct sctp_ulpevent *sctp_ulpq_order(struct sctp_ulpq *ulpq,
/* Note: The stream ID must be verified before this routine. */
sid = event->stream;
ssn = event->ssn;
- in = &ulpq->asoc->ssnmap->in;
+ stream = ulpq->asoc->stream;
/* Is this the expected SSN for this stream ID? */
- if (ssn != sctp_ssn_peek(in, sid)) {
+ if (ssn != sctp_ssn_peek(stream, in, sid)) {
/* We've received something out of order, so find where it
* needs to be placed. We order by stream and then by SSN.
*/
@@ -870,7 +870,7 @@ static struct sctp_ulpevent *sctp_ulpq_order(struct sctp_ulpq *ulpq,
}
/* Mark that the next chunk has been found. */
- sctp_ssn_next(in, sid);
+ sctp_ssn_next(stream, in, sid);
/* Go find any other chunks that were waiting for
* ordering.
@@ -888,12 +888,12 @@ static void sctp_ulpq_reap_ordered(struct sctp_ulpq *ulpq, __u16 sid)
struct sk_buff *pos, *tmp;
struct sctp_ulpevent *cevent;
struct sctp_ulpevent *event;
- struct sctp_stream *in;
+ struct sctp_stream *stream;
struct sk_buff_head temp;
struct sk_buff_head *lobby = &ulpq->lobby;
__u16 csid, cssn;
- in = &ulpq->asoc->ssnmap->in;
+ stream = ulpq->asoc->stream;
/* We are holding the chunks by stream, by SSN. */
skb_queue_head_init(&temp);
@@ -912,7 +912,7 @@ static void sctp_ulpq_reap_ordered(struct sctp_ulpq *ulpq, __u16 sid)
continue;
/* see if this ssn has been marked by skipping */
- if (!SSN_lt(cssn, sctp_ssn_peek(in, csid)))
+ if (!SSN_lt(cssn, sctp_ssn_peek(stream, in, csid)))
break;
__skb_unlink(pos, lobby);
@@ -932,8 +932,8 @@ static void sctp_ulpq_reap_ordered(struct sctp_ulpq *ulpq, __u16 sid)
csid = cevent->stream;
cssn = cevent->ssn;
- if (csid == sid && cssn == sctp_ssn_peek(in, csid)) {
- sctp_ssn_next(in, csid);
+ if (csid == sid && cssn == sctp_ssn_peek(stream, in, csid)) {
+ sctp_ssn_next(stream, in, csid);
__skb_unlink(pos, lobby);
__skb_queue_tail(&temp, pos);
event = sctp_skb2event(pos);
@@ -955,17 +955,17 @@ static void sctp_ulpq_reap_ordered(struct sctp_ulpq *ulpq, __u16 sid)
*/
void sctp_ulpq_skip(struct sctp_ulpq *ulpq, __u16 sid, __u16 ssn)
{
- struct sctp_stream *in;
+ struct sctp_stream *stream;
/* Note: The stream ID must be verified before this routine. */
- in = &ulpq->asoc->ssnmap->in;
+ stream = ulpq->asoc->stream;
/* Is this an old SSN? If so ignore. */
- if (SSN_lt(ssn, sctp_ssn_peek(in, sid)))
+ if (SSN_lt(ssn, sctp_ssn_peek(stream, in, sid)))
return;
/* Mark that we are no longer expecting this SSN or lower. */
- sctp_ssn_skip(in, sid, ssn);
+ sctp_ssn_skip(stream, in, sid, ssn);
/* Go find any other chunks that were waiting for
* ordering and deliver them if needed.
--
2.1.0
^ permalink raw reply related
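As a rough illustration of the data structure this patch introduces, the following userspace C sketch models the per-association stream arrays and the SSN accessors. It is an approximation under stated assumptions, not the kernel code: `calloc()`/`free()` replace `kcalloc()`/`kfree()`, there is no `gfp` argument, and all names (`stream`, `stream_new()`, `ssn_peek()` and so on) are hypothetical shortened versions of the `sctp_*` names in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STREAM_CLOSED 0x00
#define STREAM_OPEN   0x01

struct stream_out {
	uint16_t ssn;
	uint8_t state;
};

struct stream_in {
	uint16_t ssn;
};

struct stream {
	struct stream_out *out;
	struct stream_in *in;
	uint16_t outcnt;
	uint16_t incnt;
};

/* 'type' is the array member to use: in or out. */
#define ssn_peek(s, type, sid)      ((s)->type[sid].ssn)
#define ssn_next(s, type, sid)      ((s)->type[sid].ssn++)
#define ssn_skip(s, type, sid, val) ((s)->type[sid].ssn = (uint16_t)((val) + 1))

static void stream_free(struct stream *s)
{
	if (!s)
		return;
	free(s->out);
	free(s->in);
	free(s);
}

static struct stream *stream_new(uint16_t incnt, uint16_t outcnt)
{
	struct stream *s = calloc(1, sizeof(*s));
	uint16_t i;

	if (!s)
		return NULL;

	s->outcnt = outcnt;
	s->incnt = incnt;
	s->out = calloc(outcnt, sizeof(*s->out));
	s->in = calloc(incnt, sizeof(*s->in));
	/* calloc(0, ...) may return NULL, so only fail on nonzero counts. */
	if ((outcnt && !s->out) || (incnt && !s->in)) {
		stream_free(s);
		return NULL;
	}

	/* Outbound streams start open; SSNs start at 0 thanks to calloc(). */
	for (i = 0; i < outcnt; i++)
		s->out[i].state = STREAM_OPEN;
	return s;
}

/* Reset every SSN in both directions, leaving stream states untouched. */
static void stream_clear(struct stream *s)
{
	uint16_t i;

	assert(s != NULL);
	for (i = 0; i < s->outcnt; i++)
		s->out[i].ssn = 0;
	for (i = 0; i < s->incnt; i++)
		s->in[i].ssn = 0;
}
```

One deliberate difference from the macros in the diff: the last parameter of the skip macro is named `val` here rather than `ssn`, so that it cannot be substituted into the `.ssn` member access when the caller passes an expression with a different name.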
* Re: [PATCHv2 net-next 05/16] net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors
From: Russell King - ARM Linux @ 2017-01-06 14:29 UTC (permalink / raw)
To: Thomas Petazzoni
Cc: Mark Rutland, devicetree, Yehuda Yitschak, Jason Cooper,
Pawel Moll, Ian Campbell, netdev, Hanna Hawa, Nadav Haklai,
Rob Herring, Andrew Lunn, Kumar Gala, Gregory Clement,
Stefan Chulski, Marcin Wojtas, David S. Miller, linux-arm-kernel,
Sebastian Hesselbarth
In-Reply-To: <1482943592-12556-6-git-send-email-thomas.petazzoni@free-electrons.com>
On Wed, Dec 28, 2016 at 05:46:21PM +0100, Thomas Petazzoni wrote:
> This commit adds the definition of the PPv2.2 HW descriptors, adjusts
> the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts
> the accessors to work on both PPv2.1 and PPv2.2.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
...
> + /* On PPv2.2, the situation is more complicated,
> + * because there is only 40 bits to store the virtual
> + * address, which is not sufficient. So on 64 bits
> + * systems, we use phys_to_virt() to get the virtual
> + * address from the physical address, which is fine
> + * because the kernel linear mapping includes the
> + * entire 40 bits physical address space. On 32 bits
> + * systems however, we can't use phys_to_virt(), but
> + * since virtual addresses are 32 bits only, there is
> + * enough space in the RX descriptor for the full
> + * virtual address.
> + */
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> + dma_addr_t dma_addr =
> + rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
> + phys_addr_t phys_addr =
> + dma_to_phys(port->dev->dev.parent, dma_addr);
> +
> + return (unsigned long)phys_to_virt(phys_addr);
> +#else
> + return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
> +#endif
I'm not sure that's the best way of selecting the difference. It seems
that the issue here is the size of the virtual address, so why not test
the size of a virtual address pointer?
if (8 * sizeof(rx_desc) > 40) {
/* do phys addr dance */
} else {
return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
}
It also means that we get compile coverage over both sides of the
conditional.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply
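The suggestion to branch on the pointer size instead of using a preprocessor conditional can be sketched in plain userspace C as follows. This is only a model under stated assumptions: `rx_desc_model`, `rx_desc_cookie()`, `ADDR_MASK_40` and the `linear_base` parameter are hypothetical, and adding `linear_base` stands in for the `dma_to_phys()`/`phys_to_virt()` step of the real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 40 bits set, playing the role of DMA_BIT_MASK(40). */
#define ADDR_MASK_40 ((UINT64_C(1) << 40) - 1)

/* Hypothetical stand-in for the two descriptor fields quoted above. */
struct rx_desc_model {
	uint64_t buf_phys_addr_key_hash;
	uint64_t buf_cookie_misc;
};

/*
 * Return the buffer's virtual address as an integer.
 *
 * The condition is a compile-time constant, so the compiler discards
 * the branch that does not apply, yet both branches are still parsed
 * and type-checked on every build.
 */
static uint64_t rx_desc_cookie(const struct rx_desc_model *desc,
			       uint64_t linear_base)
{
	assert(desc != NULL);

	if (8 * sizeof(void *) > 40) {
		/* Pointers do not fit in 40 bits: derive from the physical address. */
		uint64_t phys = desc->buf_phys_addr_key_hash & ADDR_MASK_40;

		return linear_base + phys;
	}

	/* Pointers fit in 40 bits: the cookie holds the full virtual address. */
	return desc->buf_cookie_misc & ADDR_MASK_40;
}
```

With `#ifdef`, a build only ever compiles one of the two paths, so a typo in the other can go unnoticed until someone builds for the other configuration.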
* Re: [PATCHv2 net-next 06/16] net: mvpp2: adjust the allocation/free of BM pools for PPv2.2
From: Russell King - ARM Linux @ 2017-01-06 14:32 UTC (permalink / raw)
To: Thomas Petazzoni
Cc: Mark Rutland, devicetree, Yehuda Yitschak, Jason Cooper,
Pawel Moll, Ian Campbell, netdev, Hanna Hawa, Nadav Haklai,
Rob Herring, Andrew Lunn, Kumar Gala, Gregory Clement,
Stefan Chulski, Marcin Wojtas, David S. Miller, linux-arm-kernel,
Sebastian Hesselbarth
In-Reply-To: <1482943592-12556-7-git-send-email-thomas.petazzoni@free-electrons.com>
On Wed, Dec 28, 2016 at 05:46:22PM +0100, Thomas Petazzoni wrote:
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> + if (priv->hw_version == MVPP22) {
Maybe
if (sizeof(dma_addr_t) > sizeof(u32) && priv->hw_version == MVPP22) {
to get better compile coverage?
> + u32 val;
> + u32 paddr_highbits;
> +
> + val = mvpp2_read(priv, MVPP2_BM_ADDR_HIGH_ALLOC);
> + paddr_highbits = (val & MVPP2_BM_ADDR_HIGH_PHYS_MASK);
> +
> + *paddr |= (dma_addr_t)paddr_highbits << 32;
> + *vaddr = (unsigned long)phys_to_virt(dma_to_phys(dev, *paddr));
> + }
> +#endif
> +}
> +
> /* Free all buffers from the pool */
> static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv,
> struct mvpp2_bm_pool *bm_pool)
> @@ -3616,10 +3671,8 @@ static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv,
> dma_addr_t buf_phys_addr;
> unsigned long vaddr;
>
> - /* Get buffer virtual address (indirect access) */
> - buf_phys_addr = mvpp2_read(priv,
> - MVPP2_BM_PHY_ALLOC_REG(bm_pool->id));
> - vaddr = mvpp2_read(priv, MVPP2_BM_VIRT_ALLOC_REG);
> + mvpp2_bm_bufs_get_addrs(dev, priv, bm_pool,
> + &buf_phys_addr, &vaddr);
>
> dma_unmap_single(dev, buf_phys_addr,
> bm_pool->buf_size, DMA_FROM_DEVICE);
> @@ -3651,7 +3704,7 @@ static int mvpp2_bm_pool_destroy(struct platform_device *pdev,
> val |= MVPP2_BM_STOP_MASK;
> mvpp2_write(priv, MVPP2_BM_POOL_CTRL_REG(bm_pool->id), val);
>
> - dma_free_coherent(&pdev->dev, sizeof(u32) * bm_pool->size,
> + dma_free_coherent(&pdev->dev, bm_pool->size_bytes,
> bm_pool->virt_addr,
> bm_pool->phys_addr);
> return 0;
> @@ -3787,8 +3840,19 @@ static inline void mvpp2_bm_pool_put(struct mvpp2_port *port, int pool,
> dma_addr_t buf_phys_addr,
> unsigned long buf_virt_addr)
> {
> - mvpp2_write(port->priv, MVPP2_BM_VIRT_RLS_REG, buf_virt_addr);
> - mvpp2_write(port->priv, MVPP2_BM_PHY_RLS_REG(pool), buf_phys_addr);
> +#if defined(CONFIG_ARCH_DMA_ADDR_T_64BIT)
> + u32 val;
> +
> + val = upper_32_bits(buf_phys_addr) & MVPP22_BM_ADDR_HIGH_PHYS_RLS_MASK;
> + val |= (upper_32_bits(buf_virt_addr) &
> + MVPP22_BM_ADDR_HIGH_VIRT_RLS_MASK)
> + << MVPP22_BM_ADDR_HIGH_VIRT_RLS_SHIFT;
> + mvpp2_write(port->priv, MVPP22_BM_ADDR_HIGH_RLS_REG, val);
> +#endif
Similar compile-time conditional?
* Re: [PATCHv2 net-next 05/16] net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors
From: Robin Murphy @ 2017-01-06 14:44 UTC (permalink / raw)
To: Thomas Petazzoni
Cc: Mark Rutland, devicetree, Yehuda Yitschak, Jason Cooper,
Pawel Moll, Ian Campbell, netdev, Hanna Hawa,
Russell King - ARM Linux, Nadav Haklai, Rob Herring, Andrew Lunn,
Kumar Gala, Gregory Clement, Stefan Chulski, Marcin Wojtas,
David S. Miller, linux-arm-kernel, Sebastian Hesselbarth
In-Reply-To: <20170106142901.GC14217@n2100.armlinux.org.uk>
On 06/01/17 14:29, Russell King - ARM Linux wrote:
> On Wed, Dec 28, 2016 at 05:46:21PM +0100, Thomas Petazzoni wrote:
>> This commit adds the definition of the PPv2.2 HW descriptors, adjusts
>> the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts
>> the accessors to work on both PPv2.1 and PPv2.2.
>>
>> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> ...
>> + /* On PPv2.2, the situation is more complicated,
>> + * because there is only 40 bits to store the virtual
>> + * address, which is not sufficient. So on 64 bits
>> + * systems, we use phys_to_virt() to get the virtual
>> + * address from the physical address, which is fine
>> + * because the kernel linear mapping includes the
>> + * entire 40 bits physical address space. On 32 bits
>> + * systems however, we can't use phys_to_virt(), but
>> + * since virtual addresses are 32 bits only, there is
>> + * enough space in the RX descriptor for the full
>> + * virtual address.
>> + */
>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>> + dma_addr_t dma_addr =
>> + rx_desc->pp22.buf_phys_addr_key_hash & DMA_BIT_MASK(40);
>> + phys_addr_t phys_addr =
>> + dma_to_phys(port->dev->dev.parent, dma_addr);
Ugh, this looks bogus. dma_to_phys(), in the arm64 case at least, is
essentially a SWIOTLB internal helper function which has to be
implemented in architecture code because reasons. Calling it from a
driver is almost certainly wrong (it doesn't even exist on most
architectures). Besides, if this is really a genuine dma_addr_t obtained
from a DMA API call, you cannot infer it to be related to a CPU physical
address, or convertible to one at all.
>> +
>> + return (unsigned long)phys_to_virt(phys_addr);
>> +#else
>> + return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
>> +#endif
>
> I'm not sure that's the best way of selecting the difference.
Given that CONFIG_ARCH_DMA_ADDR_T_64BIT could be enabled on 32-bit LPAE
systems, indeed it definitely isn't.
Robin.
> It seems
> that the issue here is the size of the virtual address, so why not test
> the size of a virtual address pointer?
>
> if (8 * sizeof(rx_desc) > 40) {
> /* do phys addr dance */
> } else {
> return rx_desc->pp22.buf_cookie_misc & DMA_BIT_MASK(40);
> }
>
> It also means that we get compile coverage over both sides of the
> conditional.
>
* Re: [PATCHv2 net-next 10/16] net: mvpp2: handle register mapping and access for PPv2.2
From: Russell King - ARM Linux @ 2017-01-06 14:46 UTC (permalink / raw)
To: Thomas Petazzoni
Cc: Mark Rutland, devicetree, Yehuda Yitschak, Jason Cooper,
Pawel Moll, Ian Campbell, netdev, Hanna Hawa, Nadav Haklai,
Rob Herring, Andrew Lunn, Kumar Gala, Gregory Clement,
Stefan Chulski, Marcin Wojtas, David S. Miller, linux-arm-kernel,
Sebastian Hesselbarth
In-Reply-To: <1482943592-12556-11-git-send-email-thomas.petazzoni@free-electrons.com>
On Wed, Dec 28, 2016 at 05:46:26PM +0100, Thomas Petazzoni wrote:
> This commit adjusts the mvpp2 driver register mapping and access logic
> to support PPv2.2, to handle a number of differences.
>
> Due to how the registers are laid out in memory, the Device Tree binding
> for the "reg" property is different:
>
> - On PPv2.1, we had a first area for the common registers, and then one
> area per port.
>
> - On PPv2.2, we have a first area for the common registers, and a
> second area for all the per-ports registers.
>
> In addition, on PPv2.2, the area for the common registers is split into
> so-called "address spaces" of 64 KB each. They allow to access the same
> registers, but from different CPUs. Hence the introduction of cpu_base[]
> in 'struct mvpp2', and the modification of the mvpp2_write() and
> mvpp2_read() register accessors. For PPv2.1, the compatibility is
> preserved by using an "address space" size of 0.
I'm not entirely sure this is the best solution - every register access
will be wrapped with a preempt_disable() and preempt_enable(). At
every site, when preempt is enabled, we will end up with code to:
- get the thread info
- increment the preempt count
- access the register
- decrement the preempt count
- test resulting preempt count and branch to __preempt_schedule()
If tracing is enabled, it gets much worse, because the increment and
decrement happen out of line, and are even more expensive.
If a function is going to make several register accesses, it's going
to be much more efficient to do:
void __iomem *base = priv->cpu_base[get_cpu()];
...
put_cpu();
which means we don't end up with multiple instances of the preempt code
for consecutive accesses.
I think this is an example where having driver-private accessors for
readl()/writel() is far from a good idea.
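To make the cost difference concrete, here is a rough userspace model, not kernel code. Everything in it is hypothetical: fake_get_cpu()/fake_put_cpu() only count how often the guard is taken and released (they stand in for get_cpu()/put_cpu() and always report CPU 0), and fake_regs[] stands in for the per-CPU register windows.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FAKE_CPUS 2
#define REGS_PER_CPU 4

static uint32_t fake_regs[NR_FAKE_CPUS][REGS_PER_CPU]; /* one window per CPU */
static unsigned int guard_ops; /* simulated preempt disable/enable calls */

/* Hypothetical stand-ins: count the call, always report CPU 0. */
static int fake_get_cpu(void) { guard_ops++; return 0; }
static void fake_put_cpu(void) { guard_ops++; }

/* Accessor style: every single read pays for its own guard pair. */
static uint32_t reg_read_guarded(unsigned int reg)
{
	uint32_t val = fake_regs[fake_get_cpu()][reg];

	fake_put_cpu();
	return val;
}

static uint32_t sum_regs_per_access(void)
{
	return reg_read_guarded(0) + reg_read_guarded(1) + reg_read_guarded(2);
}

/* Batched style: one guard pair covers all three reads. */
static uint32_t sum_regs_batched(void)
{
	const uint32_t *base = fake_regs[fake_get_cpu()];
	uint32_t sum = base[0] + base[1] + base[2];

	fake_put_cpu();
	return sum;
}
```

With three reads, the accessor style performs six guard operations and the batched style two; the gap grows with every additional access in the function.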
* Re: [PATCH net-next v4 0/4] Fix OdroidC2 Gigabit Tx link issue
From: Russell King - ARM Linux @ 2017-01-06 15:05 UTC (permalink / raw)
To: Jerome Brunet
Cc: Andrew Lunn, Florian Fainelli, Alexandre TORGUE, Neil Armstrong,
Martin Blumenstingl, netdev, linux-kernel, Yegor Yefremov,
Julia Lawall, devicetree, Andre Roth, Kevin Hilman, Carlo Caione,
Giuseppe Cavallaro, linux-amlogic, Andreas Färber,
linux-arm-kernel
In-Reply-To: <1483710621.28003.74.camel@baylibre.com>
(quick reply...)
On Fri, Jan 06, 2017 at 02:50:21PM +0100, Jerome Brunet wrote:
> So I'm not sure I understand, are you against EEE integration in phylib
> entirely, or specifically against the test I added in set_eee to filter
> out broken modes ?
I'm happy to see EEE integrated into phylib, but I think the current
implementation is very buggy and needs a rewrite.
> > BTW, one of the problems (not caused by your patch) is that changing
> > the EEE advertisment does not (on all PHY drivers) cause the link to
> > be renegotiated - there's no call to phy_start_aneg() when the advert
> > changes, and even if there was, there's no guarantee that
> > phy_start_aneg() will even set the AN restart bit in the control
> > register.
> >
> > However, given that you're hooking into the set_eee function, I'm not
> > sure why you placed your EEE advertisment thing into config_aneg() -
> > isn't it more an initialisation thing (so should be in
> > config_init()?)
>
> What I change is what the PHY advertise, so it seems logical to do it
> where "genphy_config_advert" was called. Just taking the existing code
> as an example
You need to adjust the advertisement in two places:
1. On initialisation, when you need to change the default value.
2. Whenever the user requests a different EEE advertisement.
You don't need to do it each time config_aneg() is called - nothing's
going to change the EEE advertisement in that path. Hence, to check
it each and every time seems like a waste of CPU cycles.
However, there's another path that needs to be considered, which the
current EEE code fails to do, and that is the resume path. Nothing
at present saves and restores the EEE settings, they are completely
lost if the PHY is powered down. This is just another symptom of the
current poor quality EEE implementation in phylib, and another reason
why I say above that the EEE code is in need of a rewrite... which is
something I will be looking at.
If the EEE settings are properly saved and restored over suspend/
resume, then the previously programmed EEE advertisement would also
be restored.
* [PATCH 0/3] xen: optimize xenbus performance
From: Juergen Gross @ 2017-01-06 15:05 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, wei.liu2, netdev, paul.durrant, boris.ostrovsky,
roger.pau
The xenbus driver used for communication with Xenstore (all kernel
accesses to Xenstore and, if Xenstore lives in another domain,
all accesses of the local domain to Xenstore) is rather simple,
especially regarding multiple concurrent accesses: they are simply
serialized, even though Xenstore is capable of handling multiple
parallel accesses.
Clean up the external interface(s) of xenbus and optimize its
performance by allowing multiple concurrent accesses to Xenstore.
Juergen Gross (3):
xen: clean up xenbus internal headers
xen: modify xenstore watch event interface
xen: optimize xenbus driver for multiple concurrent xenstore accesses
drivers/block/xen-blkback/xenbus.c | 6 +-
drivers/net/xen-netback/xenbus.c | 8 +-
drivers/xen/cpu_hotplug.c | 5 +-
drivers/xen/manage.c | 6 +-
drivers/xen/xen-balloon.c | 2 +-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xenbus/xenbus.h | 134 ++++++++
drivers/xen/xenbus/xenbus_client.c | 6 +-
drivers/xen/xenbus/xenbus_comms.c | 319 +++++++++++++++--
drivers/xen/xenbus/xenbus_comms.h | 51 ---
drivers/xen/xenbus/xenbus_dev_backend.c | 2 +-
drivers/xen/xenbus/xenbus_dev_frontend.c | 213 +++++++-----
drivers/xen/xenbus/xenbus_probe.c | 14 +-
drivers/xen/xenbus/xenbus_probe.h | 88 -----
drivers/xen/xenbus/xenbus_probe_backend.c | 11 +-
drivers/xen/xenbus/xenbus_probe_frontend.c | 17 +-
drivers/xen/xenbus/xenbus_xs.c | 535 +++++++++++++----------------
drivers/xen/xenfs/super.c | 2 +-
drivers/xen/xenfs/xenstored.c | 2 +-
include/xen/xenbus.h | 18 +-
20 files changed, 830 insertions(+), 611 deletions(-)
create mode 100644 drivers/xen/xenbus/xenbus.h
delete mode 100644 drivers/xen/xenbus/xenbus_comms.h
delete mode 100644 drivers/xen/xenbus/xenbus_probe.h
Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org
--
2.10.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* [PATCH 2/3] xen: modify xenstore watch event interface
From: Juergen Gross @ 2017-01-06 15:05 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, wei.liu2, netdev, paul.durrant, boris.ostrovsky,
roger.pau
In-Reply-To: <20170106150544.10836-1-jgross@suse.com>
Today a Xenstore watch event is delivered via a callback function
declared as:
	void (*callback)(struct xenbus_watch *,
			 const char **vec, unsigned int len);
As all watch events only ever come with two parameters (path and token),
changing the prototype to:
	void (*callback)(struct xenbus_watch *,
			 const char *path, const char *token);
is the natural thing to do.
Apply this change and adapt all users.
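As a rough illustration of why two pointers are enough, the sketch below splits a watch event body of the form "path\0token\0" in userspace. It is not the xenbus code: struct watch_event and split_watch_event() are hypothetical names for this example, and rejecting any body that does not hold exactly two NUL-terminated strings is an assumption modelled on the count_strings() check in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the two strings of a watch event. */
struct watch_event {
	const char *path;
	const char *token;
};

/*
 * Split body[0..len) into path and token without copying.
 * Returns 0 on success, -1 unless the body holds exactly two
 * NUL-terminated strings.
 */
static int split_watch_event(const char *body, size_t len,
			     struct watch_event *ev)
{
	const char *path_end, *token_end;
	size_t rest;

	if (len == 0)
		return -1;

	path_end = memchr(body, '\0', len);
	if (!path_end)
		return -1;

	rest = len - (size_t)(path_end + 1 - body);
	if (rest == 0)
		return -1;	/* no token after the path */

	token_end = memchr(path_end + 1, '\0', rest);
	if (!token_end || token_end + 1 != body + len)
		return -1;	/* unterminated token or trailing data */

	ev->path = body;
	ev->token = path_end + 1;
	return 0;
}
```

Both pointers reference the original buffer, so a single free of that buffer releases path and token together.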
Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
drivers/block/xen-blkback/xenbus.c | 6 +++---
drivers/net/xen-netback/xenbus.c | 8 ++++----
drivers/xen/cpu_hotplug.c | 5 ++---
drivers/xen/manage.c | 6 +++---
drivers/xen/xen-balloon.c | 2 +-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xenbus/xenbus.h | 6 +++---
drivers/xen/xenbus/xenbus_client.c | 4 ++--
drivers/xen/xenbus/xenbus_dev_frontend.c | 21 ++++++++-------------
drivers/xen/xenbus/xenbus_probe.c | 11 ++++-------
drivers/xen/xenbus/xenbus_probe_backend.c | 8 ++++----
drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++++++-------
drivers/xen/xenbus/xenbus_xs.c | 29 ++++++++++++++---------------
include/xen/xenbus.h | 6 +++---
14 files changed, 59 insertions(+), 69 deletions(-)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 415e79b..8fe61b5 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -38,8 +38,8 @@ struct backend_info {
static struct kmem_cache *xen_blkif_cachep;
static void connect(struct backend_info *);
static int connect_ring(struct backend_info *);
-static void backend_changed(struct xenbus_watch *, const char **,
- unsigned int);
+static void backend_changed(struct xenbus_watch *, const char *,
+ const char *);
static void xen_blkif_free(struct xen_blkif *blkif);
static void xen_vbd_free(struct xen_vbd *vbd);
@@ -661,7 +661,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
* ready, connect.
*/
static void backend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
int err;
unsigned major;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3124eae..d8a40fa 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -723,7 +723,7 @@ static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
}
static void xen_net_rate_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
struct xenvif *vif = container_of(watch, struct xenvif, credit_watch);
struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
@@ -780,7 +780,7 @@ static void xen_unregister_credit_watch(struct xenvif *vif)
}
static void xen_mcast_ctrl_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
struct xenvif *vif = container_of(watch, struct xenvif,
mcast_ctrl_watch);
@@ -855,8 +855,8 @@ static void unregister_hotplug_status_watch(struct backend_info *be)
}
static void hotplug_status_changed(struct xenbus_watch *watch,
- const char **vec,
- unsigned int vec_size)
+ const char *path,
+ const char *token)
{
struct backend_info *be = container_of(watch,
struct backend_info,
diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index 5676aef..7a4daa2 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -68,13 +68,12 @@ static void vcpu_hotplug(unsigned int cpu)
}
static void handle_vcpu_hotplug_event(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
unsigned int cpu;
char *cpustr;
- const char *node = vec[XS_WATCH_PATH];
- cpustr = strstr(node, "cpu/");
+ cpustr = strstr(path, "cpu/");
if (cpustr != NULL) {
sscanf(cpustr, "cpu/%u", &cpu);
vcpu_hotplug(cpu);
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 26e5e85..ca62c09 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -218,7 +218,7 @@ static struct shutdown_handler shutdown_handlers[] = {
};
static void shutdown_handler(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
char *str;
struct xenbus_transaction xbt;
@@ -266,8 +266,8 @@ static void shutdown_handler(struct xenbus_watch *watch,
}
#ifdef CONFIG_MAGIC_SYSRQ
-static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
- unsigned int len)
+static void sysrq_handler(struct xenbus_watch *watch, const char *path,
+ const char *token)
{
char sysrq_key = '\0';
struct xenbus_transaction xbt;
diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
index 79865b8..e7715cb 100644
--- a/drivers/xen/xen-balloon.c
+++ b/drivers/xen/xen-balloon.c
@@ -55,7 +55,7 @@ static int register_balloon(struct device *dev);
/* React to a change in the target key */
static void watch_target(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
unsigned long long new_target;
int err;
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 3f0aee0..3814b44 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -652,7 +652,7 @@ static int xen_pcibk_setup_backend(struct xen_pcibk_device *pdev)
}
static void xen_pcibk_be_watch(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
struct xen_pcibk_device *pdev =
container_of(watch, struct xen_pcibk_device, be_watch);
diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 6a80c1e..bd95c21 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -39,8 +39,8 @@ struct xen_bus_type {
int (*get_bus_id)(char bus_id[XEN_BUS_ID_SIZE], const char *nodename);
int (*probe)(struct xen_bus_type *bus, const char *type,
const char *dir);
- void (*otherend_changed)(struct xenbus_watch *watch, const char **vec,
- unsigned int len);
+ void (*otherend_changed)(struct xenbus_watch *watch, const char *path,
+ const char *token);
struct bus_type bus;
};
@@ -83,7 +83,7 @@ int xenbus_dev_resume(struct device *dev);
int xenbus_dev_cancel(struct device *dev);
void xenbus_otherend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len,
+ const char *path, const char *token,
int ignore_on_shutdown);
int xenbus_read_otherend_details(struct xenbus_device *xendev,
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 23edf53..9586c24 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -115,7 +115,7 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
int xenbus_watch_path(struct xenbus_device *dev, const char *path,
struct xenbus_watch *watch,
void (*callback)(struct xenbus_watch *,
- const char **, unsigned int))
+ const char *, const char *))
{
int err;
@@ -153,7 +153,7 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
int xenbus_watch_pathfmt(struct xenbus_device *dev,
struct xenbus_watch *watch,
void (*callback)(struct xenbus_watch *,
- const char **, unsigned int),
+ const char *, const char *),
const char *pathfmt, ...)
{
int err;
diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index e2bc9b3..e4b9847 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -258,26 +258,23 @@ static struct watch_adapter *alloc_watch_adapter(const char *path,
}
static void watch_fired(struct xenbus_watch *watch,
- const char **vec,
- unsigned int len)
+ const char *path,
+ const char *token)
{
struct watch_adapter *adap;
struct xsd_sockmsg hdr;
- const char *path, *token;
- int path_len, tok_len, body_len, data_len = 0;
+ const char *token_caller;
+ int path_len, tok_len, body_len;
int ret;
LIST_HEAD(staging_q);
adap = container_of(watch, struct watch_adapter, watch);
- path = vec[XS_WATCH_PATH];
- token = adap->token;
+ token_caller = adap->token;
path_len = strlen(path) + 1;
- tok_len = strlen(token) + 1;
- if (len > 2)
- data_len = vec[len] - vec[2] + 1;
- body_len = path_len + tok_len + data_len;
+ tok_len = strlen(token_caller) + 1;
+ body_len = path_len + tok_len;
hdr.type = XS_WATCH_EVENT;
hdr.len = body_len;
@@ -288,9 +285,7 @@ static void watch_fired(struct xenbus_watch *watch,
if (!ret)
ret = queue_reply(&staging_q, path, path_len);
if (!ret)
- ret = queue_reply(&staging_q, token, tok_len);
- if (!ret && len > 2)
- ret = queue_reply(&staging_q, vec[2], data_len);
+ ret = queue_reply(&staging_q, token_caller, tok_len);
if (!ret) {
/* success: pass reply list onto watcher */
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 6baffbb..74888ca 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -169,7 +169,7 @@ int xenbus_read_otherend_details(struct xenbus_device *xendev,
EXPORT_SYMBOL_GPL(xenbus_read_otherend_details);
void xenbus_otherend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len,
+ const char *path, const char *token,
int ignore_on_shutdown)
{
struct xenbus_device *dev =
@@ -180,18 +180,15 @@ void xenbus_otherend_changed(struct xenbus_watch *watch,
/* Protect us against watches firing on old details when the otherend
details change, say immediately after a resume. */
if (!dev->otherend ||
- strncmp(dev->otherend, vec[XS_WATCH_PATH],
- strlen(dev->otherend))) {
- dev_dbg(&dev->dev, "Ignoring watch at %s\n",
- vec[XS_WATCH_PATH]);
+ strncmp(dev->otherend, path, strlen(dev->otherend))) {
+ dev_dbg(&dev->dev, "Ignoring watch at %s\n", path);
return;
}
state = xenbus_read_driver_state(dev->otherend);
dev_dbg(&dev->dev, "state is %d, (%s), %s, %s\n",
- state, xenbus_strstate(state), dev->otherend_watch.node,
- vec[XS_WATCH_PATH]);
+ state, xenbus_strstate(state), dev->otherend_watch.node, path);
/*
* Ignore xenbus transitions during shutdown. This prevents us doing
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index f46b4dc..b0bed4f 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -181,9 +181,9 @@ static int xenbus_probe_backend(struct xen_bus_type *bus, const char *type,
}
static void frontend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
- xenbus_otherend_changed(watch, vec, len, 0);
+ xenbus_otherend_changed(watch, path, token, 0);
}
static struct xen_bus_type xenbus_backend = {
@@ -204,11 +204,11 @@ static struct xen_bus_type xenbus_backend = {
};
static void backend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
DPRINTK("");
- xenbus_dev_changed(vec[XS_WATCH_PATH], &xenbus_backend);
+ xenbus_dev_changed(path, &xenbus_backend);
}
static struct xenbus_watch be_watch = {
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index d7b77a6..19e45ce 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -86,9 +86,9 @@ static int xenbus_uevent_frontend(struct device *_dev,
static void backend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
- xenbus_otherend_changed(watch, vec, len, 1);
+ xenbus_otherend_changed(watch, path, token, 1);
}
static void xenbus_frontend_delayed_resume(struct work_struct *w)
@@ -153,11 +153,11 @@ static struct xen_bus_type xenbus_frontend = {
};
static void frontend_changed(struct xenbus_watch *watch,
- const char **vec, unsigned int len)
+ const char *path, const char *token)
{
DPRINTK("");
- xenbus_dev_changed(vec[XS_WATCH_PATH], &xenbus_frontend);
+ xenbus_dev_changed(path, &xenbus_frontend);
}
@@ -332,13 +332,13 @@ static DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);
static int backend_state;
static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
- const char **v, unsigned int l)
+ const char *path, const char *token)
{
- if (xenbus_scanf(XBT_NIL, v[XS_WATCH_PATH], "", "%i",
+ if (xenbus_scanf(XBT_NIL, path, "", "%i",
&backend_state) != 1)
backend_state = XenbusStateUnknown;
printk(KERN_DEBUG "XENBUS: backend %s %s\n",
- v[XS_WATCH_PATH], xenbus_strstate(backend_state));
+ path, xenbus_strstate(backend_state));
wake_up(&backend_state_wq);
}
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 4c49d87..ebc768f 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -64,8 +64,8 @@ struct xs_stored_msg {
/* Queued watch events. */
struct {
struct xenbus_watch *handle;
- char **vec;
- unsigned int vec_size;
+ const char *path;
+ const char *token;
} watch;
} u;
};
@@ -765,7 +765,7 @@ void unregister_xenbus_watch(struct xenbus_watch *watch)
if (msg->u.watch.handle != watch)
continue;
list_del(&msg->list);
- kfree(msg->u.watch.vec);
+ kfree(msg->u.watch.path);
kfree(msg);
}
spin_unlock(&watch_events_lock);
@@ -833,11 +833,10 @@ static int xenwatch_thread(void *unused)
if (ent != &watch_events) {
msg = list_entry(ent, struct xs_stored_msg, list);
- msg->u.watch.handle->callback(
- msg->u.watch.handle,
- (const char **)msg->u.watch.vec,
- msg->u.watch.vec_size);
- kfree(msg->u.watch.vec);
+ msg->u.watch.handle->callback(msg->u.watch.handle,
+ msg->u.watch.path,
+ msg->u.watch.token);
+ kfree(msg->u.watch.path);
kfree(msg);
}
@@ -903,24 +902,24 @@ static int process_msg(void)
body[msg->hdr.len] = '\0';
if (msg->hdr.type == XS_WATCH_EVENT) {
- msg->u.watch.vec = split(body, msg->hdr.len,
- &msg->u.watch.vec_size);
- if (IS_ERR(msg->u.watch.vec)) {
- err = PTR_ERR(msg->u.watch.vec);
+ if (count_strings(body, msg->hdr.len) != 2) {
+ err = -EINVAL;
kfree(msg);
+ kfree(body);
goto out;
}
+ msg->u.watch.path = (const char *)body;
+ msg->u.watch.token = (const char *)strchr(body, '\0') + 1;
spin_lock(&watches_lock);
- msg->u.watch.handle = find_watch(
- msg->u.watch.vec[XS_WATCH_TOKEN]);
+ msg->u.watch.handle = find_watch(msg->u.watch.token);
if (msg->u.watch.handle != NULL) {
spin_lock(&watch_events_lock);
list_add_tail(&msg->list, &watch_events);
wake_up(&watch_events_waitq);
spin_unlock(&watch_events_lock);
} else {
- kfree(msg->u.watch.vec);
+ kfree(body);
kfree(msg);
}
spin_unlock(&watches_lock);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 98f73a2..869c816 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -61,7 +61,7 @@ struct xenbus_watch
/* Callback (executed in a process context with no locks held). */
void (*callback)(struct xenbus_watch *,
- const char **vec, unsigned int len);
+ const char *path, const char *token);
};
@@ -193,11 +193,11 @@ void xenbus_probe(struct work_struct *);
int xenbus_watch_path(struct xenbus_device *dev, const char *path,
struct xenbus_watch *watch,
void (*callback)(struct xenbus_watch *,
- const char **, unsigned int));
+ const char *, const char *));
__printf(4, 5)
int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
void (*callback)(struct xenbus_watch *,
- const char **, unsigned int),
+ const char *, const char *),
const char *pathfmt, ...);
int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state new_state);
--
2.10.2
* Re: [PATCH iproute2 2/3] ip vrf: Improve cgroup2 error messages
From: David Ahern @ 2017-01-06 15:05 UTC (permalink / raw)
To: Sergei Shtylyov, netdev, stephen
In-Reply-To: <f9858cd0-277b-02cd-35dc-ffc24862f736@cogentembedded.com>
>> @@ -80,13 +80,21 @@ char *find_cgroup2_mount(void)
>>
>> if (mount("none", mnt, CGROUP2_FS_NAME, 0, NULL)) {
>> /* EBUSY means already mounted */
>> - if (errno != EBUSY) {
>> + if (errno == EBUSY)
>> + goto out;
>> +
>> + if (errno == ENODEV) {
>> fprintf(stderr,
>> "Failed to mount cgroup2. Are CGROUPS enabled in your kernel?\n");
>> - free(mnt);
>> - return NULL;
>> + } else {
>> + fprintf(stderr,
>> + "Failed to mount cgroup2: %s\n",
>> + strerror(errno));
>> }
>
> How about a *switch* instead?
I did consider it. Did not make the code simpler or easier to read.
* [PATCH net-next] net: ipv6: put autoconf routes into per-interface tables
From: Lorenzo Colitti @ 2017-01-06 15:30 UTC (permalink / raw)
To: netdev
Cc: zenczykowski, hannes, ek, hideaki.yoshifuji, davem, dsa, drosen,
Lorenzo Colitti
Currently, IPv6 router discovery always puts routes into
RT6_TABLE_MAIN. This makes it difficult to maintain and switch
between multiple simultaneous network connections (e.g., wifi
and wired).
To work around this, connection managers typically either move
autoconfiguration to userspace entirely (e.g., dhcpcd) or take
the routes they want and re-add them to the main table as static
routes with low metrics (e.g., NetworkManager). This puts the
burden on the connection manager to watch netlink or listen to
RAs to see if the routes have changed, delete the routes when
their lifetime expires, etc. This is complex and often not
implemented correctly.
This patch adds a per-interface sysctl to have the kernel put
autoconf routes into different tables. This allows each interface
to have its own routing table if desired. Choosing the default
interface, or using different interfaces at the same time (on a
per-socket or per-packet basis), can be done using policy routing
mechanisms that use SO_BINDTODEVICE / IPV6_PKTINFO, mark-based
routing, or UID-based routing to select specific routing tables.
The sysctl behaves as follows:
- = 0: default. Put routes into RT6_TABLE_MAIN if the interface
is not in a VRF, or into the VRF table if it is.
- > 0: manual. Put routes into the specified table.
- < 0: automatic. Add the absolute value of the sysctl to the
device's ifindex, and use that table.
The automatic mode is most useful in conjunction with
net.ipv6.conf.default.accept_ra_rt_table. A connection manager
or distribution can set this to, say, -1000 on boot, and
thereafter know that routes received on every interface will
always be in that interface's routing table, and that the mapping
between interfaces and routing tables is deterministic. It also
ensures that if an interface is created and immediately receives
an RA, the route will go into the correct routing table without
needing any intervention from userspace.
The automatic mode (with conf.default.accept_ra_rt_table = -1000)
has been used in Android since 5.0.
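The three sysctl modes described above can be sketched in a few lines of userspace C. This is only an illustration of the selection rule, not the patch's addrconf_rt_table() implementation: ra_rt_table() and EXAMPLE_MAIN_TABLE are names made up for this example, and the caller is assumed to pass in the default table (main, or the VRF table when the device is in a VRF).

```c
#include <assert.h>
#include <stdint.h>

/* Table 254 is the main routing table; the macro name is illustrative. */
#define EXAMPLE_MAIN_TABLE 254u

/*
 * Select the routing table for RA-created routes.
 *   sysctl == 0: use default_table (main or VRF, chosen by the caller)
 *   sysctl  > 0: use the given table number
 *   sysctl  < 0: use |sysctl| + ifindex
 */
static uint32_t ra_rt_table(int32_t sysctl, int ifindex,
			    uint32_t default_table)
{
	if (sysctl > 0)
		return (uint32_t)sysctl;
	if (sysctl < 0)
		/* Widen before negating so INT32_MIN does not overflow. */
		return (uint32_t)(-(int64_t)sysctl) + (uint32_t)ifindex;
	return default_table;
}
```

With the sysctl set to -1000, an RA received on interface index 4 maps to table 1004, and the mapping needs no per-interface setup from userspace.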
Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: passes existing Android kernel unit tests
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
Documentation/networking/ip-sysctl.txt | 13 +++++++++++
include/linux/ipv6.h | 1 +
include/net/addrconf.h | 2 ++
include/uapi/linux/ipv6.h | 1 +
net/ipv6/addrconf.c | 40 +++++++++++++++++++++++++++++++---
net/ipv6/route.c | 11 +++++-----
6 files changed, 59 insertions(+), 9 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 7dd65c9cf7..d1311d8f33 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1471,6 +1471,19 @@ accept_ra_rt_info_max_plen - INTEGER
Functional default: 0 if accept_ra_rtr_pref is enabled.
-1 if accept_ra_rtr_pref is disabled.
+accept_ra_rt_table - INTEGER
+ Which table to put routes created by Router Advertisements into.
+
+ = 0: Use the main table if the device is not in a VRF, and the
+ VRF table if it is.
+ > 0: Use the specified table.
+ < 0: Add the absolute value to the receiving interface index,
+ and use that table. For example, if set to -1000, an RA
+ received on interface index 4 will create routes in
+ table 1004.
+
+ Default: 0
+
accept_ra_rtr_pref - BOOLEAN
Accept Router Preference in RA.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 671d014e64..55d75074aa 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -69,6 +69,7 @@ struct ipv6_devconf {
__s32 seg6_require_hmac;
#endif
__u32 enhanced_dad;
+ __s32 accept_ra_rt_table;
struct ctl_table_header *sysctl_header;
};
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8f998afc13..e1bd2bc027 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -242,6 +242,8 @@ static inline bool ipv6_is_mld(struct sk_buff *skb, int nexthdr, int offset)
void addrconf_prefix_rcv(struct net_device *dev,
u8 *opt, int len, bool sllao);
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table);
+
/*
* anycast prototypes (anycast.c)
*/
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index eaf65dc82e..95c3553242 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -182,6 +182,7 @@ enum {
DEVCONF_SEG6_ENABLED,
DEVCONF_SEG6_REQUIRE_HMAC,
DEVCONF_ENHANCED_DAD,
+ DEVCONF_ACCEPT_RA_RT_TABLE,
DEVCONF_MAX
};
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c1e124bc8e..d4a6b877f8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -243,6 +243,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.seg6_require_hmac = 0,
#endif
.enhanced_dad = 1,
+ .accept_ra_rt_table = 0,
};
static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -294,6 +295,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.seg6_require_hmac = 0,
#endif
.enhanced_dad = 1,
+ .accept_ra_rt_table = 0,
};
/* Check if a valid qdisc is available */
@@ -2210,6 +2212,30 @@ static void ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpad
ipv6_regen_rndid(idev);
}
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table)
+{
+ struct inet6_dev *idev = in6_dev_get(dev);
+ u32 table;
+ int sysctl = idev->cnf.accept_ra_rt_table;
+
+ if (sysctl == 0)
+ table = l3mdev_fib_table(dev) ? : default_table;
+ else if (sysctl > 0)
+ table = (u32)sysctl;
+ else
+ table = (unsigned int)dev->ifindex + (-sysctl);
+
+ in6_dev_put(idev);
+ return table;
+}
+#else
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table)
+{
+ return RT6_TABLE_DFLT;
+}
+#endif
+
/*
* Add prefix route.
*/
@@ -2219,7 +2245,7 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
unsigned long expires, u32 flags)
{
struct fib6_config cfg = {
- .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX,
+ .fc_table = addrconf_rt_table(dev, RT6_TABLE_PREFIX),
.fc_metric = IP6_RT_PRIO_ADDRCONF,
.fc_ifindex = dev->ifindex,
.fc_expires = expires,
@@ -2252,9 +2278,9 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
struct fib6_node *fn;
struct rt6_info *rt = NULL;
struct fib6_table *table;
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
- table = fib6_get_table(dev_net(dev), tb_id);
+ table = fib6_get_table(dev_net(dev),
+ addrconf_rt_table(dev, RT6_TABLE_PREFIX));
if (!table)
return NULL;
@@ -4975,6 +5001,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_SEG6_REQUIRE_HMAC] = cnf->seg6_require_hmac;
#endif
array[DEVCONF_ENHANCED_DAD] = cnf->enhanced_dad;
+ array[DEVCONF_ACCEPT_RA_RT_TABLE] = cnf->accept_ra_rt_table;
}
static inline size_t inet6_ifla6_size(void)
@@ -6090,6 +6117,13 @@ static const struct ctl_table addrconf_sysctl[] = {
.proc_handler = proc_dointvec,
},
{
+ .procname = "accept_ra_rt_table",
+ .data = &ipv6_devconf.accept_ra_rt_table,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
/* sentinel */
}
};
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8417c41d8e..86469ec27f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2345,13 +2345,12 @@ static struct rt6_info *rt6_get_route_info(struct net *net,
const struct in6_addr *gwaddr,
struct net_device *dev)
{
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO;
int ifindex = dev->ifindex;
struct fib6_node *fn;
struct rt6_info *rt = NULL;
struct fib6_table *table;
- table = fib6_get_table(net, tb_id);
+ table = fib6_get_table(net, addrconf_rt_table(dev, RT6_TABLE_INFO));
if (!table)
return NULL;
@@ -2392,7 +2391,7 @@ static struct rt6_info *rt6_add_route_info(struct net *net,
.fc_nlinfo.nl_net = net,
};
- cfg.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO,
+ cfg.fc_table = addrconf_rt_table(dev, RT6_TABLE_INFO);
cfg.fc_dst = *prefix;
cfg.fc_gateway = *gwaddr;
@@ -2408,11 +2407,11 @@ static struct rt6_info *rt6_add_route_info(struct net *net,
struct rt6_info *rt6_get_dflt_router(const struct in6_addr *addr, struct net_device *dev)
{
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT;
struct rt6_info *rt;
struct fib6_table *table;
- table = fib6_get_table(dev_net(dev), tb_id);
+ table = fib6_get_table(dev_net(dev),
+ addrconf_rt_table(dev, RT6_TABLE_DFLT));
if (!table)
return NULL;
@@ -2434,7 +2433,7 @@ struct rt6_info *rt6_add_dflt_router(const struct in6_addr *gwaddr,
unsigned int pref)
{
struct fib6_config cfg = {
- .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT,
+ .fc_table = addrconf_rt_table(dev, RT6_TABLE_DFLT),
.fc_metric = IP6_RT_PRIO_USER,
.fc_ifindex = dev->ifindex,
.fc_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT |
--
2.11.0.390.gc69c2f50cf-goog
^ permalink raw reply related
* RE: [PATCH 2/3] xen: modify xenstore watch event interface
From: Paul Durrant @ 2017-01-06 15:38 UTC (permalink / raw)
To: Juergen Gross, linux-kernel@vger.kernel.org,
xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
Roger Pau Monne, Wei Liu, netdev@vger.kernel.org
In-Reply-To: <20170106150544.10836-3-jgross@suse.com>
> -----Original Message-----
> From: Juergen Gross [mailto:jgross@suse.com]
> Sent: 06 January 2017 15:06
> To: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org
> Cc: boris.ostrovsky@oracle.com; Juergen Gross <jgross@suse.com>;
> konrad.wilk@oracle.com; Roger Pau Monne <roger.pau@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>;
> netdev@vger.kernel.org
> Subject: [PATCH 2/3] xen: modify xenstore watch event interface
>
> Today a Xenstore watch event is delivered via a callback function
> declared as:
>
> void (*callback)(struct xenbus_watch *,
> const char **vec, unsigned int len);
>
> As all watch events only ever come with two parameters (path and token)
> changing the prototype to:
>
> void (*callback)(struct xenbus_watch *,
> const char *path, const char *token);
>
> is the natural thing to do.
>
> Apply this change and adapt all users.
>
> Cc: konrad.wilk@oracle.com
> Cc: roger.pau@citrix.com
> Cc: wei.liu2@citrix.com
> Cc: paul.durrant@citrix.com
> Cc: netdev@vger.kernel.org
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
xen-netback changes...
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
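For readers following along, the idea of treating the event body as
exactly two NUL-terminated strings (path, then token) can be sketched in
user space as below. This is an illustration under my own assumptions,
not the code from the patch: example_split_watch is a made-up helper and
it bounds-checks with memchr instead of relying on a terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a watch event body of the form "path\0token\0" into two
 * pointers into the same buffer. Returns 0 on success, -1 if the body
 * does not contain exactly two NUL-terminated strings.
 */
static int example_split_watch(const char *body, size_t len,
			       const char **path, const char **token)
{
	const char *end = body + len;
	const char *p = body;
	unsigned int n = 0;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			return -1;	/* trailing string not terminated */
		n++;
		p = nul + 1;
	}
	if (n != 2)
		return -1;

	*path = body;
	*token = body + strlen(body) + 1;	/* just past the first NUL */
	return 0;
}
```

Both pointers alias the one body allocation, which is why only a single
free of the path pointer is needed once the callback has run.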
> ---
> drivers/block/xen-blkback/xenbus.c | 6 +++---
> drivers/net/xen-netback/xenbus.c | 8 ++++----
> drivers/xen/cpu_hotplug.c | 5 ++---
> drivers/xen/manage.c | 6 +++---
> drivers/xen/xen-balloon.c | 2 +-
> drivers/xen/xen-pciback/xenbus.c | 2 +-
> drivers/xen/xenbus/xenbus.h | 6 +++---
> drivers/xen/xenbus/xenbus_client.c | 4 ++--
> drivers/xen/xenbus/xenbus_dev_frontend.c | 21 ++++++++-------------
> drivers/xen/xenbus/xenbus_probe.c | 11 ++++-------
> drivers/xen/xenbus/xenbus_probe_backend.c | 8 ++++----
> drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++++++-------
> drivers/xen/xenbus/xenbus_xs.c | 29 ++++++++++++++---------------
> include/xen/xenbus.h | 6 +++---
> 14 files changed, 59 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-
> blkback/xenbus.c
> index 415e79b..8fe61b5 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -38,8 +38,8 @@ struct backend_info {
> static struct kmem_cache *xen_blkif_cachep;
> static void connect(struct backend_info *);
> static int connect_ring(struct backend_info *);
> -static void backend_changed(struct xenbus_watch *, const char **,
> - unsigned int);
> +static void backend_changed(struct xenbus_watch *, const char *,
> + const char *);
> static void xen_blkif_free(struct xen_blkif *blkif);
> static void xen_vbd_free(struct xen_vbd *vbd);
>
> @@ -661,7 +661,7 @@ static int xen_blkbk_probe(struct xenbus_device
> *dev,
> * ready, connect.
> */
> static void backend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> int err;
> unsigned major;
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-
> netback/xenbus.c
> index 3124eae..d8a40fa 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -723,7 +723,7 @@ static int xen_net_read_mac(struct xenbus_device
> *dev, u8 mac[])
> }
>
> static void xen_net_rate_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> struct xenvif *vif = container_of(watch, struct xenvif, credit_watch);
> struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
> @@ -780,7 +780,7 @@ static void xen_unregister_credit_watch(struct xenvif
> *vif)
> }
>
> static void xen_mcast_ctrl_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> struct xenvif *vif = container_of(watch, struct xenvif,
> mcast_ctrl_watch);
> @@ -855,8 +855,8 @@ static void unregister_hotplug_status_watch(struct
> backend_info *be)
> }
>
> static void hotplug_status_changed(struct xenbus_watch *watch,
> - const char **vec,
> - unsigned int vec_size)
> + const char *path,
> + const char *token)
> {
> struct backend_info *be = container_of(watch,
> struct backend_info,
> diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> index 5676aef..7a4daa2 100644
> --- a/drivers/xen/cpu_hotplug.c
> +++ b/drivers/xen/cpu_hotplug.c
> @@ -68,13 +68,12 @@ static void vcpu_hotplug(unsigned int cpu)
> }
>
> static void handle_vcpu_hotplug_event(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> unsigned int cpu;
> char *cpustr;
> - const char *node = vec[XS_WATCH_PATH];
>
> - cpustr = strstr(node, "cpu/");
> + cpustr = strstr(path, "cpu/");
> if (cpustr != NULL) {
> sscanf(cpustr, "cpu/%u", &cpu);
> vcpu_hotplug(cpu);
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 26e5e85..ca62c09 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -218,7 +218,7 @@ static struct shutdown_handler shutdown_handlers[]
> = {
> };
>
> static void shutdown_handler(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> char *str;
> struct xenbus_transaction xbt;
> @@ -266,8 +266,8 @@ static void shutdown_handler(struct xenbus_watch
> *watch,
> }
>
> #ifdef CONFIG_MAGIC_SYSRQ
> -static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
> - unsigned int len)
> +static void sysrq_handler(struct xenbus_watch *watch, const char *path,
> + const char *token)
> {
> char sysrq_key = '\0';
> struct xenbus_transaction xbt;
> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
> index 79865b8..e7715cb 100644
> --- a/drivers/xen/xen-balloon.c
> +++ b/drivers/xen/xen-balloon.c
> @@ -55,7 +55,7 @@ static int register_balloon(struct device *dev);
>
> /* React to a change in the target key */
> static void watch_target(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> unsigned long long new_target;
> int err;
> diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-
> pciback/xenbus.c
> index 3f0aee0..3814b44 100644
> --- a/drivers/xen/xen-pciback/xenbus.c
> +++ b/drivers/xen/xen-pciback/xenbus.c
> @@ -652,7 +652,7 @@ static int xen_pcibk_setup_backend(struct
> xen_pcibk_device *pdev)
> }
>
> static void xen_pcibk_be_watch(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> struct xen_pcibk_device *pdev =
> container_of(watch, struct xen_pcibk_device, be_watch);
> diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
> index 6a80c1e..bd95c21 100644
> --- a/drivers/xen/xenbus/xenbus.h
> +++ b/drivers/xen/xenbus/xenbus.h
> @@ -39,8 +39,8 @@ struct xen_bus_type {
> int (*get_bus_id)(char bus_id[XEN_BUS_ID_SIZE], const char
> *nodename);
> int (*probe)(struct xen_bus_type *bus, const char *type,
> const char *dir);
> - void (*otherend_changed)(struct xenbus_watch *watch, const char
> **vec,
> - unsigned int len);
> + void (*otherend_changed)(struct xenbus_watch *watch, const char
> *path,
> + const char *token);
> struct bus_type bus;
> };
>
> @@ -83,7 +83,7 @@ int xenbus_dev_resume(struct device *dev);
> int xenbus_dev_cancel(struct device *dev);
>
> void xenbus_otherend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len,
> + const char *path, const char *token,
> int ignore_on_shutdown);
>
> int xenbus_read_otherend_details(struct xenbus_device *xendev,
> diff --git a/drivers/xen/xenbus/xenbus_client.c
> b/drivers/xen/xenbus/xenbus_client.c
> index 23edf53..9586c24 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -115,7 +115,7 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
> int xenbus_watch_path(struct xenbus_device *dev, const char *path,
> struct xenbus_watch *watch,
> void (*callback)(struct xenbus_watch *,
> - const char **, unsigned int))
> + const char *, const char *))
> {
> int err;
>
> @@ -153,7 +153,7 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
> int xenbus_watch_pathfmt(struct xenbus_device *dev,
> struct xenbus_watch *watch,
> void (*callback)(struct xenbus_watch *,
> - const char **, unsigned int),
> + const char *, const char *),
> const char *pathfmt, ...)
> {
> int err;
> diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c
> b/drivers/xen/xenbus/xenbus_dev_frontend.c
> index e2bc9b3..e4b9847 100644
> --- a/drivers/xen/xenbus/xenbus_dev_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
> @@ -258,26 +258,23 @@ static struct watch_adapter
> *alloc_watch_adapter(const char *path,
> }
>
> static void watch_fired(struct xenbus_watch *watch,
> - const char **vec,
> - unsigned int len)
> + const char *path,
> + const char *token)
> {
> struct watch_adapter *adap;
> struct xsd_sockmsg hdr;
> - const char *path, *token;
> - int path_len, tok_len, body_len, data_len = 0;
> + const char *token_caller;
> + int path_len, tok_len, body_len;
> int ret;
> LIST_HEAD(staging_q);
>
> adap = container_of(watch, struct watch_adapter, watch);
>
> - path = vec[XS_WATCH_PATH];
> - token = adap->token;
> + token_caller = adap->token;
>
> path_len = strlen(path) + 1;
> - tok_len = strlen(token) + 1;
> - if (len > 2)
> - data_len = vec[len] - vec[2] + 1;
> - body_len = path_len + tok_len + data_len;
> + tok_len = strlen(token_caller) + 1;
> + body_len = path_len + tok_len;
>
> hdr.type = XS_WATCH_EVENT;
> hdr.len = body_len;
> @@ -288,9 +285,7 @@ static void watch_fired(struct xenbus_watch *watch,
> if (!ret)
> ret = queue_reply(&staging_q, path, path_len);
> if (!ret)
> - ret = queue_reply(&staging_q, token, tok_len);
> - if (!ret && len > 2)
> - ret = queue_reply(&staging_q, vec[2], data_len);
> + ret = queue_reply(&staging_q, token_caller, tok_len);
>
> if (!ret) {
> /* success: pass reply list onto watcher */
> diff --git a/drivers/xen/xenbus/xenbus_probe.c
> b/drivers/xen/xenbus/xenbus_probe.c
> index 6baffbb..74888ca 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -169,7 +169,7 @@ int xenbus_read_otherend_details(struct
> xenbus_device *xendev,
> EXPORT_SYMBOL_GPL(xenbus_read_otherend_details);
>
> void xenbus_otherend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len,
> + const char *path, const char *token,
> int ignore_on_shutdown)
> {
> struct xenbus_device *dev =
> @@ -180,18 +180,15 @@ void xenbus_otherend_changed(struct
> xenbus_watch *watch,
> /* Protect us against watches firing on old details when the otherend
> details change, say immediately after a resume. */
> if (!dev->otherend ||
> - strncmp(dev->otherend, vec[XS_WATCH_PATH],
> - strlen(dev->otherend))) {
> - dev_dbg(&dev->dev, "Ignoring watch at %s\n",
> - vec[XS_WATCH_PATH]);
> + strncmp(dev->otherend, path, strlen(dev->otherend))) {
> + dev_dbg(&dev->dev, "Ignoring watch at %s\n", path);
> return;
> }
>
> state = xenbus_read_driver_state(dev->otherend);
>
> dev_dbg(&dev->dev, "state is %d, (%s), %s, %s\n",
> - state, xenbus_strstate(state), dev->otherend_watch.node,
> - vec[XS_WATCH_PATH]);
> + state, xenbus_strstate(state), dev->otherend_watch.node,
> path);
>
> /*
> * Ignore xenbus transitions during shutdown. This prevents us doing
> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c
> b/drivers/xen/xenbus/xenbus_probe_backend.c
> index f46b4dc..b0bed4f 100644
> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> @@ -181,9 +181,9 @@ static int xenbus_probe_backend(struct
> xen_bus_type *bus, const char *type,
> }
>
> static void frontend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> - xenbus_otherend_changed(watch, vec, len, 0);
> + xenbus_otherend_changed(watch, path, token, 0);
> }
>
> static struct xen_bus_type xenbus_backend = {
> @@ -204,11 +204,11 @@ static struct xen_bus_type xenbus_backend = {
> };
>
> static void backend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> DPRINTK("");
>
> - xenbus_dev_changed(vec[XS_WATCH_PATH], &xenbus_backend);
> + xenbus_dev_changed(path, &xenbus_backend);
> }
>
> static struct xenbus_watch be_watch = {
> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c
> b/drivers/xen/xenbus/xenbus_probe_frontend.c
> index d7b77a6..19e45ce 100644
> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> @@ -86,9 +86,9 @@ static int xenbus_uevent_frontend(struct device *_dev,
>
>
> static void backend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> - xenbus_otherend_changed(watch, vec, len, 1);
> + xenbus_otherend_changed(watch, path, token, 1);
> }
>
> static void xenbus_frontend_delayed_resume(struct work_struct *w)
> @@ -153,11 +153,11 @@ static struct xen_bus_type xenbus_frontend = {
> };
>
> static void frontend_changed(struct xenbus_watch *watch,
> - const char **vec, unsigned int len)
> + const char *path, const char *token)
> {
> DPRINTK("");
>
> - xenbus_dev_changed(vec[XS_WATCH_PATH], &xenbus_frontend);
> + xenbus_dev_changed(path, &xenbus_frontend);
> }
>
>
> @@ -332,13 +332,13 @@ static
> DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);
> static int backend_state;
>
> static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
> - const char **v, unsigned int l)
> + const char *path, const char *token)
> {
> - if (xenbus_scanf(XBT_NIL, v[XS_WATCH_PATH], "", "%i",
> + if (xenbus_scanf(XBT_NIL, path, "", "%i",
> &backend_state) != 1)
> backend_state = XenbusStateUnknown;
> printk(KERN_DEBUG "XENBUS: backend %s %s\n",
> - v[XS_WATCH_PATH],
> xenbus_strstate(backend_state));
> + path, xenbus_strstate(backend_state));
> wake_up(&backend_state_wq);
> }
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c
> b/drivers/xen/xenbus/xenbus_xs.c
> index 4c49d87..ebc768f 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -64,8 +64,8 @@ struct xs_stored_msg {
> /* Queued watch events. */
> struct {
> struct xenbus_watch *handle;
> - char **vec;
> - unsigned int vec_size;
> + const char *path;
> + const char *token;
> } watch;
> } u;
> };
> @@ -765,7 +765,7 @@ void unregister_xenbus_watch(struct xenbus_watch
> *watch)
> if (msg->u.watch.handle != watch)
> continue;
> list_del(&msg->list);
> - kfree(msg->u.watch.vec);
> + kfree(msg->u.watch.path);
> kfree(msg);
> }
> spin_unlock(&watch_events_lock);
> @@ -833,11 +833,10 @@ static int xenwatch_thread(void *unused)
>
> if (ent != &watch_events) {
> msg = list_entry(ent, struct xs_stored_msg, list);
> - msg->u.watch.handle->callback(
> - msg->u.watch.handle,
> - (const char **)msg->u.watch.vec,
> - msg->u.watch.vec_size);
> - kfree(msg->u.watch.vec);
> + msg->u.watch.handle->callback(msg-
> >u.watch.handle,
> + msg->u.watch.path,
> + msg->u.watch.token);
> + kfree(msg->u.watch.path);
> kfree(msg);
> }
>
> @@ -903,24 +902,24 @@ static int process_msg(void)
> body[msg->hdr.len] = '\0';
>
> if (msg->hdr.type == XS_WATCH_EVENT) {
> - msg->u.watch.vec = split(body, msg->hdr.len,
> - &msg->u.watch.vec_size);
> - if (IS_ERR(msg->u.watch.vec)) {
> - err = PTR_ERR(msg->u.watch.vec);
> + if (count_strings(body, msg->hdr.len) != 2) {
> + err = -EINVAL;
> kfree(msg);
> + kfree(body);
> goto out;
> }
> + msg->u.watch.path = (const char *)body;
> + msg->u.watch.token = (const char *)strchr(body, '\0') + 1;
>
> spin_lock(&watches_lock);
> - msg->u.watch.handle = find_watch(
> - msg->u.watch.vec[XS_WATCH_TOKEN]);
> + msg->u.watch.handle = find_watch(msg->u.watch.token);
> if (msg->u.watch.handle != NULL) {
> spin_lock(&watch_events_lock);
> list_add_tail(&msg->list, &watch_events);
> wake_up(&watch_events_waitq);
> spin_unlock(&watch_events_lock);
> } else {
> - kfree(msg->u.watch.vec);
> + kfree(body);
> kfree(msg);
> }
> spin_unlock(&watches_lock);
> diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> index 98f73a2..869c816 100644
> --- a/include/xen/xenbus.h
> +++ b/include/xen/xenbus.h
> @@ -61,7 +61,7 @@ struct xenbus_watch
>
> /* Callback (executed in a process context with no locks held). */
> void (*callback)(struct xenbus_watch *,
> - const char **vec, unsigned int len);
> + const char *path, const char *token);
> };
>
>
> @@ -193,11 +193,11 @@ void xenbus_probe(struct work_struct *);
> int xenbus_watch_path(struct xenbus_device *dev, const char *path,
> struct xenbus_watch *watch,
> void (*callback)(struct xenbus_watch *,
> - const char **, unsigned int));
> + const char *, const char *));
> __printf(4, 5)
> int xenbus_watch_pathfmt(struct xenbus_device *dev, struct
> xenbus_watch *watch,
> void (*callback)(struct xenbus_watch *,
> - const char **, unsigned int),
> + const char *, const char *),
> const char *pathfmt, ...);
>
> int xenbus_switch_state(struct xenbus_device *dev, enum xenbus_state
> new_state);
> --
> 2.10.2
^ permalink raw reply
* Re: [PATCH net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.
From: Alexander Duyck @ 2017-01-06 15:41 UTC (permalink / raw)
To: Mao Wenan; +Cc: Netdev, Jeff Kirsher
In-Reply-To: <1483696364-8680-1-git-send-email-maowenan@huawei.com>
On Fri, Jan 6, 2017 at 1:52 AM, Mao Wenan <maowenan@huawei.com> wrote:
> Relaxed ordering (RO) is a feature of the 82599 NIC. Enabling it can
> improve performance on some CPU architectures, such as SPARC.
> Currently the 82599 driver enables RO only for one CPU architecture
> (SPARC), which does not help other CPU architectures that also need
> the RO feature.
> This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
> enable RO, and selects it in the sparc Kconfig first.
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
> arch/sparc/Kconfig | 1 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index cf4034c..68ac5c7 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -44,6 +44,7 @@ config SPARC
> select CPU_NO_EFFICIENT_FFS
> select HAVE_ARCH_HARDENED_USERCOPY
> select PROVE_LOCKING_SMALL if PROVE_LOCKING
> + select ARCH_WANT_RELAX_ORDER
>
> config SPARC32
> def_bool !64BIT
I'm pretty sure this is incomplete. I think you need to add a couple
lines to arch/Kconfig so that the config option itself is listed
somewhere. You might look at using something like HAVE_CMPXCHG_DOUBLE
as an example.
- Alex
^ permalink raw reply
* RE: [PATCHv3 net-next] sctp: prepare asoc stream for stream reconf
From: David Laight @ 2017-01-06 15:50 UTC (permalink / raw)
To: 'Xin Long', network dev, linux-sctp@vger.kernel.org
Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich,
davem@davemloft.net
In-Reply-To: <efd6462731ca0b18a3039f9537dda61e0ed72430.1483712313.git.lucien.xin@gmail.com>
From: Xin Long
> Sent: 06 January 2017 14:19
> sctp stream reconf, described in RFC 6525, needs a structure to
> save per stream information in assoc, like stream state.
>
> In the future, sctp stream scheduler also needs it to save some
> stream scheduler params and queues.
>
> This patchset is to prepare the stream array in assoc for stream
> reconf. It defines sctp_stream that includes stream arrays inside
> to replace ssnmap.
>
> Note that we use different structures for IN and OUT streams, as
> the members in per OUT stream will get more and more different
> from per IN stream.
...
> /* What is the current SSN number for this stream? */
> -static inline __u16 sctp_ssn_peek(struct sctp_stream *stream, __u16 id)
> -{
> - return stream->ssn[id];
> -}
> +#define sctp_ssn_peek(stream, type, sid) \
> + ((stream)->type[sid].ssn)
>
> /* Return the next SSN number for this stream. */
> -static inline __u16 sctp_ssn_next(struct sctp_stream *stream, __u16 id)
> -{
> - return stream->ssn[id]++;
> -}
> +#define sctp_ssn_next(stream, type, sid) \
> + ((stream)->type[sid].ssn++)
>
> /* Skip over this ssn and all below. */
> -static inline void sctp_ssn_skip(struct sctp_stream *stream, __u16 id,
> - __u16 ssn)
> -{
> - stream->ssn[id] = ssn+1;
> -}
> -
> +#define sctp_ssn_skip(stream, type, sid, ssn) \
> + ((stream)->type[sid].ssn = ssn + 1)
...
Is there any reason to convert these from inline functions to #defines?
Inline functions give better type checking and are usually preferred.
David
^ permalink raw reply
* RE: [PATCH v2] net: stmmac: fix maxmtu assignment to be within valid range
From: Kweh, Hock Leong @ 2017-01-06 15:55 UTC (permalink / raw)
To: David S. Miller, Joao Pinto, Giuseppe CAVALLARO,
seraphin.bonnaffe@st.com, Jarod Wilson, Andy Shevchenko
Cc: Alexandre TORGUE, Joachim Eastwood, Niklas Cassel, Johan Hovold,
pavel@ucw.cz, lars.persson@axis.com, netdev, LKML
In-Reply-To: <1483697306-10063-1-git-send-email-hock.leong.kweh@intel.com>
> -----Original Message-----
> From: Kweh, Hock Leong
> Sent: Friday, January 06, 2017 6:08 PM
> To: David S. Miller <davem@davemloft.net>; Joao Pinto
> <Joao.Pinto@synopsys.com>; Giuseppe CAVALLARO <peppe.cavallaro@st.com>;
> seraphin.bonnaffe@st.com; Jarod Wilson <jarod@redhat.com>; Andy
> Shevchenko <andy.shevchenko@gmail.com>
> Cc: Alexandre TORGUE <alexandre.torgue@gmail.com>; Joachim Eastwood
> <manabian@gmail.com>; Niklas Cassel <niklas.cassel@axis.com>; Johan Hovold
> <johan@kernel.org>; pavel@ucw.cz; Kweh, Hock Leong
> <hock.leong.kweh@intel.com>; lars.persson@axis.com; netdev
> <netdev@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>
> Subject: [PATCH v2] net: stmmac: fix maxmtu assignment to be within valid
> range
>
> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
>
> The maxmtu value obtained from the device tree is not validated.
> This patch adds a check to ensure the assignment is made only when the
> value is within the valid range.
>
> Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
I am going to submit V3.
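For reference, the range check in the quoted v2 below amounts to roughly
the following. This is a stand-alone sketch with made-up names
(example_pick_max_mtu and its parameters), not the driver code.

```c
/*
 * Model of the v2 check. plat_maxmtu == 0 is treated as "not set".
 * Returns the MTU ceiling to use and sets *warn when the platform
 * value was set but rejected.
 */
static int example_pick_max_mtu(int plat_maxmtu, int min_mtu,
				int hw_max_mtu, int *warn)
{
	*warn = 0;

	/* Accept only values in [min_mtu, hw_max_mtu). */
	if (plat_maxmtu < hw_max_mtu && plat_maxmtu >= min_mtu)
		return plat_maxmtu;

	/* Any other non-zero value, including hw_max_mtu itself, warns. */
	if (plat_maxmtu != 0)
		*warn = 1;

	return hw_max_mtu;
}
```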
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 92ac006..4df555e 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3345,8 +3345,14 @@ int stmmac_dvr_probe(struct device *device,
> ndev->max_mtu = JUMBO_LEN;
> else
> ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD +
> NET_IP_ALIGN);
> - if (priv->plat->maxmtu < ndev->max_mtu)
> +
> + if ((priv->plat->maxmtu < ndev->max_mtu) &&
> + (priv->plat->maxmtu >= ndev->min_mtu))
> ndev->max_mtu = priv->plat->maxmtu;
> + else if (priv->plat->maxmtu != 0)
> + netdev_warn(priv->dev,
> + "%s: warning: maxmtu having invalid value (%d)\n",
> + __func__, priv->plat->maxmtu);
>
> if (flow_ctrl)
> priv->flow_ctrl = FLOW_AUTO; /* RX/TX pause on */
> --
> 1.7.9.5
^ permalink raw reply
* Re: [PATCHv3 net-next] sctp: prepare asoc stream for stream reconf
From: Marcelo Ricardo Leitner @ 2017-01-06 15:56 UTC (permalink / raw)
To: David Laight
Cc: 'Xin Long', network dev, linux-sctp@vger.kernel.org,
Neil Horman, Vlad Yasevich, davem@davemloft.net
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB025926D@AcuExch.aculab.com>
On Fri, Jan 06, 2017 at 03:50:36PM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 06 January 2017 14:19
> > sctp stream reconf, described in RFC 6525, needs a structure to
> > save per stream information in assoc, like stream state.
> >
> > In the future, sctp stream scheduler also needs it to save some
> > stream scheduler params and queues.
> >
> > This patchset is to prepare the stream array in assoc for stream
> > reconf. It defines sctp_stream that includes stream arrays inside
> > to replace ssnmap.
> >
> > Note that we use different structures for IN and OUT streams, as
> > the members in per OUT stream will get more and more different
> > from per IN stream.
> ...
> > /* What is the current SSN number for this stream? */
> > -static inline __u16 sctp_ssn_peek(struct sctp_stream *stream, __u16 id)
> > -{
> > - return stream->ssn[id];
> > -}
> > +#define sctp_ssn_peek(stream, type, sid) \
> > + ((stream)->type[sid].ssn)
> >
> > /* Return the next SSN number for this stream. */
> > -static inline __u16 sctp_ssn_next(struct sctp_stream *stream, __u16 id)
> > -{
> > - return stream->ssn[id]++;
> > -}
> > +#define sctp_ssn_next(stream, type, sid) \
> > + ((stream)->type[sid].ssn++)
> >
> > /* Skip over this ssn and all below. */
> > -static inline void sctp_ssn_skip(struct sctp_stream *stream, __u16 id,
> > - __u16 ssn)
> > -{
> > - stream->ssn[id] = ssn+1;
> > -}
> > -
> > +#define sctp_ssn_skip(stream, type, sid, ssn) \
> > + ((stream)->type[sid].ssn = ssn + 1)
> ...
>
> Is there any reason to convert these from inline functions to #defines?
> Inline functions give better type checking and are usually preferred.
Yes, it's to avoid specializing these and also avoid a condition in
them. Now inbound and outbound streams are handled by different structs.
Please see the new struct sctp_stream definition.
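Roughly, and only as a sketch with made-up names (the example_* structs
below are not the real SCTP layout), the macro takes the member name as
an argument, so one definition covers both directions:

```c
#include <stdint.h>

/* Hypothetical, simplified per-stream structs for illustration only. */
struct example_stream_out { uint16_t ssn; uint8_t state; };
struct example_stream_in  { uint16_t ssn; };

struct example_stream {
	struct example_stream_out out[4];
	struct example_stream_in  in[4];
};

/* 'type' is substituted as a member name (in or out). */
#define example_ssn_next(stream, type, sid) \
	((stream)->type[sid].ssn++)

/* The inline alternative needs one function per direction. */
static inline uint16_t example_ssn_next_out(struct example_stream *s,
					    uint16_t sid)
{
	return s->out[sid].ssn++;
}
```

An inline version would need either a pair of functions like the one
above or a runtime flag to pick the array, which is the condition the
macro avoids.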
Marcelo
^ permalink raw reply
* [next PATCH 00/11] ixgbe: Add support for writable pages and build_skb
From: Alexander Duyck @ 2017-01-06 16:06 UTC (permalink / raw)
To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
This patch set makes use of the recent changes that allow pages to be
unmapped without invalidating their contents via DMA_ATTR_SKIP_CPU_SYNC.
With this change DMA pages should be writable, and as a result we should be
able to make use of build_skb. That lets us drop the skb->head memory
allocation, header parsing, and memcpy from the receive path, which can
greatly help to improve performance.
My main concern at this point is that there might be an architecture where
I didn't get DMA_ATTR_SKIP_CPU_SYNC implemented that might still need it.
For that reason I have also added an ethtool private flag called
"legacy-rx". If a platform encounters an issue where the Rx path can
corrupt data, the flag can be enabled by running:
ethtool --set-priv-flags DEVNAME legacy-rx on
The testing matrix for all of these patches is going to be pretty
extensive. Basically we want to test these patches on as many platforms
and architectures as possible with as many features being toggled as
possible including RSC, FCoE, SR-IOV, and Jumbo Frames all while receiving
traffic.
Within the patches there are also some initialization changes. Specifically
I have updated the code paths to defer clearing the rings until we are
about to initialize them and give them to hardware. By doing this we
avoid dirtying memory we don't need to, which should help to improve
suspend/resume times when we start looking at possibly using the
suspend/resume approach for migration of interfaces in VMs.
---
Alexander Duyck (11):
ixgbe: Add function for checking to see if we can reuse page
ixgbe: Only DMA sync frame length
ixgbe: Update driver to make use of DMA attributes in Rx path
ixgbe: Update code to better handle incrementing page count
ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE
ixgbe: Use length to determine if descriptor is done
ixgbe: Break out Rx buffer page management
ixgbe: Add support for padding packet
ixgbe: Add private flag to control buffer mode
ixgbe: Add support for build_skb
ixgbe: Don't bother clearing buffer memory for descriptor rings
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 45 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 58 ++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 584 ++++++++++++++--------
3 files changed, 465 insertions(+), 222 deletions(-)
^ permalink raw reply
* [next PATCH 02/11] ixgbe: Only DMA sync frame length
From: Alexander Duyck @ 2017-01-06 16:06 UTC (permalink / raw)
To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170106155448.1501.31298.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e80d885af4d3..dbbf5223ace2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1858,7 +1858,7 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
dma_sync_single_range_for_cpu(rx_ring->dev,
IXGBE_CB(skb)->dma,
frag->page_offset,
- ixgbe_rx_bufsz(rx_ring),
+ skb_frag_size(frag),
DMA_FROM_DEVICE);
}
IXGBE_CB(skb)->dma = 0;
@@ -1999,12 +1999,11 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
**/
static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- union ixgbe_adv_rx_desc *rx_desc,
+ unsigned int size,
struct sk_buff *skb)
{
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
- unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_bufsz(rx_ring);
#else
@@ -2036,6 +2035,7 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
union ixgbe_adv_rx_desc *rx_desc)
{
+ unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
struct page *page;
@@ -2090,14 +2090,14 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
dma_sync_single_range_for_cpu(rx_ring->dev,
rx_buffer->dma,
rx_buffer->page_offset,
- ixgbe_rx_bufsz(rx_ring),
+ size,
DMA_FROM_DEVICE);
rx_buffer->skb = NULL;
}
/* pull page into skb */
- if (ixgbe_add_rx_frag(rx_ring, rx_buffer, rx_desc, skb)) {
+ if (ixgbe_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
} else if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
^ permalink raw reply related
* [next PATCH 01/11] ixgbe: Add function for checking to see if we can reuse page
From: Alexander Duyck @ 2017-01-06 16:06 UTC (permalink / raw)
To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170106155448.1501.31298.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch consolidates the code for the ixgbe driver so that it is more
in line with what is already in igb. The general idea is to just
consolidate functions that represent logical steps in the Rx process so we
can later update them more easily.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 70 +++++++++++++++----------
1 file changed, 41 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3beadc8c7a0a..e80d885af4d3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1947,6 +1947,41 @@ static inline bool ixgbe_page_is_reserved(struct page *page)
return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
+static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
+ struct page *page,
+ const unsigned int truesize)
+{
+#if (PAGE_SIZE >= 8192)
+ unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
+ ixgbe_rx_bufsz(rx_ring);
+#endif
+ /* avoid re-using remote pages */
+ if (unlikely(ixgbe_page_is_reserved(page)))
+ return false;
+
+#if (PAGE_SIZE < 8192)
+ /* if we are only owner of page we can reuse it */
+ if (unlikely(page_count(page) != 1))
+ return false;
+
+ /* flip page offset to other buffer */
+ rx_buffer->page_offset ^= truesize;
+#else
+ /* move offset up to the next cache line */
+ rx_buffer->page_offset += truesize;
+
+ if (rx_buffer->page_offset > last_offset)
+ return false;
+#endif
+
+ /* Even if we own the page, we are not allowed to use atomic_set()
+ * This would break get_page_unless_zero() users.
+ */
+ page_ref_inc(page);
+
+ return true;
+}
+
/**
* ixgbe_add_rx_frag - Add contents of Rx buffer to sk_buff
* @rx_ring: rx descriptor ring to transact packets on
@@ -1968,18 +2003,18 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct sk_buff *skb)
{
struct page *page = rx_buffer->page;
+ unsigned char *va = page_address(page) + rx_buffer->page_offset;
unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_bufsz(rx_ring);
#else
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
- unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
- ixgbe_rx_bufsz(rx_ring);
#endif
- if ((size <= IXGBE_RX_HDR_SIZE) && !skb_is_nonlinear(skb)) {
- unsigned char *va = page_address(page) + rx_buffer->page_offset;
+ if (unlikely(skb_is_nonlinear(skb)))
+ goto add_tail_frag;
+ if (size <= IXGBE_RX_HDR_SIZE) {
memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
/* page is not reserved, we can reuse buffer as-is */
@@ -1991,34 +2026,11 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
return false;
}
+add_tail_frag:
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
rx_buffer->page_offset, size, truesize);
- /* avoid re-using remote pages */
- if (unlikely(ixgbe_page_is_reserved(page)))
- return false;
-
-#if (PAGE_SIZE < 8192)
- /* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
- return false;
-
- /* flip page offset to other buffer */
- rx_buffer->page_offset ^= truesize;
-#else
- /* move offset up to the next cache line */
- rx_buffer->page_offset += truesize;
-
- if (rx_buffer->page_offset > last_offset)
- return false;
-#endif
-
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
- */
- page_ref_inc(page);
-
- return true;
+ return ixgbe_can_reuse_rx_page(rx_buffer, page, truesize);
}
static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
^ permalink raw reply related
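The half-page reuse check that patch 01 moves into ixgbe_can_reuse_rx_page() can be modelled in user space. The sketch below covers only the PAGE_SIZE < 8192 branch and leaves out the reserved-page test; the `demo_*` names and the plain `page_refs` counter are hypothetical stand-ins for `struct ixgbe_rx_buffer` and `page_count()`, not kernel code.

```c
#include <stdbool.h>

#define DEMO_PAGE_SIZE	4096u
#define DEMO_RX_BUFSZ	(DEMO_PAGE_SIZE / 2)	/* two buffers per page */

/* Hypothetical model of the fields the reuse check touches. */
struct demo_rx_buffer {
	unsigned int page_offset;	/* 0 or DEMO_RX_BUFSZ */
	unsigned int page_refs;		/* stand-in for page_count(page) */
};

/* Returns true if the other half of the page can go back on the ring. */
static bool demo_can_reuse_rx_page(struct demo_rx_buffer *buf,
				   unsigned int truesize)
{
	/* if we are not the only owner, the other half is still in use */
	if (buf->page_refs != 1)
		return false;

	/* flip page offset to the other half: 0 <-> truesize */
	buf->page_offset ^= truesize;

	/* take a reference for the half that was just handed up the stack */
	buf->page_refs++;

	return true;
}
```

Starting from `page_offset == 0` and `page_refs == 1`, one call flips the offset to 2048 and raises the count to 2. A second call fails until the stack drops its reference, which is why the driver allocates a fresh page in that case.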
* [next PATCH 03/11] ixgbe: Update driver to make use of DMA attributes in Rx path
From: Alexander Duyck @ 2017-01-06 16:06 UTC (permalink / raw)
To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170106155448.1501.31298.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for DMA_ATTR_SKIP_CPU_SYNC and
DMA_ATTR_WEAK_ORDERING. By enabling both of these for the Rx path we are
able to see performance improvements on architectures that implement either
one, because page mapping and unmapping only has to sync what is actually
being used instead of the entire buffer. In addition, enabling the weak
ordering attribute allows a performance improvement on architectures that
can associate a memory ordering with a DMA buffer, such as Sparc.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 3 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 +++++++++++++++++--------
2 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 9c6ccfc34177..97e74deecae2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -107,6 +107,9 @@
/* How many Rx Buffers do we bundle into one write to the hardware ? */
#define IXGBE_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+#define IXGBE_RX_DMA_ATTR \
+ (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
enum ixgbe_tx_flags {
/* cmd_type flags */
IXGBE_TX_FLAGS_HW_VLAN = 0x01,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index dbbf5223ace2..062b984ffdf4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1583,8 +1583,10 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
}
/* map page for use */
- dma = dma_map_page(rx_ring->dev, page, 0,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ dma = dma_map_page_attrs(rx_ring->dev, page, 0,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
/*
* if mapping failed free memory back to system since
@@ -1627,6 +1629,12 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
if (!ixgbe_alloc_mapped_page(rx_ring, bi))
break;
+ /* sync the buffer for use by the device */
+ dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+ bi->page_offset,
+ ixgbe_rx_bufsz(rx_ring),
+ DMA_FROM_DEVICE);
+
/*
* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
@@ -1849,8 +1857,10 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
{
/* if the page was released unmap it, else just sync our portion */
if (unlikely(IXGBE_CB(skb)->page_released)) {
- dma_unmap_page(rx_ring->dev, IXGBE_CB(skb)->dma,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
IXGBE_CB(skb)->page_released = false;
} else {
struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
@@ -1934,12 +1944,6 @@ static void ixgbe_reuse_rx_page(struct ixgbe_ring *rx_ring,
/* transfer page from old buffer to new buffer */
*new_buff = *old_buff;
-
- /* sync the buffer for use by the device */
- dma_sync_single_range_for_device(rx_ring->dev, new_buff->dma,
- new_buff->page_offset,
- ixgbe_rx_bufsz(rx_ring),
- DMA_FROM_DEVICE);
}
static inline bool ixgbe_page_is_reserved(struct page *page)
@@ -2105,9 +2109,10 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
IXGBE_CB(skb)->page_released = true;
} else {
/* we are not reusing the buffer so unmap it */
- dma_unmap_page(rx_ring->dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
}
/* clear contents of buffer_info */
@@ -4941,10 +4946,11 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
if (rx_buffer->skb) {
struct sk_buff *skb = rx_buffer->skb;
if (IXGBE_CB(skb)->page_released)
- dma_unmap_page(dev,
- IXGBE_CB(skb)->dma,
- ixgbe_rx_bufsz(rx_ring),
- DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(dev,
+ IXGBE_CB(skb)->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
dev_kfree_skb(skb);
rx_buffer->skb = NULL;
}
@@ -4952,8 +4958,20 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
if (!rx_buffer->page)
continue;
- dma_unmap_page(dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ /* Invalidate cache lines that may have been written to by
+ * device so that we avoid corrupting memory.
+ */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ ixgbe_rx_bufsz(rx_ring),
+ DMA_FROM_DEVICE);
+
+ /* free resources associated with mapping */
+ dma_unmap_page_attrs(dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
__free_pages(rx_buffer->page, ixgbe_rx_pg_order(rx_ring));
rx_buffer->page = NULL;
^ permalink raw reply related
* [next PATCH 04/11] ixgbe: Update code to better handle incrementing page count
From: Alexander Duyck @ 2017-01-06 16:06 UTC (permalink / raw)
To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20170106155448.1501.31298.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Batch the page count updates instead of doing them one at a time. By doing
this we can improve the overall performance as the atomic increment
operations can be expensive due to the fact that on x86 they are locked
operations which can cause stalls. By doing bulk updates we can
consolidate the stall which should help to improve the overall receive
performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 7 ++++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 39 ++++++++++++++++---------
2 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 97e74deecae2..717c65b0deb2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -198,7 +198,12 @@ struct ixgbe_rx_buffer {
struct sk_buff *skb;
dma_addr_t dma;
struct page *page;
- unsigned int page_offset;
+#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
+ __u32 page_offset;
+#else
+ __u16 page_offset;
+#endif
+ __u16 pagecnt_bias;
};
struct ixgbe_queue_stats {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 062b984ffdf4..519b6a2b65c1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1602,6 +1602,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+ bi->pagecnt_bias = 1;
return true;
}
@@ -1959,13 +1960,15 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
ixgbe_rx_bufsz(rx_ring);
#endif
+ unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+
/* avoid re-using remote pages */
if (unlikely(ixgbe_page_is_reserved(page)))
return false;
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
+ if (unlikely(page_count(page) != pagecnt_bias))
return false;
/* flip page offset to other buffer */
@@ -1978,10 +1981,14 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
return false;
#endif
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
+ /* If we have drained the page fragment pool we need to update
+ * the pagecnt_bias and page count so that we fully restock the
+ * number of references the driver holds.
*/
- page_ref_inc(page);
+ if (unlikely(pagecnt_bias == 1)) {
+ page_ref_add(page, USHRT_MAX);
+ rx_buffer->pagecnt_bias = USHRT_MAX;
+ }
return true;
}
@@ -2025,7 +2032,6 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
return true;
/* this page cannot be reused so discard it */
- __free_pages(page, ixgbe_rx_pg_order(rx_ring));
return false;
}
@@ -2104,15 +2110,19 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
if (ixgbe_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
- } else if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
- /* the page has been released from the ring */
- IXGBE_CB(skb)->page_released = true;
} else {
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IXGBE_RX_DMA_ATTR);
+ if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
+ /* the page has been released from the ring */
+ IXGBE_CB(skb)->page_released = true;
+ } else {
+ /* we are not reusing the buffer so unmap it */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
+ }
+ __page_frag_cache_drain(page,
+ rx_buffer->pagecnt_bias);
}
/* clear contents of buffer_info */
@@ -4972,7 +4982,8 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
ixgbe_rx_pg_size(rx_ring),
DMA_FROM_DEVICE,
IXGBE_RX_DMA_ATTR);
- __free_pages(rx_buffer->page, ixgbe_rx_pg_order(rx_ring));
+ __page_frag_cache_drain(rx_buffer->page,
+ rx_buffer->pagecnt_bias);
rx_buffer->page = NULL;
}
^ permalink raw reply related
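The batching idea in patch 04 can be followed with a small user-space model. It is a sketch under stated assumptions: `demo_page`, `demo_rx_buffer` and the helpers are hypothetical, the reference count is a plain integer rather than an atomic, and `atomic_ops` exists only to count how often the expensive update would run. The reuse logic mirrors the PAGE_SIZE < 8192 path from the diff.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	unsigned int refcount;		/* models page_count(page) */
	unsigned long atomic_ops;	/* how many "locked" updates were made */
};

struct demo_rx_buffer {
	struct demo_page *page;
	unsigned short pagecnt_bias;	/* references the driver still holds */
};

/* Models page_ref_add(): one expensive atomic operation per call. */
static void demo_page_ref_add(struct demo_page *page, unsigned int nr)
{
	page->refcount += nr;
	page->atomic_ops++;
}

/* A fresh page starts with one reference, owned by the driver. */
static void demo_alloc(struct demo_rx_buffer *buf, struct demo_page *page)
{
	page->refcount = 1;
	page->atomic_ops = 0;
	buf->page = page;
	buf->pagecnt_bias = 1;
}

/* Hand one driver-held reference to the stack, then check for reuse. */
static bool demo_can_reuse(struct demo_rx_buffer *buf)
{
	unsigned int pagecnt_bias = buf->pagecnt_bias--;

	/* someone other than the driver still holds a reference */
	if (buf->page->refcount != pagecnt_bias)
		return false;

	/* pool drained: restock with one bulk update */
	if (pagecnt_bias == 1) {
		demo_page_ref_add(buf->page, USHRT_MAX);
		buf->pagecnt_bias = USHRT_MAX;
	}

	return true;
}

/* Models the stack freeing a frame and dropping its reference. */
static void demo_stack_put(struct demo_page *page)
{
	page->refcount--;
}
```

After demo_alloc(), the first demo_can_reuse() restocks the pool with a single update; later reuses only decrement the local `pagecnt_bias`, so `atomic_ops` stays at 1 until the pool drains again. If the stack has not yet released its reference, the comparison fails and the buffer is not reused, which in the driver is the point where __page_frag_cache_drain() gives back the remaining bias.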