* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Mark Brown @ 2014-01-22 12:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Alexandre Courbot, Heikki Krogerus, devicetree@vger.kernel.org,
netdev, Linus Walleij, linux-wireless, linux-kernel,
David S. Miller, linux-gpio@vger.kernel.org, linux-sunxi,
Maxime Ripard, Chen-Yu Tsai, Mika Westerberg, Johannes Berg,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201401211950.07011.arnd@arndb.de>
[-- Attachment #1.1: Type: text/plain, Size: 725 bytes --]
On Tue, Jan 21, 2014 at 07:50:06PM +0100, Arnd Bergmann wrote:
> I should have another look at the debugfs representation, but isn't
> there a global namespace that gets used for all gpios? Neither the
> con_id nor the name that the driver picks would be globally unique
> and stable across kernel versions, so they don't make a good user
> interface.
Currently the globally unique name is the GPIO number but that's not
stable since it gets dynamically assigned.
> I think what we want here is a system-wide unique identifier for
> each gpio line that may get represented in debugfs, and use a new
> DT mechanism to communicate that.
We've mostly been using things based off dev_name() for stability.
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: Neil Horman @ 2014-01-22 12:30 UTC (permalink / raw)
To: Matija Glavinic Pecotic; +Cc: linux-sctp, Alexander Sverdlin, netdev
In-Reply-To: <52D8D544.5050501@nsn.com>
On Fri, Jan 17, 2014 at 08:01:24AM +0100, Matija Glavinic Pecotic wrote:
> Implementation of (a)rwnd calculation might lead to severe performance issues
> and associations completely stalling. These problems are described and solution
> is proposed which improves lksctp's robustness in congestion state.
>
> 1) Sudden drop of a_rwnd and incomplete window recovery afterwards
>
> Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
> but size of sk_buff, which is blamed against receiver buffer, is not accounted
> in rwnd. Theoretically, this should not be the problem as actual size of buffer
> is double the amount requested on the socket (SO_RECVBUF). Problem here is
> that this will have bad scaling for data which is less then sizeof sk_buff.
> E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
> of traffic of this size (less then 100B).
>
> An example of sudden drop and incomplete window recovery is given below. Node B
> exhibits problematic behavior. Node A initiates association and B is configured
> to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
> message in 4G (LTE) network). On B data is left in buffer by not reading socket
> in userspace.
>
> Lets examine when we will hit pressure state and declare rwnd to be 0 for
> scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
> chunk is sent in separate sctp packet)
>
> Logic is implemented in sctp_assoc_rwnd_decrease:
>
> socket_buffer (see below) is maximum size which can be held in socket buffer
> (sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)
>
> A simple expression is given for which it will be examined after how many
> packets for above stated parameters we enter pressure state:
>
> We start by condition which has to be met in order to enter pressure state:
>
> socket_buffer < currently_alloced;
>
> currently_alloced is represented as size of sctp packets received so far and not
> yet delivered to userspace. x is the number of chunks/packets (since there is no
> bundling, and each chunk is delivered in separate packet, we can observe each
> chunk also as sctp packet, and what is important here, having its own sk_buff):
>
> socket_buffer < x*each_sctp_packet;
>
> each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
> twice the amount of initially requested size of socket buffer, which is in case
> of sctp, twice the a_rwnd requested:
>
> 2*rwnd < x*(payload+sizeof(struc sk_buff));
>
> sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
> and each payload size is 43
>
> 20000 < x(43+190);
>
> x > 20000/233;
>
> x ~> 84;
>
> After ~84 messages, pressure state is entered and 0 rwnd is advertised while
> received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
> drop from 6474 to 0, as it will be now shown in example:
>
> IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
> IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
> IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
> IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
> <...>
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]
>
> --> Sudden drop
>
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
>
> At this point, rwnd_press stores current rwnd value so it can be later restored
> in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
> slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
> condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
> adding amount which is read from userspace. Let us observe values in above
> example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
> amount of actual sctp data currently waiting to be delivered to userspace
> is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
> only for sctp data, which is ~3500. Condition is never met, and when userspace
> reads all data, rwnd stays on 3569.
>
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
>
> --> At this point userspace read everything, rwnd recovered only to 3569
>
> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
> IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
>
> Reproduction is straight forward, it is enough for sender to send packets of
> size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.
>
> 2) Minute size window for associations sharing the same socket buffer
>
> In case multiple associations share the same socket, and same socket buffer
> (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
> of the associations can permanently drop rwnd of other association(s).
>
> Situation will be typically observed as one association suddenly having rwnd
> dropped to size of last packet received and never recovering beyond that point.
> Different scenarios will lead to it, but all have in common that one of the
> associations (let it be association from 1)) nearly depleted socket buffer, and
> the other association blames socket buffer just for the amount enough to start
> the pressure. This association will enter pressure state, set rwnd_press and
> announce 0 rwnd.
> When data is read by userspace, similar situation as in 1) will occur, rwnd will
> increase just for the size read by userspace but rwnd_press will be high enough
> so that association doesn't have enough credit to reach rwnd_press and restore
> to previous state. This case is special case of 1), being worse as there is, in
> the worst case, only one packet in buffer for which size rwnd will be increased.
> Consequence is association which has very low maximum rwnd ('minute size', in
> our case down to 43B - size of packet which caused pressure) and as such
> unusable.
>
> Scenario happened in the field and labs frequently after congestion state (link
> breaks, different probabilities of packet drop, packet reordering) and with
> scenario 1) preceding. Here is given a deterministic scenario for reproduction:
>
> From node A establish two associations on the same socket, with rcvbuf_policy
> being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
> repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
> scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
> bring it down close to 0, as in just one more packet would close it. This has as
> a consequence that association number 2 is able to receive (at least) one more
> packet which will bring it in pressure state. E.g. if association 2 had rwnd of
> 10000, packet received was 43, and we enter at this point into pressure,
> rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
> increase for 43, but conditions to restore rwnd to original state, just as in
> 1), will never be satisfied.
>
> --> Association 1, between A.y and B.12345
>
> IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
> IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
> IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
> IP B.12345 > A.55915: sctp (1) [COOKIE ACK]
>
> --> Association 2, between A.z and B.12346
>
> IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
> IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
> IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
> IP B.12346 > A.55915: sctp (1) [COOKIE ACK]
>
> --> Deplete socket buffer by sending messages of size 43B over association 1
>
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
>
> <...>
>
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]
>
> --> Sudden drop on 1
>
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
>
> --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
> association 1 so there is place in buffer for only one more packet
>
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
>
> --> Socket buffer is almost depleted, but there is space for one more packet,
> send them over association 2, size 43B
>
> IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
> IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
>
> --> Immediate drop
>
> IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
>
> --> Read everything from the socket, both association recover up to maximum rwnd
> they are capable of reaching, note that association 1 recovered up to 3698,
> and association 2 recovered only to 43
>
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
> IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
> IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
> IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
>
> A careful reader might wonder why it is necessary to reproduce 1) prior
> reproduction of 2). It is simply easier to observe when to send packet over
> association 2 which will push association into the pressure state.
>
> Proposed solution:
>
> Both problems share the same root cause, and that is improper scaling of socket
> buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
> calculating rwnd is not possible due to fact that there is no linear
> relationship between amount of data blamed in increase/decrease with IP packet
> in which payload arrived. Even in case such solution would be followed,
> complexity of the code would increase. Due to nature of current rwnd handling,
> slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
> entered is rationale, but it gives false representation to the sender of current
> buffer space. Furthermore, it implements additional congestion control mechanism
> which is defined on implementation, and not on standard basis.
>
> Proposed solution simplifies whole algorithm having on mind definition from rfc:
>
> o Receiver Window (rwnd): This gives the sender an indication of the space
> available in the receiver's inbound buffer.
>
> Core of the proposed solution is given with these lines:
>
> sctp_assoc_rwnd_account:
> if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
> asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
> else
> asoc->rwnd = 0;
>
> We advertise to sender (half of) actual space we have. Half is in the braces
> depending whether you would like to observe size of socket buffer as SO_RECVBUF
> or twice the amount, i.e. size is the one visible from userspace, that is,
> from kernelspace.
> In this way sender is given with good approximation of our buffer space,
> regardless of the buffer policy - we always advertise what we have. Proposed
> solution fixes described problems and removes necessity for rwnd restoration
> algorithm. Finally, as proposed solution is simplification, some lines of code,
> along with some bytes in struct sctp_association are saved.
>
> Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
>
General comment - While this seems to make sense to me generally speaking,
doesn't it currently violate section 6 of the RFC?
A SCTP receiver MUST be able to receive a minimum of 1500 bytes in
one SCTP packet. This means that a SCTP endpoint MUST NOT indicate
less than 1500 bytes in its Initial a_rwnd sent in the INIT or INIT
ACK.
Since we set the initial rwnd value to the larger of sk->sk_rcvbuf/2 or
SCTP_MIN_RWND (1500 bytes), won't we potentially advertize half that amount? It
seems we need to double the minimum socket receive buffer here.
Neil
> ---
>
> --- net-next.orig/net/sctp/associola.c
> +++ net-next/net/sctp/associola.c
> @@ -1367,44 +1367,35 @@ static inline bool sctp_peer_needs_updat
> return false;
> }
>
> -/* Increase asoc's rwnd by len and send any window update SACK if needed. */
> -void sctp_assoc_rwnd_increase(struct sctp_association *asoc, unsigned int len)
> +/* Account asoc's rwnd for the approximated state in the buffer,
> + * and check whether SACK needs to be sent.
> + */
> +void sctp_assoc_rwnd_account(struct sctp_association *asoc, int check_sack)
> {
> + int rx_count;
> struct sctp_chunk *sack;
> struct timer_list *timer;
>
> - if (asoc->rwnd_over) {
> - if (asoc->rwnd_over >= len) {
> - asoc->rwnd_over -= len;
> - } else {
> - asoc->rwnd += (len - asoc->rwnd_over);
> - asoc->rwnd_over = 0;
> - }
> - } else {
> - asoc->rwnd += len;
> - }
> + if (asoc->ep->rcvbuf_policy)
> + rx_count = atomic_read(&asoc->rmem_alloc);
> + else
> + rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);
>
> - /* If we had window pressure, start recovering it
> - * once our rwnd had reached the accumulated pressure
> - * threshold. The idea is to recover slowly, but up
> - * to the initial advertised window.
> - */
> - if (asoc->rwnd_press && asoc->rwnd >= asoc->rwnd_press) {
> - int change = min(asoc->pathmtu, asoc->rwnd_press);
> - asoc->rwnd += change;
> - asoc->rwnd_press -= change;
> - }
> + if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
> + asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
> + else
> + asoc->rwnd = 0;
>
> - pr_debug("%s: asoc:%p rwnd increased by %d to (%u, %u) - %u\n",
> - __func__, asoc, len, asoc->rwnd, asoc->rwnd_over,
> - asoc->a_rwnd);
> + pr_debug("%s: asoc:%p rwnd=%u, rx_count=%d, sk_rcvbuf=%d\n",
> + __func__, asoc, asoc->rwnd, rx_count,
> + asoc->base.sk->sk_rcvbuf);
>
> /* Send a window update SACK if the rwnd has increased by at least the
> * minimum of the association's PMTU and half of the receive buffer.
> * The algorithm used is similar to the one described in
> * Section 4.2.3.3 of RFC 1122.
> */
> - if (sctp_peer_needs_update(asoc)) {
> + if (check_sack && sctp_peer_needs_update(asoc)) {
> asoc->a_rwnd = asoc->rwnd;
>
> pr_debug("%s: sending window update SACK- asoc:%p rwnd:%u "
> @@ -1426,45 +1417,6 @@ void sctp_assoc_rwnd_increase(struct sct
> }
> }
>
> -/* Decrease asoc's rwnd by len. */
> -void sctp_assoc_rwnd_decrease(struct sctp_association *asoc, unsigned int len)
> -{
> - int rx_count;
> - int over = 0;
> -
> - if (unlikely(!asoc->rwnd || asoc->rwnd_over))
> - pr_debug("%s: association:%p has asoc->rwnd:%u, "
> - "asoc->rwnd_over:%u!\n", __func__, asoc,
> - asoc->rwnd, asoc->rwnd_over);
> -
> - if (asoc->ep->rcvbuf_policy)
> - rx_count = atomic_read(&asoc->rmem_alloc);
> - else
> - rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);
> -
> - /* If we've reached or overflowed our receive buffer, announce
> - * a 0 rwnd if rwnd would still be positive. Store the
> - * the potential pressure overflow so that the window can be restored
> - * back to original value.
> - */
> - if (rx_count >= asoc->base.sk->sk_rcvbuf)
> - over = 1;
> -
> - if (asoc->rwnd >= len) {
> - asoc->rwnd -= len;
> - if (over) {
> - asoc->rwnd_press += asoc->rwnd;
> - asoc->rwnd = 0;
> - }
> - } else {
> - asoc->rwnd_over = len - asoc->rwnd;
> - asoc->rwnd = 0;
> - }
> -
> - pr_debug("%s: asoc:%p rwnd decreased by %d to (%u, %u, %u)\n",
> - __func__, asoc, len, asoc->rwnd, asoc->rwnd_over,
> - asoc->rwnd_press);
> -}
>
> /* Build the bind address list for the association based on info from the
> * local endpoint and the remote peer.
> --- net-next.orig/include/net/sctp/structs.h
> +++ net-next/include/net/sctp/structs.h
> @@ -1653,17 +1653,6 @@ struct sctp_association {
> /* This is the last advertised value of rwnd over a SACK chunk. */
> __u32 a_rwnd;
>
> - /* Number of bytes by which the rwnd has slopped. The rwnd is allowed
> - * to slop over a maximum of the association's frag_point.
> - */
> - __u32 rwnd_over;
> -
> - /* Keeps treack of rwnd pressure. This happens when we have
> - * a window, but not recevie buffer (i.e small packets). This one
> - * is releases slowly (1 PMTU at a time ).
> - */
> - __u32 rwnd_press;
> -
> /* This is the sndbuf size in use for the association.
> * This corresponds to the sndbuf size for the association,
> * as specified in the sk->sndbuf.
> @@ -1892,8 +1881,7 @@ void sctp_assoc_update(struct sctp_assoc
> __u32 sctp_association_get_next_tsn(struct sctp_association *);
>
> void sctp_assoc_sync_pmtu(struct sock *, struct sctp_association *);
> -void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int);
> -void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int);
> +void sctp_assoc_rwnd_account(struct sctp_association *, int);
> void sctp_assoc_set_primary(struct sctp_association *,
> struct sctp_transport *);
> void sctp_assoc_del_nonprimary_peers(struct sctp_association *,
> --- net-next.orig/net/sctp/sm_statefuns.c
> +++ net-next/net/sctp/sm_statefuns.c
> @@ -6176,7 +6176,7 @@ static int sctp_eat_data(const struct sc
> * PMTU. In cases, such as loopback, this might be a rather
> * large spill over.
> */
> - if ((!chunk->data_accepted) && (!asoc->rwnd || asoc->rwnd_over ||
> + if ((!chunk->data_accepted) && (!asoc->rwnd ||
> (datalen > asoc->rwnd + asoc->frag_point))) {
>
> /* If this is the next TSN, consider reneging to make
> --- net-next.orig/net/sctp/socket.c
> +++ net-next/net/sctp/socket.c
> @@ -2097,7 +2097,7 @@ static int sctp_recvmsg(struct kiocb *io
> * rwnd is updated when the event is freed.
> */
> if (!sctp_ulpevent_is_notification(event))
> - sctp_assoc_rwnd_increase(event->asoc, copied);
> + sctp_assoc_rwnd_account(event->asoc, 1);
> goto out;
> } else if ((event->msg_flags & MSG_NOTIFICATION) ||
> (event->msg_flags & MSG_EOR))
> --- net-next.orig/net/sctp/ulpevent.c
> +++ net-next/net/sctp/ulpevent.c
> @@ -989,7 +989,7 @@ static void sctp_ulpevent_receive_data(s
> skb = sctp_event2skb(event);
> /* Set the owner and charge rwnd for bytes received. */
> sctp_ulpevent_set_owner(event, asoc);
> - sctp_assoc_rwnd_decrease(asoc, skb_headlen(skb));
> + sctp_assoc_rwnd_account(asoc, 0);
>
> if (!skb->data_len)
> return;
> @@ -1035,7 +1035,7 @@ static void sctp_ulpevent_release_data(s
> }
>
> done:
> - sctp_assoc_rwnd_increase(event->asoc, len);
> + sctp_assoc_rwnd_account(event->asoc, 1);
> sctp_ulpevent_release_owner(event);
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* RE: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash key.
From: Venkata Duvvuru @ 2014-01-22 12:12 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1390224028.3651.72.camel@deadeye.wl.decadent.org.uk>
> -----Original Message-----
> From: Ben Hutchings [mailto:ben@decadent.org.uk]
> Sent: Monday, January 20, 2014 6:50 PM
> To: Venkata Duvvuru
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash
> key.
>
> On Mon, 2014-01-20 at 12:23 +0000, Venkata Duvvuru wrote:
> [...]
> > > > +/* RSS Hash key */
> > > > +struct ethtool_rss_hkey {
> > > > + __u32 cmd; /* ETHTOOL_SET/GET_RSS_HKEY */
> > > > + __u8 data[RSS_HASH_KEY_LEN];
> > > > + __u32 data_len;
> > > > +};
> > > [...]
> > >
> > > How about putting data after the data_len and giving it a length of
> > > 0, so this is extensible to an arbitrary length key?
> > >
> > > If we're extending the RSS configuration interface, there are a few
> > > other things that might be worth doing at the same time:
> > >
> > > - Single commands to get/set both the key and the indirection table
> > > at the same time
> > > - Add a field to distinguish multiple RSS contexts (some hardware
> > > can use RSS contexts together with filters, though RX NFC does not
> > > support that yet)
> > Are you referring to the filter-id that is created at the time of config-nfc?
> Pls clarify.
>
> No, what I mean is:
>
> 1. An RX flow steering filter can specify use of RSS, in which case the value
> looked up in the indirection is added to the queue number specified in the
> filter. This is not yet controllable through RX NFC though there is room for
> extension there.
>
> 2. Multi-function controllers need multiple RSS contexts (key + indirection
> table) to support independent use of RSS on each function.
> But it may also be possible to allocate multiple contexts to a single function.
> This could be useful in conjunction with 1. But there would need to be a way
> to allocate and configure extra contexts first.
The proposed changes will be incremental so I think this can be done in a separate patch. Thoughts?
>
> Ben.
>
> --
> Ben Hutchings
> One of the nice things about standards is that there are so many of them.
^ permalink raw reply
* RE: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash key.
From: Venkata Duvvuru @ 2014-01-22 12:06 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1390225398.3651.86.camel@deadeye.wl.decadent.org.uk>
> -----Original Message-----
> From: Ben Hutchings [mailto:ben@decadent.org.uk]
> Sent: Monday, January 20, 2014 7:13 PM
> To: Venkata Duvvuru
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash
> key.
>
> On Mon, 2014-01-20 at 13:28 +0000, Venkata Duvvuru wrote:
> > Ben, Please ignore my previous reply. My reply options were screwed up in
> that.
> >
> > > -----Original Message-----
> > > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > > Sent: Monday, January 20, 2014 12:06 AM
> > > To: Venkata Duvvuru
> > > Cc: netdev@vger.kernel.org
> > > Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable
> > > RSS hash key.
> > >
> > > On Fri, 2014-01-17 at 13:02 +0000, Venkata Duvvuru wrote:
> > > > This ethtool patch will primarily implement the parser for the
> > > > options
> > > provided by the user for set and get hashkey before invoking the ioctl.
> > > > This patch also has Ethtool man page changes which describes the
> > > > Usage of
> > > set and get hashkey options.
> > >
> > > I'd prefer to have this combined with the -x/-X options (and add new
> > > long options to reflect that they cover the key as well).
> >
> > if we add hashkey options to the existing -x/-X (--show-rxfh-indir/ --set-
> rxfh-indir), I think it won't be appropriate going by the command name.
> > We could change the command name to something like --show-rssconfig /-
> -rss-config but I'm afraid would that be backward compatible?
> [...]
>
> That's why I said 'add new long options'. The ethtool argument parser allows
> arbitrarily many aliases for each sub-command.
Just to make sure that we are in sync
{ "-x|--show-rxfh-indir|--show-hashkey", 1, do_getrssconfig,
"Show RSS configuration" },
{ "-X|--set-rxfh-indir|--hashkey", 1, do_setrssconfig,
"Set RSS configuration",
" equal N | weight W0 W1 ...\n"
" hkey %x:%x:%x:%x:%x:....:%x\n" },
And equal/weight will be mutually exclusive with hkey.
Does it makes sense?
>
> Ben.
>
> --
> Ben Hutchings
> One of the nice things about standards is that there are so many of them.
^ permalink raw reply
* RE: [PATCH v5] can: add Renesas R-Car CAN driver
From: David Laight @ 2014-01-22 11:58 UTC (permalink / raw)
To: 'Ben Dooks', Marc Kleine-Budde
Cc: Sergei Shtylyov, David Miller, geert@linux-m68k.org,
netdev@vger.kernel.org, wg@grandegger.com,
linux-can@vger.kernel.org, linux-sh@vger.kernel.org,
vksavl@gmail.com
In-Reply-To: <52DFB101.8040907@codethink.co.uk>
From: Ben Dooks
> On 20/01/14 21:17, Marc Kleine-Budde wrote:
> > On 01/20/2014 11:12 PM, Sergei Shtylyov wrote:
> > [...]
> >
> >>> I truly think using packed here is rediculous, please remove it unless
> >>> you can prove that things won't work without it.
> >>
> >> They will. But how about the following 'struct rcar_can_regs'?
> >
> > The strcuts in question are:
> >
> >> +/* Mailbox registers structure */
> >> +struct rcar_can_mbox_regs {
> >> + u32 id; /* IDE and RTR bits, SID and EID */
> >> + u8 stub; /* Not used */
> >> + u8 dlc; /* Data Length Code - bits [0..3] */
> >> + u8 data[8]; /* Data Bytes */
> >> + u8 tsh; /* Time Stamp Higher Byte */
> >> + u8 tsl; /* Time Stamp Lower Byte */
> >> +} __packed;
> >> +
> >> +struct rcar_can_regs {
> >> + struct rcar_can_mbox_regs mb[RCAR_CAN_N_MBX]; /* Mailbox registers */
> >> + u32 mkr_2_9[8]; /* Mask Registers 2-9 */
> >> + u32 fidcr[2]; /* FIFO Received ID Compare Register */
> >> + u32 mkivlr1; /* Mask Invalid Register 1 */
> >> + u32 mier1; /* Mailbox Interrupt Enable Register 1 */
> >> + u32 mkr_0_1[2]; /* Mask Registers 0-1 */
> >> + u32 mkivlr0; /* Mask Invalid Register 0*/
> >> + u32 mier0; /* Mailbox Interrupt Enable Register 0 */
> >> + u8 pad_440[0x3c0];
> >> + u8 mctl[64]; /* Message Control Registers */
> >> + u16 ctlr; /* Control Register */
> >> + u16 str; /* Status register */
> >> + u8 bcr[3]; /* Bit Configuration Register */
> >> + u8 clkr; /* Clock Select Register */
> >> + u8 rfcr; /* Receive FIFO Control Register */
> >> + u8 rfpcr; /* Receive FIFO Pointer Control Register */
> >> + u8 tfcr; /* Transmit FIFO Control Register */
> >> + u8 tfpcr; /* Transmit FIFO Pointer Control Register */
> >> + u8 eier; /* Error Interrupt Enable Register */
> >> + u8 eifr; /* Error Interrupt Factor Judge Register */
> >> + u8 recr; /* Receive Error Count Register */
> >> + u8 tecr; /* Transmit Error Count Register */
> >> + u8 ecsr; /* Error Code Store Register */
> >> + u8 cssr; /* Channel Search Support Register */
> >> + u8 mssr; /* Mailbox Search Status Register */
> >> + u8 msmr; /* Mailbox Search Mode Register */
> >> + u16 tsr; /* Time Stamp Register */
> >> + u8 afsr; /* Acceptance Filter Support Register */
> >> + u8 pad_857;
> >> + u8 tcr; /* Test Control Register */
> >> + u8 pad_859[7];
> >> + u8 ier; /* Interrupt Enable Register */
> >> + u8 isr; /* Interrupt Status Register */
> >> + u8 pad_862;
> >> + u8 mbsmr; /* Mailbox Search Mask Register */
> >> +} __packed;
> >
> > I think they should work without packed, too.
>
> I think this discussion proves why this is not a good idea.
You need the second structure to be the correct size, with or without
__packed there could easily be a typo - so add a compile-type assert
on the size.
If this matches some hardware spec, and the hardware spec doesn't
have anything misaligned, then you don't want to specify __packed.
Without the __packed the size check will detect more typos.
> IIRC, a compiler has the right to pad, or re-order structure
> elements as it sees fit depending on the architecture and options.
The language might allow many things, the ABI is much more explicit.
In this case you really care about the ABI.
David
^ permalink raw reply
* Re: [PATCH v5] can: add Renesas R-Car CAN driver
From: Geert Uytterhoeven @ 2014-01-22 11:54 UTC (permalink / raw)
To: Ben Dooks
Cc: Marc Kleine-Budde, Sergei Shtylyov, David Miller,
netdev@vger.kernel.org, wg, linux-can, Linux-sh list,
Pavel Kiryukhin
In-Reply-To: <52DFB101.8040907@codethink.co.uk>
On Wed, Jan 22, 2014 at 12:52 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> IIRC, a compiler has the right to pad, or re-order structure
> elements as it sees fit depending on the architecture and options.
Pad: yes, reorder: no.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH v5] can: add Renesas R-Car CAN driver
From: Ben Dooks @ 2014-01-22 11:52 UTC (permalink / raw)
To: Marc Kleine-Budde
Cc: Sergei Shtylyov, David Miller, geert, netdev, wg, linux-can,
linux-sh, vksavl
In-Reply-To: <52DD927B.2060500@pengutronix.de>
On 20/01/14 21:17, Marc Kleine-Budde wrote:
> On 01/20/2014 11:12 PM, Sergei Shtylyov wrote:
> [...]
>
>>> I truly think using packed here is rediculous, please remove it unless
>>> you can prove that things won't work without it.
>>
>> They will. But how about the following 'struct rcar_can_regs'?
>
> The strcuts in question are:
>
>> +/* Mailbox registers structure */
>> +struct rcar_can_mbox_regs {
>> + u32 id; /* IDE and RTR bits, SID and EID */
>> + u8 stub; /* Not used */
>> + u8 dlc; /* Data Length Code - bits [0..3] */
>> + u8 data[8]; /* Data Bytes */
>> + u8 tsh; /* Time Stamp Higher Byte */
>> + u8 tsl; /* Time Stamp Lower Byte */
>> +} __packed;
>> +
>> +struct rcar_can_regs {
>> + struct rcar_can_mbox_regs mb[RCAR_CAN_N_MBX]; /* Mailbox registers */
>> + u32 mkr_2_9[8]; /* Mask Registers 2-9 */
>> + u32 fidcr[2]; /* FIFO Received ID Compare Register */
>> + u32 mkivlr1; /* Mask Invalid Register 1 */
>> + u32 mier1; /* Mailbox Interrupt Enable Register 1 */
>> + u32 mkr_0_1[2]; /* Mask Registers 0-1 */
>> + u32 mkivlr0; /* Mask Invalid Register 0*/
>> + u32 mier0; /* Mailbox Interrupt Enable Register 0 */
>> + u8 pad_440[0x3c0];
>> + u8 mctl[64]; /* Message Control Registers */
>> + u16 ctlr; /* Control Register */
>> + u16 str; /* Status register */
>> + u8 bcr[3]; /* Bit Configuration Register */
>> + u8 clkr; /* Clock Select Register */
>> + u8 rfcr; /* Receive FIFO Control Register */
>> + u8 rfpcr; /* Receive FIFO Pointer Control Register */
>> + u8 tfcr; /* Transmit FIFO Control Register */
>> + u8 tfpcr; /* Transmit FIFO Pointer Control Register */
>> + u8 eier; /* Error Interrupt Enable Register */
>> + u8 eifr; /* Error Interrupt Factor Judge Register */
>> + u8 recr; /* Receive Error Count Register */
>> + u8 tecr; /* Transmit Error Count Register */
>> + u8 ecsr; /* Error Code Store Register */
>> + u8 cssr; /* Channel Search Support Register */
>> + u8 mssr; /* Mailbox Search Status Register */
>> + u8 msmr; /* Mailbox Search Mode Register */
>> + u16 tsr; /* Time Stamp Register */
>> + u8 afsr; /* Acceptance Filter Support Register */
>> + u8 pad_857;
>> + u8 tcr; /* Test Control Register */
>> + u8 pad_859[7];
>> + u8 ier; /* Interrupt Enable Register */
>> + u8 isr; /* Interrupt Status Register */
>> + u8 pad_862;
>> + u8 mbsmr; /* Mailbox Search Mask Register */
>> +} __packed;
>
> I think they should work without packed, too.
I think this discussion proves why this is not a good idea.
IIRC, a compiler has the right to pad, or re-order structure
elements as it sees fit depending on the architecture and options.
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
^ permalink raw reply
* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Mika Westerberg @ 2014-01-22 11:00 UTC (permalink / raw)
To: Linus Walleij
Cc: Alexandre Courbot, Arnd Bergmann,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
David S. Miller,
linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CACRpkdZbb=eO8YRtMn6hW0vn97PkHUo88e_o61DEC8=wPY3_PQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Wed, Jan 22, 2014 at 10:58:38AM +0100, Linus Walleij wrote:
> On Tue, Jan 21, 2014 at 3:53 PM, Alexandre Courbot <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > The good (or bad, rather) thing about DT is that we can do whatever we
> > please with the new bindings: decide which name or which index
> > (doesn't matter here) a GPIO should have. However we don't have this
> > control over ACPI, where nothing guarantees that the same index will
> > be used for the same GPIO function.
>
> It's not like ACPI will impose some tables on us and expect us to
> use them as-is no matter how crazy they are. Then indeed the whole
> idea of unifying ACPI and DT accessors would be moot and it would
> only work by mistake or as a good joke.
With the current ACPI version we really don't have much of a choice here.
Only way we can distinguish between a set of GPIOs is to use index and hope
that the vendor puts the GPIOs in the table in same order that the driver
expects :-(
This is going to get a bit better when ACPI gets properties (like Arnd
commented already) as then we can get the correct index based on a name in
a table. However, it still requires the vendor to use naming that is
suitable for Linux driver in question.
> The situation with ACPI is just like with DT, it is assumed that the
> author of these tables also look at the Linux kernel drivers to figure
> out what argument goes where and we can influence them.
I certainly hope, now that Android is becoming more and more important,
that the vendors will actually check what the Linux drivers expect and
populate the tables with such information that is suitable for both ACPI
and DT enabled driver.
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: David Vrabel @ 2014-01-22 10:58 UTC (permalink / raw)
To: Zoltan Kiss
Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
jonathan.davies, Stefano Stabellini
In-Reply-To: <1390335755-29328-1-git-send-email-zoltan.kiss@citrix.com>
On 21/01/14 20:22, Zoltan Kiss wrote:
> The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
> for blkback and future netback patches it just cause a lock contention, as
> those pages never go to userspace. Therefore this series does the following:
> - the original functions were renamed to __gnttab_[un]map_refs, with a new
> parameter m2p_override
> - based on m2p_override either they follow the original behaviour, or just set
> the private flag and call set_phys_to_machine
> - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
> m2p_override false
> - a new function gnttab_[un]map_refs_userspace provides the old behaviour
>
> v2:
> - move the storing of the old mfn in page->index to gnttab_map_refs
> - move the function header update to a separate patch
>
> v3:
> - a new approach to retain old behaviour where it needed
> - squash the patches into one
>
> v4:
> - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
> - clear page->private before doing anything with the page, so m2p_find_override
> won't race with this
I would prune this version information out of the commit message.
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
But I think this needs Stefano's ack.
It would be useful to get this into 3.14 if possible.
David
^ permalink raw reply
* [patch net-next 6/6] rtnetlink: remove IFLA_SLAVES definition
From: Jiri Pirko @ 2014-01-22 10:43 UTC (permalink / raw)
To: netdev
Cc: davem, fubar, vfalico, andy, sfeldma, stephen, vyasevic,
nicolas.dichtel, john.r.fastabend
In-Reply-To: <1390377957-31466-1-git-send-email-jiri@resnulli.us>
Not used anymore, and never should be.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
include/uapi/linux/if_link.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b8fb352..ed7b2f8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -144,7 +144,6 @@ enum {
IFLA_NUM_RX_QUEUES,
IFLA_CARRIER,
IFLA_PHYS_PORT_ID,
- IFLA_BOND_SLAVE,
__IFLA_MAX
};
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v3 1/4] net_dma: simple removal
From: saeed bishara @ 2014-01-22 10:38 UTC (permalink / raw)
To: Dan Williams
Cc: dmaengine@vger.kernel.org, Alexander Duyck, Dave Jiang,
Vinod Koul, netdev@vger.kernel.org, David Whipple, lkml,
David S. Miller
In-Reply-To: <CAPcyv4h10vAw3rwNo25+RaZwFsnrjWrpoktj+Yg8eR1QDDW2tw@mail.gmail.com>
On Tue, Jan 21, 2014 at 11:44 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Jan 17, 2014 at 12:16 PM, saeed bishara <saeed.bishara@gmail.com> wrote:
>> Dan,
>>
>> isn't this issue similar to direct io case?
>> can you please look at the following article
>> http://lwn.net/Articles/322795/
>
> I guess it's similar, but the NET_DMA dma api violation is more
> blatant. The same thread that requested DMA is also writing to those
> same pages with the cpu. The fix is either guaranteeing that only the
> dma engine ever touches the gup'd pages or synchronizing dma before
> every cpu fallback.
>
>> regarding performance improvement using NET_DMA, I don't have concrete
>> numbers, but it should be around 15-20%. my system is i/o coherent.
>
> That sounds too high... is that throughput or cpu utilization? It
that's the throughput improvement, my test is iperf server (no special
flags, 1500 mtu).
the iperf and 10G eth interrupts bound to same cpu which is the
bottleneck in my case.
I ran the following configurations
a. NET_DMA=n
b. NET_DMA=y
c. NET_DMA=y + your dma_debug patch below,
d. same as 3. by with my simple fix path.
results in Gbps:
a. 5.41
b. 6.17 (+14%)
c. 5.93 (+9%)
d. 5.92 (+9%)
Dan, my simple fix is just to call tcp_service_net_dma(sk, true)
whenever the cpu is going to copy the data. proper fix ofcourse can be
smarter.
do you think this is sufficient?
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1295,6 +1295,7 @@ static int tcp_recv_urg(struct sock *sk, struct
msghdr *msg, int len, int flags)
*/
return -EAGAIN;
}
+static void tcp_service_net_dma(struct sock *sk, bool wait);
static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
{
@@ -1302,6 +1303,7 @@ static int tcp_peek_sndq(struct sock *sk, struct
msghdr *msg, int len)
int copied = 0, err = 0;
/* XXX -- need to support SO_PEEK_OFF */
+ tcp_service_net_dma(sk, true); /* Wait for queue to drain */
skb_queue_walk(&sk->sk_write_queue, skb) {
err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, skb->len);
@@ -1861,6 +1863,8 @@ do_prequeue:
} else
#endif
{
+ tcp_service_net_dma(sk, true); /*
Wait for queue to drain */
+
err = skb_copy_datagram_iovec(skb, offset,
msg->msg_iov, used);
if (err) {
> sounds high because NET_DMA also makes the data cache cold while the
> cpu copy warms the data before handing it to the application.
for iperf case the test doesn't touch the data.
also, for some applications, specially storage, the data can also be
moved using dma.
so this actually can be big advantage.
>
> Can you measure relative numbers and share your testing details? You
> will need to fix the data corruption and verify that the performance
> advantage is still there before proposing NET_DMA be restored.
see above.
>
> I have a new dma_debug capability in Andrew's tree that can you help
> you identify holes in the implementation.
>
> http://ozlabs.org/~akpm/mmots/broken-out/dma-debug-introduce-debug_dma_assert_idle.patch
>
> --
> Dan
>
>>
>> saeed
>>
>> On Wed, Jan 15, 2014 at 11:33 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Wed, Jan 15, 2014 at 1:31 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> On Wed, Jan 15, 2014 at 1:20 PM, saeed bishara <saeed.bishara@gmail.com> wrote:
>>>>> Hi Dan,
>>>>>
>>>>> I'm using net_dma on my system and I achieve meaningful performance
>>>>> boost when running Iperf receive.
>>>>>
>>>>> As far as I know the net_dma is used by many embedded systems out
>>>>> there and might effect their performance.
>>>>> Can you please elaborate on the exact scenario that cause the memory corruption?
>>>>>
>>>>> Is the scenario mentioned here caused by "real life" application or
>>>>> this is more of theoretical issue found through manual testing, I was
>>>>> trying to find the thread describing the failing scenario and couldn't
>>>>> find it, any pointer will be appreciated.
>>>>
>>>> Did you see the referenced commit?
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=77873803363c
>>>>
>>>> This is a real issue in that any app that forks() while receiving data
>>>> can cause the dma data to be lost. The problem is that the copy
>>>> operation falls back to cpu at many locations. Any one of those
>>>> instance could touch a mapped page and trigger a copy-on-write event.
>>>> The dma completes to the wrong location.
>>>>
>>>
>>> Btw, do you have benchmark data showing that NET_DMA is beneficial on
>>> these platforms? I would have expected worse performance on platforms
>>> without i/o coherent caches.
^ permalink raw reply
* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Linus Walleij @ 2014-01-22 9:58 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Arnd Bergmann,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
Mika Westerberg, David S. Miller,
linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAAVeFuKFfAvSPYLmvWV5jjT-peZFJ8sJ2bbh4F=JAYoWLhjZpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Jan 21, 2014 at 3:53 PM, Alexandre Courbot <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> The good (or bad, rather) thing about DT is that we can do whatever we
> please with the new bindings: decide which name or which index
> (doesn't matter here) a GPIO should have. However we don't have this
> control over ACPI, where nothing guarantees that the same index will
> be used for the same GPIO function.
It's not like ACPI will impose some tables on us and expect us to
use them as-is no matter how crazy they are. Then indeed the whole
idea of unifying ACPI and DT accessors would be moot and it would
only work by mistake or as a good joke.
The situation with ACPI is just like with DT, it is assumed that the
author of these tables also look at the Linux kernel drivers to figure
out what argument goes where and we can influence them.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Linus Walleij @ 2014-01-22 9:54 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Arnd Bergmann,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
Mika Westerberg, David S. Miller,
linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAAVeFuKFfAvSPYLmvWV5jjT-peZFJ8sJ2bbh4F=JAYoWLhjZpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Jan 21, 2014 at 3:53 PM, Alexandre Courbot <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Jan 21, 2014 at 9:35 PM, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org> wrote:
>> On Tuesday 21 January 2014, Linus Walleij wrote:
>>> As discussed earlier in this thread I'm not sure the con_id is
>>> suitable for labelling GPIOs. It'd be better to have a proper name
>>> specified in DT/ACPI instead.
>>
>> +1
>
> I wonder why you guys prefer to have the name defined in the GPIO
> mapping. Having the driver decide the label makes it easier to look up
> which GPIO does what in debugfs, whereas nothing prevents people to
> name GPIOs whatever inadequate name they want in the device DT node.
> What am I overlooking here?
The proper name of a GPIO does not come from the driver but from the
usecase, i.e. often the name of the rail on the board where it is used.
Remember GPIO are per definition general purpose, they cannot get
any clever names from the driver. They would just be named
"chip-foo-gpio0" thru "chip-foo-gpioN" if the driver was to name them.
Using the rail name on the board is way more useful. A GPIO line
named "HAL_SENSOR" or "MMC_CD" is actually telling us what
that line is used for.
But such names can only come from a DT or ACPI table that has
knowledge of how the GPIO is used on a certain system/board and
not just from the driver.
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH] bonding: Don't allow bond devices to change network namespaces.
From: Jiri Pirko @ 2014-01-22 9:41 UTC (permalink / raw)
To: Chen Weilong; +Cc: fubar, vfalico, andy, davem, netdev
In-Reply-To: <1390382190-1604-1-git-send-email-chenweilong@huawei.com>
Wed, Jan 22, 2014 at 10:16:30AM CET, chenweilong@huawei.com wrote:
>From: Weilong Chen <chenweilong@huawei.com>
>
>Like bridge, bonding as netdevice doesn't cross netns boundaries.
>
>Bonding ports and bonding itself live in same netns.
I think should should be done for team as well. Openvs already
has this. I believe that for vlans it is ok to change ns, right?
>
>Signed-off-by: Weilong Chen <chenweilong@huawei.com>
>---
> drivers/net/bonding/bond_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f00dd45..897d153 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3916,6 +3916,9 @@ void bond_setup(struct net_device *bond_dev)
> * capable
> */
>
>+ /* Don't allow bond devices to change network namespaces. */
>+ bond_dev->features |= NETIF_F_NETNS_LOCAL;
>+
> bond_dev->hw_features = BOND_VLAN_FEATURES |
> NETIF_F_HW_VLAN_CTAG_TX |
> NETIF_F_HW_VLAN_CTAG_RX |
>--
>1.7.12
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net-next 0/3] bonding: Fix some issues for fail_over_mac
From: Ding Tianhong @ 2014-01-22 9:21 UTC (permalink / raw)
To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
Andy Gospodarek
The parameter fail_over_mac only affect active-backup mode, if it was
set to active or follow and works with other modes, just like RR or XOR
mode, the bonding could not set all slaves to the master's address, it
will cause the slave could not work well with master.
So set the fail_over_mac to none if the mode is not active-backup and
slight optimization for bond_set_mac_address().
v1->v2: According Jay's suggestion, that we should permit setting an option
at any time, but only have it take effect in active-backup mode, so
I add mode checking together with fail_over_mac during enslavement and
rebuild the patches.
Regards
Ding
Ding Tianhong (3):
bonding: Set the correct value to fail_over_mac at enslavement
bonding: only affect active-backup mode when fail_over_mac is not
none
bonding: cleanup some redundant code and wrong variables
drivers/net/bonding/bond_main.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
--
1.8.0
^ permalink raw reply
* [PATCH net-next 2/3] bonding: only affect active-backup mode when fail_over_mac is not none
From: Ding Tianhong @ 2014-01-22 9:22 UTC (permalink / raw)
To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
Andy Gospodarek
According bonding.txt, the fail_over_mac only affect active-backup mode,
but now the fail_over_mac could be set to active or follow in every mode,
this will make the new slave could not be set to bond's MAC address at
enslavement for several modes, such as RR or XOR.
So this patch fix two problems:
1). If the fail_over_mac is active or follow and the mode is not active-backup,
the new slave could be set to the bond's MAC address,
2). The first slave could work normally regardless it could support setting MAC
address or not, and the bonding always be set to first slave's MAC address,
so no need to set the first slave to bond's MAC address again when fail_over_mac
is none at enslavement.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/bonding/bond_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 598f100..ce0f5c0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1387,7 +1387,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
*/
memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
- if (!bond->params.fail_over_mac) {
+ if (bond_has_slaves(bond) &&
+ (!bond->params.fail_over_mac ||
+ bond->params.mode != BOND_MODE_ACTIVEBACKUP)) {
/*
* Set slave to master's mac address. The application already
* set the master's mac address to that of the first slave
--
1.8.0
^ permalink raw reply related
* [PATCH net-next 1/3] bonding: Set the correct value to fail_over_mac at enslavement
From: Ding Tianhong @ 2014-01-22 9:22 UTC (permalink / raw)
To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
Andy Gospodarek
If the new slave don't support setting the MAC address, there are
two ways to handle this situation:
1). If the new slave is the first slave, set bond to the new slave's
MAC address, if the mode is active-backup, set fail_over_mac to
active, otherwise set fail_over_mac to none.
2). If the new slave is not the first slave and the fail_over_mac is
active, it means that the slave could work normally in active-backup
mode, otherwise if the fail_over_mac is none, the slave could not
work normally for no active-backup mode, so bond could not ensalve
the new dev.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/bonding/bond_main.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3220b48..598f100 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1334,9 +1334,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
if (slave_ops->ndo_set_mac_address == NULL) {
if (!bond_has_slaves(bond)) {
- pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.",
+ pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address.\n",
bond_dev->name);
- bond->params.fail_over_mac = BOND_FOM_ACTIVE;
+ if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
+ bond->params.fail_over_mac = BOND_FOM_ACTIVE;
+ pr_warning("%s: Setting fail_over_mac to active for active-backup mode.\n",
+ bond_dev->name);
+ } else {
+ bond->params.fail_over_mac = BOND_FOM_NONE;
+ pr_warning("%s: Setting fail_over_mac to none for no active-backup modes",
+ bond_dev->name);
+ }
} else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
pr_err("%s: Error: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to active.\n",
bond_dev->name);
--
1.8.0
^ permalink raw reply related
* [PATCH net-next 3/3] bonding: cleanup some redundant code and wrong variables
From: Ding Tianhong @ 2014-01-22 9:22 UTC (permalink / raw)
To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
Netdev
The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().
Fix the wrong variables for pr_err().
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/bonding/bond_main.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ce0f5c0..9d92f46 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3509,15 +3509,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
*/
bond_for_each_slave(bond, slave, iter) {
- const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
pr_debug("slave %p %s\n", slave, slave->dev->name);
-
- if (slave_ops->ndo_set_mac_address == NULL) {
- res = -EOPNOTSUPP;
- pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
- goto unwind;
- }
-
res = dev_set_mac_address(slave->dev, addr);
if (res) {
/* TODO: consider downing the slave
@@ -4317,7 +4309,7 @@ static int bond_check_params(struct bond_params *params)
fail_over_mac_tbl);
if (fail_over_mac_value == -1) {
pr_err("Error: invalid fail_over_mac \"%s\"\n",
- arp_validate == NULL ? "NULL" : arp_validate);
+ fail_over_mac == NULL ? "NULL" : fail_over_mac);
return -EINVAL;
}
--
1.8.0
^ permalink raw reply related
* [PATCH] bonding: Don't allow bond devices to change network namespaces.
From: Chen Weilong @ 2014-01-22 9:16 UTC (permalink / raw)
To: fubar, vfalico, andy, davem; +Cc: netdev
From: Weilong Chen <chenweilong@huawei.com>
Like bridge, bonding as netdevice doesn't cross netns boundaries.
Bonding ports and bonding itself live in same netns.
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
---
drivers/net/bonding/bond_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f00dd45..897d153 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3916,6 +3916,9 @@ void bond_setup(struct net_device *bond_dev)
* capable
*/
+ /* Don't allow bond devices to change network namespaces. */
+ bond_dev->features |= NETIF_F_NETNS_LOCAL;
+
bond_dev->hw_features = BOND_VLAN_FEATURES |
NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_HW_VLAN_CTAG_RX |
--
1.7.12
^ permalink raw reply related
* Re: [PATCH net-next v3] tuntap: Fix for a race in accessing numqueues
From: Jason Wang @ 2014-01-22 8:53 UTC (permalink / raw)
To: Dominic Curran, netdev; +Cc: Maxim Krasnyansky
In-Reply-To: <1390359803-27989-1-git-send-email-dominic.curran@citrix.com>
On 01/22/2014 11:03 AM, Dominic Curran wrote:
> A patch for fixing a race between queue selection and changing queues
> was introduced in commit 92bb73ea2("tuntap: fix a possible race between
> queue selection and changing queues").
>
> The fix was to prevent the driver from re-reading the tun->numqueues
> more than once within tun_select_queue() using ACCESS_ONCE().
>
> We have been experiancing 'Divide-by-zero' errors in tun_net_xmit()
> since we moved from 3.6 to 3.10, and believe that they come from a
> simular source where the value of tun->numqueues changes to zero
> between the first and a subsequent read of tun->numqueues.
>
> The fix is a simular use of ACCESS_ONCE(), as well as a multiply
> instead of a divide in the if statement.
>
> Signed-off-by: Dominic Curran <dominic.curran@citrix.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
> ---
> V3: Rebase against net-next. Include all numqueues in function.
> V2: Use multiply instead of divide. Suggested by Eric Dumazet.
> Fixed email address for maxk. Rebase against net tree.
> ---
> drivers/net/tun.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> Index: net-next/drivers/net/tun.c
> ===================================================================
> --- net-next.orig/drivers/net/tun.c 2014-01-22 02:50:01.000000000 +0000
> +++ net-next/drivers/net/tun.c 2014-01-22 02:59:42.000000000 +0000
> @@ -738,15 +738,17 @@ static netdev_tx_t tun_net_xmit(struct s
> struct tun_struct *tun = netdev_priv(dev);
> int txq = skb->queue_mapping;
> struct tun_file *tfile;
> + u32 numqueues = 0;
>
> rcu_read_lock();
> tfile = rcu_dereference(tun->tfiles[txq]);
> + numqueues = ACCESS_ONCE(tun->numqueues);
>
> /* Drop packet if interface is not attached */
> - if (txq >= tun->numqueues)
> + if (txq >= numqueues)
> goto drop;
>
> - if (tun->numqueues == 1) {
> + if (numqueues == 1) {
> /* Select queue was not called for the skbuff, so we extract the
> * RPS hash and save it into the flow_table here.
> */
> @@ -779,8 +781,8 @@ static netdev_tx_t tun_net_xmit(struct s
> /* Limit the number of packets queued by dividing txq length with the
> * number of queues.
> */
> - if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
> - >= dev->tx_queue_len / tun->numqueues)
> + if (skb_queue_len(&tfile->socket.sk->sk_receive_queue) * numqueues
> + >= dev->tx_queue_len)
> goto drop;
>
> if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
Acked-by: Jason Wang <jasowang@redhat.com>
^ permalink raw reply
* [PATCH v2 2/2] net: Make minimum SO_SNDBUF size dependent on the protocol family
From: Michal Sojka @ 2014-01-22 8:27 UTC (permalink / raw)
To: linux-can; +Cc: netdev, Marc Kleine-Budde, Michal Sojka
In-Reply-To: <1390379257-9040-1-git-send-email-sojkam1@fel.cvut.cz>
For CAN bus it is desired to have the per-socket send queue limit much
smaller than for Ethernet-based protocols (see the previous patch).
This patch makes the lower limit of accepted setsockopt(SO_SNDBUF)
values smaller for PF_CAN sockets.
The value of SOCK_MIN_SNDBUF is kept unchanged, because it is only
used in two functions (sk_stream_moderate_sndbuf, tcp_out_of_memory)
that seem to be called for TCP/IP related protocols only and the
change from a constant to sk->sk_prot->min_sndbuf could have negative
performance impact.
Changes since v1:
- SOCK_MIN_SNDBUF changed back to a constant.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
---
include/net/sock.h | 1 +
net/can/raw.c | 1 +
net/core/sock.c | 6 ++++--
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 808cbc2..9ca3c0c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -969,6 +969,7 @@ struct proto {
int *sysctl_rmem;
int max_header;
bool no_autobind;
+ int min_sndbuf;
struct kmem_cache *slab;
unsigned int obj_size;
diff --git a/net/can/raw.c b/net/can/raw.c
index 4ad0bb2..b58f53f 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -818,6 +818,7 @@ static struct proto raw_proto __read_mostly = {
.owner = THIS_MODULE,
.obj_size = sizeof(struct raw_sock),
.init = raw_init,
+ .min_sndbuf = SKB_TRUESIZE(sizeof(struct can_frame) + sizeof(struct can_skb_priv)),
};
static const struct can_proto raw_can_proto = {
diff --git a/net/core/sock.c b/net/core/sock.c
index 0b39e7a..fddb6be 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -625,7 +625,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
{
struct sock *sk = sock->sk;
- int val;
+ int val, min;
int valbool;
struct linger ling;
int ret = 0;
@@ -681,7 +681,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
val = min_t(u32, val, sysctl_wmem_max);
set_sndbuf:
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
- sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);
+ min = sk->sk_prot->min_sndbuf ?
+ sk->sk_prot->min_sndbuf : SOCK_MIN_SNDBUF;
+ sk->sk_sndbuf = max_t(u32, val * 2, min);
/* Wake up sending tasks if we upped the value. */
sk->sk_write_space(sk);
break;
--
1.8.5.2
^ permalink raw reply related
* [PATCH v2 1/2] can: Decrease default size of CAN_RAW socket send queue
From: Michal Sojka @ 2014-01-22 8:27 UTC (permalink / raw)
To: linux-can; +Cc: netdev, Marc Kleine-Budde, Michal Sojka
This fixes the infamous ENOBUFS problem, which appears when an
application sends CAN frames faster than they leave the interface.
Packets for sending can be queued at queueing discipline. Qdisc queue
has two limits: maximum length and per-socket byte limit (SO_SNDBUF).
Only the later limit can cause the sender to block. If maximum queue
length limit is reached before the per-socket limit, the application
receives ENOBUFS and there is no way how it can wait for the queue to
become free again. Since the length of the qdisc queue was set by
default to 10 packets, this is exactly what was happening.
This patch decreases the default per-socket limit to approximately 3
CAN frames and increases the length of the qdisc queue to 100 frames.
This setting allows for at least 33 CAN_RAW sockets to send
simultaneously to the same CAN interface without getting ENOBUFS
errors.
The exact maximum number of CAN frames that fit into the per-socket
limit is: 1+floor(sk_sndbuf/skb->truesize)
The calculation of the default sk_sndbuf value in the patch is only
approximate, because skb->truesize can be slightly greater than
SKB_TRUESIZE(). For example, for CAN frames on my 32 bit PowerPC
system, SKB_TRUESIZE() = 408, but skb->truesize = 448. Therefore, on
my system the per-socket limit allows 1+floor(3*408/448) =
1+floor(2.73) = 3 CAN frames to be queued.
Without this patch, the default per-socket limit is
/proc/sys/net/core/wmem_default, which is 163840 on my system. This
limit allows for queuing of 1+163840/448 = 366 CAN frames, which is
clearly more than the queue length (10 frames).
Since the per-socket limit is expressed in bytes, the number of queued
CANFD frames, which are bigger than CAN frames, may be lower. This is
not a big problem, because at least one frame could be always queued.
Changes since v1:
- Improved the commit message, added some number from my system.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
---
drivers/net/can/dev.c | 2 +-
net/can/raw.c | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 1870c47..a0bce83 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -492,7 +492,7 @@ static void can_setup(struct net_device *dev)
dev->mtu = CAN_MTU;
dev->hard_header_len = 0;
dev->addr_len = 0;
- dev->tx_queue_len = 10;
+ dev->tx_queue_len = 100;
/* New-style flags. */
dev->flags = IFF_NOARP;
diff --git a/net/can/raw.c b/net/can/raw.c
index fdda5f6..4ad0bb2 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -291,6 +291,10 @@ static int raw_init(struct sock *sk)
{
struct raw_sock *ro = raw_sk(sk);
+ /* allow at most 3 frames to wait for transmission in socket queue */
+ sk->sk_sndbuf = 3 * SKB_TRUESIZE(sizeof(struct can_frame) +
+ sizeof(struct can_skb_priv));
+
ro->bound = 0;
ro->ifindex = 0;
--
1.8.5.2
^ permalink raw reply related
* Re: [PATCH net-next v2 2/2] bonding: add netlink attributes to slave link dev
From: Jiri Pirko @ 2014-01-22 8:22 UTC (permalink / raw)
To: Scott Feldman
Cc: Veaceslav Falico, Jay Vosburgh, Andy Gospodarek, Netdev,
Roopa Prabhu, Shrijeet Mukherjee, Ding Tianhong
In-Reply-To: <C5AA1F6D-4FC7-42F6-A59F-A0114E1B2E8B@cumulusnetworks.com>
Tue, Jan 21, 2014 at 11:42:58PM CET, sfeldma@cumulusnetworks.com wrote:
>
>On Jan 21, 2014, at 2:00 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Tue, Jan 21, 2014 at 10:36:58PM CET, sfeldma@cumulusnetworks.com wrote:
>>>
>>> On Jan 21, 2014, at 5:34 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>
>>>>> + if (rtnl_bond_slave_fill(skb, dev))
>>>>> + goto nla_put_failure;
>>>>> +
>>>>
>>>> I must say I do not like this at all. This should be done in a generic
>>>> way. By a callback registered by bonding and possibly other master-slave
>>>> device types.
>>>
>>> The bond was registered with the ndo_get_slave op. ndo_get_slave could be used for other master-slave device types. I’ll agree that rtnl_bond_slave_fill() could have been written more generically. Is that the objection?
>>
>> I think is should be done rather in rtnl_link_ops. It's the natural point
>> for this ops. I have patchset prepared. Will send it very soon.
>
>Ok, cool.
>
>Also, right now I have IFLA_SLAVE as a nest for IFLA_SLAVE_xxx attrs. Do you think we should have a two-layer nest so we can capture other master-slave devices rather than just bond slaves? I.e.:
>
> IFLA_SLAVE
> IFLA_BOND_SLAVE
> IFLA_BOND_SLAVE_xxx
> IFLA_BOND_SLAVE_yyy
> IFLA_BOND_SLAVE_zzz
> IFLA_FOO_SLAVE // FOO is some other non-bond master
> IFLA_FOO_SLAVE_xxx
> IFLA_FOO_SLAVE_yyy
> IFLA_FOO_SLAVE_zzz
>
>(Of course, slave wouldn’t be bond and foo slave at same time).
I would rather do this in LINKINFO nest the same way IFLA_BOND_* are
done. Please see following patch:
http://patchwork.ozlabs.org/patch/313156/
>
>-scott
^ permalink raw reply
* [patch net-next 5/5] rtnetlink: remove ndo_get_slave
From: Jiri Pirko @ 2014-01-22 8:05 UTC (permalink / raw)
To: netdev
Cc: davem, fubar, vfalico, andy, sfeldma, stephen, vyasevic,
nicolas.dichtel, john.r.fastabend
In-Reply-To: <1390377957-31466-1-git-send-email-jiri@resnulli.us>
No longer used API bond-specific can be removed now. This is now handled
in a generic way in rtnl_link_ops.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
include/linux/netdevice.h | 5 -----
1 file changed, 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 83ce2ae..e985231 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -921,9 +921,6 @@ struct netdev_phys_port_id {
* int (*ndo_del_slave)(struct net_device *dev, struct net_device *slave_dev);
* Called to release previously enslaved netdev.
*
- * int (*ndo_get_slave)(struct net_device *slave_dev, struct sk_buff *skb);
- * Called to fill netlink skb with slave info.
- *
* Feature/offload setting functions.
* netdev_features_t (*ndo_fix_features)(struct net_device *dev,
* netdev_features_t features);
@@ -1096,8 +1093,6 @@ struct net_device_ops {
struct net_device *slave_dev);
int (*ndo_del_slave)(struct net_device *dev,
struct net_device *slave_dev);
- int (*ndo_get_slave)(struct net_device *slave_dev,
- struct sk_buff *skb);
netdev_features_t (*ndo_fix_features)(struct net_device *dev,
netdev_features_t features);
int (*ndo_set_features)(struct net_device *dev,
--
1.8.3.1
^ permalink raw reply related
* [patch net-next 4/5] bonding: convert netlink to use slave data info api
From: Jiri Pirko @ 2014-01-22 8:05 UTC (permalink / raw)
To: netdev
Cc: davem, fubar, vfalico, andy, sfeldma, stephen, vyasevic,
nicolas.dichtel, john.r.fastabend
In-Reply-To: <1390377957-31466-1-git-send-email-jiri@resnulli.us>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
drivers/net/bonding/bond_main.c | 1 -
drivers/net/bonding/bond_netlink.c | 21 ++++++++++++++--
drivers/net/bonding/bonding.h | 1 -
net/core/rtnetlink.c | 51 --------------------------------------
4 files changed, 19 insertions(+), 55 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3220b48..df85cec 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3883,7 +3883,6 @@ static const struct net_device_ops bond_netdev_ops = {
#endif
.ndo_add_slave = bond_enslave,
.ndo_del_slave = bond_release,
- .ndo_get_slave = bond_get_slave,
.ndo_fix_features = bond_fix_features,
};
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index dd786a3..524d7ce 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -22,10 +22,23 @@
#include <linux/reciprocal_div.h>
#include "bonding.h"
-int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb)
+static size_t bond_get_slave_size(const struct net_device *bond_dev,
+ const struct net_device *slave_dev)
+{
+ return nla_total_size(sizeof(u8)) + /* IFLA_BOND_SLAVE_STATE */
+ nla_total_size(sizeof(u8)) + /* IFLA_BOND_SLAVE_MII_STATUS */
+ nla_total_size(sizeof(u32)) + /* IFLA_BOND_SLAVE_LINK_FAILURE_COUNT */
+ nla_total_size(MAX_ADDR_LEN) + /* IFLA_BOND_SLAVE_PERM_HWADDR */
+ nla_total_size(sizeof(u16)) + /* IFLA_BOND_SLAVE_QUEUE_ID */
+ nla_total_size(sizeof(u16)) + /* IFLA_BOND_SLAVE_AD_AGGREGATOR_ID */
+ 0;
+}
+
+static int bond_fill_slave_info(struct sk_buff *skb,
+ const struct net_device *bond_dev,
+ const struct net_device *slave_dev)
{
struct slave *slave = bond_slave_get_rtnl(slave_dev);
- const struct aggregator *agg;
if (nla_put_u8(skb, IFLA_BOND_SLAVE_STATE, bond_slave_state(slave)))
goto nla_put_failure;
@@ -45,6 +58,8 @@ int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb)
goto nla_put_failure;
if (slave->bond->params.mode == BOND_MODE_8023AD) {
+ const struct aggregator *agg;
+
agg = SLAVE_AD_INFO(slave).port.aggregator;
if (agg)
if (nla_put_u16(skb, IFLA_BOND_SLAVE_AD_AGGREGATOR_ID,
@@ -518,6 +533,8 @@ struct rtnl_link_ops bond_link_ops __read_mostly = {
.get_num_tx_queues = bond_get_num_tx_queues,
.get_num_rx_queues = bond_get_num_tx_queues, /* Use the same number
as for TX queues */
+ .get_slave_size = bond_get_slave_size,
+ .fill_slave_info = bond_fill_slave_info,
};
int __init bond_netlink_init(void)
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 8a935f8..5033e83 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -432,7 +432,6 @@ int bond_sysfs_slave_add(struct slave *slave);
void bond_sysfs_slave_del(struct slave *slave);
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
-int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb);
int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
int bond_parm_tbl_lookup(int mode, const struct bond_parm_tbl *tbl);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a56bccf..db6a239 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -800,28 +800,6 @@ static size_t rtnl_port_size(const struct net_device *dev)
return port_self_size;
}
-static size_t rtnl_bond_slave_size(const struct net_device *dev)
-{
- struct net_device *bond;
- size_t slave_size =
- nla_total_size(sizeof(struct nlattr)) + /* IFLA_BOND_SLAVE */
- nla_total_size(1) + /* IFLA_BOND_SLAVE_STATE */
- nla_total_size(1) + /* IFLA_BOND_SLAVE_MII_STATUS */
- nla_total_size(4) + /* IFLA_BOND_SLAVE_LINK_FAILURE_COUNT */
- nla_total_size(MAX_ADDR_LEN) + /* IFLA_BOND_SLAVE_PERM_HWADDR */
- nla_total_size(2) + /* IFLA_BOND_SLAVE_QUEUE_ID */
- nla_total_size(2) + /* IFLA_BOND_SLAVE_AD_AGGREGATOR_ID */
- 0;
-
- if (netif_is_bond_slave((struct net_device *)dev)) {
- bond = netdev_master_upper_dev_get((struct net_device *)dev);
- if (bond && bond->netdev_ops->ndo_get_slave)
- return slave_size;
- }
-
- return 0;
-}
-
static noinline size_t if_nlmsg_size(const struct net_device *dev,
u32 ext_filter_mask)
{
@@ -851,7 +829,6 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
+ rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
+ rtnl_link_get_size(dev) /* IFLA_LINKINFO */
+ rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
- + rtnl_bond_slave_size(dev) /* IFLA_SLAVE */
+ nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
}
@@ -949,34 +926,6 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
return 0;
}
-static size_t rtnl_bond_slave_fill(struct sk_buff *skb, struct net_device *dev)
-{
- struct net_device *bond;
- struct nlattr *nest;
- int err;
-
- if (!netif_is_bond_slave(dev))
- return 0;
-
- bond = netdev_master_upper_dev_get(dev);
- if (!bond || !bond->netdev_ops->ndo_get_slave)
- return 0;
-
- nest = nla_nest_start(skb, IFLA_BOND_SLAVE);
- if (!nest)
- return -EMSGSIZE;
-
- err = bond->netdev_ops->ndo_get_slave(dev, skb);
- if (err) {
- nla_nest_cancel(skb, nest);
- return (err == -EMSGSIZE) ? err : 0;
- }
-
- nla_nest_end(skb, nest);
-
- return 0;
-}
-
static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, u32 ext_filter_mask)
--
1.8.3.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox