* [PATCH net 0/2] Prevent extreme TSO parameters from stalling TX queues
@ 2012-07-30 17:14 Ben Hutchings
  2012-07-30 17:16 ` [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb Ben Hutchings
  2012-07-30 17:17 ` [PATCH net 2/2] sfc: Correct the minimum TX queue size Ben Hutchings
  0 siblings, 2 replies; 13+ messages in thread
From: Ben Hutchings @ 2012-07-30 17:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

The following changes fix a potential DoS by peers or local users on
network interfaces using the sfc driver (and possibly others) with TSO
enabled (as it is by default).

Please apply them to the net tree and your stable update queue.

Ben.

Ben Hutchings (2):
  tcp: Limit number of segments generated by GSO per skb
  sfc: Correct the minimum TX queue size

 drivers/net/ethernet/sfc/efx.c     |    5 +++++
 drivers/net/ethernet/sfc/efx.h     |   11 +++++++----
 drivers/net/ethernet/sfc/ethtool.c |   16 +++++++++++-----
 drivers/net/ethernet/sfc/tx.c      |   20 ++++++++++++++++++++
 include/net/tcp.h                  |    3 +++
 net/ipv4/tcp.c                     |    4 +++-
 net/ipv4/tcp_output.c              |   17 +++++++++--------
 7 files changed, 58 insertions(+), 18 deletions(-)

-- 
1.7.7.6

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb
  2012-07-30 17:14 [PATCH net 0/2] Prevent extreme TSO parameters from stalling TX queues Ben Hutchings
@ 2012-07-30 17:16 ` Ben Hutchings
  2012-07-30 17:23   ` Ben Greear
  2012-07-30 17:31   ` Eric Dumazet
  2012-07-30 17:17 ` [PATCH net 2/2] sfc: Correct the minimum TX queue size Ben Hutchings
  1 sibling, 2 replies; 13+ messages in thread
From: Ben Hutchings @ 2012-07-30 17:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps).  Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once.  This
results in a single skb that expands to 861 segments.

In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring.  The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset).  This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.  However it may be that some
hardware or firmware also fails to handle such an extreme TSO request
correctly.

Therefore, limit the number of segments per skb to 100.  This should
make no difference to behaviour unless the actual MSS is less than
about 700.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/net/tcp.h     |    3 +++
 net/ipv4/tcp.c        |    4 +++-
 net/ipv4/tcp_output.c |   17 +++++++++--------
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e19124b..098a2d0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -70,6 +70,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 /* The least MTU to use for probing */
 #define TCP_BASE_MSS		512
 
+/* Maximum number of segments we may require GSO to generate from an skb. */
+#define TCP_MAX_GSO_SEGS	100
+
 /* After receiving this amount of duplicate ACKs fast retransmit starts. */
 #define TCP_FASTRETRANS_THRESH 3
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e7e6eea..51d8daf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -811,7 +811,9 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 			   old_size_goal + mss_now > xmit_size_goal)) {
 			xmit_size_goal = old_size_goal;
 		} else {
-			tp->xmit_size_goal_segs = xmit_size_goal / mss_now;
+			tp->xmit_size_goal_segs =
+				min_t(u32, xmit_size_goal / mss_now,
+				      TCP_MAX_GSO_SEGS);
 			xmit_size_goal = tp->xmit_size_goal_segs * mss_now;
 		}
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 33cd065..c86c288 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1522,21 +1522,21 @@ static void tcp_cwnd_validate(struct sock *sk)
  * when we would be allowed to send the split-due-to-Nagle skb fully.
  */
 static unsigned int tcp_mss_split_point(const struct sock *sk,
 					const struct sk_buff *skb,
-					unsigned int mss_now, unsigned int cwnd)
+					unsigned int mss_now, unsigned int max_segs)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
-	u32 needed, window, cwnd_len;
+	u32 needed, window, max_len;
 
 	window = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
-	cwnd_len = mss_now * cwnd;
+	max_len = mss_now * max_segs;
 
-	if (likely(cwnd_len <= window && skb != tcp_write_queue_tail(sk)))
-		return cwnd_len;
+	if (likely(max_len <= window && skb != tcp_write_queue_tail(sk)))
+		return max_len;
 
 	needed = min(skb->len, window);
 
-	if (cwnd_len <= needed)
-		return cwnd_len;
+	if (max_len <= needed)
+		return max_len;
 
 	return needed - needed % mss_now;
 }
@@ -1999,7 +1999,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		limit = mss_now;
 		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
-						    cwnd_quota);
+						    min(cwnd_quota,
+							TCP_MAX_GSO_SEGS));
 
 		if (skb->len > limit &&
 		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
-- 
1.7.7.6

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb
  2012-07-30 17:16 ` [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb Ben Hutchings
@ 2012-07-30 17:23   ` Ben Greear
  2012-07-30 19:41     ` Ben Hutchings
  2012-07-30 17:31   ` Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Ben Greear @ 2012-07-30 17:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers

On 07/30/2012 10:16 AM, Ben Hutchings wrote:
> A peer (or local user) may cause TCP to use a nominal MSS of as little
> as 88 (actual MSS of 76 with timestamps).  Given that we have a
> sufficiently prodigious local sender and the peer ACKs quickly enough,
> it is nevertheless possible to grow the window for such a connection
> to the point that we will try to send just under 64K at once.  This
> results in a single skb that expands to 861 segments.
>
> In some drivers with TSO support, such an skb will require hundreds of
> DMA descriptors; a substantial fraction of a TX ring or even more than
> a full ring.  The TX queue selected for the skb may stall and trigger
> the TX watchdog repeatedly (since the problem skb will be retried
> after the TX reset).  This particularly affects sfc, for which the
> issue is designated as CVE-2012-3412.  However it may be that some
> hardware or firmware also fails to handle such an extreme TSO request
> correctly.
>
> Therefore, limit the number of segments per skb to 100.  This should
> make no difference to behaviour unless the actual MSS is less than
> about 700.

Please do not do this... or at least allow over-rides.  We love
the trick of setting a very small MSS and making the NICs generate
huge numbers of small TCP frames with efficient user-space
logic.  We use this for stateful TCP load testing when a high
number of tcp packets-per-second is desired.

Intel NICs, including 10G, work just fine with minimal MSS
in this scenario.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 17:23 ` Ben Greear @ 2012-07-30 19:41 ` Ben Hutchings 2012-07-30 21:00 ` Ben Greear 0 siblings, 1 reply; 13+ messages in thread From: Ben Hutchings @ 2012-07-30 19:41 UTC (permalink / raw) To: Ben Greear; +Cc: David Miller, netdev, linux-net-drivers On Mon, 2012-07-30 at 10:23 -0700, Ben Greear wrote: > On 07/30/2012 10:16 AM, Ben Hutchings wrote: > > A peer (or local user) may cause TCP to use a nominal MSS of as little > > as 88 (actual MSS of 76 with timestamps). Given that we have a > > sufficiently prodigious local sender and the peer ACKs quickly enough, > > it is nevertheless possible to grow the window for such a connection > > to the point that we will try to send just under 64K at once. This > > results in a single skb that expands to 861 segments. > > > > In some drivers with TSO support, such an skb will require hundreds of > > DMA descriptors; a substantial fraction of a TX ring or even more than > > a full ring. The TX queue selected for the skb may stall and trigger > > the TX watchdog repeatedly (since the problem skb will be retried > > after the TX reset). This particularly affects sfc, for which the > > issue is designated as CVE-2012-3412. However it may be that some > > hardware or firmware also fails to handle such an extreme TSO request > > correctly. > > > > Therefore, limit the number of segments per skb to 100. This should > > make no difference to behaviour unless the actual MSS is less than > > about 700. > > Please do not do this...or at least allow over-rides. We love > the trick of seting very small MSS and making the NICs generate > huge numbers of small TCP frames with efficient user-space > logic. We use this for stateful TCP load testing when high > numbers of tcp packets-per-second is desired. 
Please test whether this actually makes a difference - my suspicion is that 100 segments per skb is easily enough to prevent the host being a bottleneck. > Intel NICs, including 10G, work just fine with minimal MSS > in this scenario. I'll leave this to the Intel maintainers to answer. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 19:41 ` Ben Hutchings @ 2012-07-30 21:00 ` Ben Greear 0 siblings, 0 replies; 13+ messages in thread From: Ben Greear @ 2012-07-30 21:00 UTC (permalink / raw) To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers On 07/30/2012 12:41 PM, Ben Hutchings wrote: > On Mon, 2012-07-30 at 10:23 -0700, Ben Greear wrote: >> On 07/30/2012 10:16 AM, Ben Hutchings wrote: >>> A peer (or local user) may cause TCP to use a nominal MSS of as little >>> as 88 (actual MSS of 76 with timestamps). Given that we have a >>> sufficiently prodigious local sender and the peer ACKs quickly enough, >>> it is nevertheless possible to grow the window for such a connection >>> to the point that we will try to send just under 64K at once. This >>> results in a single skb that expands to 861 segments. >>> >>> In some drivers with TSO support, such an skb will require hundreds of >>> DMA descriptors; a substantial fraction of a TX ring or even more than >>> a full ring. The TX queue selected for the skb may stall and trigger >>> the TX watchdog repeatedly (since the problem skb will be retried >>> after the TX reset). This particularly affects sfc, for which the >>> issue is designated as CVE-2012-3412. However it may be that some >>> hardware or firmware also fails to handle such an extreme TSO request >>> correctly. >>> >>> Therefore, limit the number of segments per skb to 100. This should >>> make no difference to behaviour unless the actual MSS is less than >>> about 700. >> >> Please do not do this...or at least allow over-rides. We love >> the trick of seting very small MSS and making the NICs generate >> huge numbers of small TCP frames with efficient user-space >> logic. We use this for stateful TCP load testing when high >> numbers of tcp packets-per-second is desired. 
> > Please test whether this actually makes a difference - my suspicion is > that 100 segments per skb is easily enough to prevent the host being a > bottleneck. Any CPU I can save I can use for other tasks. If we can use the NIC's offload features to segment pkts, then we get near linear increase in pkts-per-second by adding NICs..at least up to whatever the total bandwidth of the system is... If you want to have the OS default to a safe value, that is fine by me..but please give us a tunable so that we can get the old behaviour. It's always possible I'm not the only one using this, and I think it would be considered bad form to break existing features and provide no work-around. Thanks, Ben > >> Intel NICs, including 10G, work just fine with minimal MSS >> in this scenario. > > I'll leave this to the Intel maintainers to answer. > > Ben. > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb
  2012-07-30 17:16 ` [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb Ben Hutchings
  2012-07-30 17:23   ` Ben Greear
@ 2012-07-30 17:31   ` Eric Dumazet
  2012-07-30 19:35     ` Ben Hutchings
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2012-07-30 17:31 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers

On Mon, 2012-07-30 at 18:16 +0100, Ben Hutchings wrote:
> A peer (or local user) may cause TCP to use a nominal MSS of as little
> as 88 (actual MSS of 76 with timestamps).  Given that we have a
> sufficiently prodigious local sender and the peer ACKs quickly enough,
> it is nevertheless possible to grow the window for such a connection
> to the point that we will try to send just under 64K at once.  This
> results in a single skb that expands to 861 segments.
>
> In some drivers with TSO support, such an skb will require hundreds of
> DMA descriptors; a substantial fraction of a TX ring or even more than
> a full ring.  The TX queue selected for the skb may stall and trigger
> the TX watchdog repeatedly (since the problem skb will be retried
> after the TX reset).  This particularly affects sfc, for which the
> issue is designated as CVE-2012-3412.  However it may be that some
> hardware or firmware also fails to handle such an extreme TSO request
> correctly.
>
> Therefore, limit the number of segments per skb to 100.  This should
> make no difference to behaviour unless the actual MSS is less than
> about 700.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---

Hmm, isn't the GRO path also vulnerable?

An alternative would be to drop such frames in the ndo_start_xmit(), and
cap sk->sk_gso_max_size (since skbs are no longer orphaned...)

Or you could introduce a new sk->sk_gso_max_segments, that your sfc
driver sets to whatever limit?

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 17:31 ` Eric Dumazet @ 2012-07-30 19:35 ` Ben Hutchings 2012-07-30 19:56 ` Ben Hutchings 2012-07-30 21:46 ` David Miller 0 siblings, 2 replies; 13+ messages in thread From: Ben Hutchings @ 2012-07-30 19:35 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, linux-net-drivers On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote: > On Mon, 2012-07-30 at 18:16 +0100, Ben Hutchings wrote: > > A peer (or local user) may cause TCP to use a nominal MSS of as little > > as 88 (actual MSS of 76 with timestamps). Given that we have a > > sufficiently prodigious local sender and the peer ACKs quickly enough, > > it is nevertheless possible to grow the window for such a connection > > to the point that we will try to send just under 64K at once. This > > results in a single skb that expands to 861 segments. > > > > In some drivers with TSO support, such an skb will require hundreds of > > DMA descriptors; a substantial fraction of a TX ring or even more than > > a full ring. The TX queue selected for the skb may stall and trigger > > the TX watchdog repeatedly (since the problem skb will be retried > > after the TX reset). This particularly affects sfc, for which the > > issue is designated as CVE-2012-3412. However it may be that some > > hardware or firmware also fails to handle such an extreme TSO request > > correctly. > > > > Therefore, limit the number of segments per skb to 100. This should > > make no difference to behaviour unless the actual MSS is less than > > about 700. > > > > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> > > --- > > > Hmm, isnt GRO path also vulnerable ? You mean, for forwarding? If page fragments are used, the number of segments is limited to MAX_SKB_FRAGS < 100. But if skbs are aggregated and build_skb() is not used (e.g. due to jumbo MTU) it appears we would need an explicit limit. 
Something like this:

---
From: Ben Hutchings <bhutchings@solarflare.com>
Subject: [PATCH net] tcp: Limit number of segments merged by GRO

In the case where GRO aggregates skbs that cannot be converted to
page-fragments, there is currently no limit to the number of segments
that may be merged and subsequently re-segmented by GSO.  Apply the
same limit as was introduced for locally-generated GSO skbs in
'tcp: Limit number of segments generated by GSO per skb'.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 net/ipv4/tcp.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 51d8daf..a052d07 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3144,7 +3144,8 @@ out_check_final:
 					TCP_FLAG_RST | TCP_FLAG_SYN |
 					TCP_FLAG_FIN));
 
-	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
+	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush ||
+		  NAPI_GRO_CB(p)->count == TCP_MAX_GSO_SEGS))
 		pp = head;
 
 out:
---

> An alternative would be to drop such frames in the ndo_start_xmit(), and
> cap sk->sk_gso_max_size (since skbs are no longer orphaned...)

I have implemented that workaround for the out-of-tree version of sfc.

For the in-tree driver, I thought it would be better to limit the number
of segments at source, which will avoid penalising any cases where the
window can grow so much larger than MSS.

> Or you could introduce a new sk->sk_gso_max_segments, that your sfc
> driver sets to whatever limit?

Yes, that's another option.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 19:35 ` Ben Hutchings @ 2012-07-30 19:56 ` Ben Hutchings 2012-07-30 21:46 ` David Miller 1 sibling, 0 replies; 13+ messages in thread From: Ben Hutchings @ 2012-07-30 19:56 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, linux-net-drivers On Mon, 2012-07-30 at 20:35 +0100, Ben Hutchings wrote: > On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote: > > On Mon, 2012-07-30 at 18:16 +0100, Ben Hutchings wrote: > > > A peer (or local user) may cause TCP to use a nominal MSS of as little > > > as 88 (actual MSS of 76 with timestamps). Given that we have a > > > sufficiently prodigious local sender and the peer ACKs quickly enough, > > > it is nevertheless possible to grow the window for such a connection > > > to the point that we will try to send just under 64K at once. This > > > results in a single skb that expands to 861 segments. > > > > > > In some drivers with TSO support, such an skb will require hundreds of > > > DMA descriptors; a substantial fraction of a TX ring or even more than > > > a full ring. The TX queue selected for the skb may stall and trigger > > > the TX watchdog repeatedly (since the problem skb will be retried > > > after the TX reset). This particularly affects sfc, for which the > > > issue is designated as CVE-2012-3412. However it may be that some > > > hardware or firmware also fails to handle such an extreme TSO request > > > correctly. > > > > > > Therefore, limit the number of segments per skb to 100. This should > > > make no difference to behaviour unless the actual MSS is less than > > > about 700. [...] > > An alternative would be to drop such frames in the ndo_start_xmit(), and > > cap sk->sk_gso_max_size (since skb are no longer orphaned...) > > I have implemented that workaround for the out-of-tree version of sfc. 
> For the in-tree driver, I thought it would be better to limit the number > of segments at source, which will avoid penalising any cases where the > window can grow so much larger than MSS. [...] I mean any *legitimate* cases where this can happen. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 19:35 ` Ben Hutchings 2012-07-30 19:56 ` Ben Hutchings @ 2012-07-30 21:46 ` David Miller 2012-07-30 22:20 ` Ben Hutchings 1 sibling, 1 reply; 13+ messages in thread From: David Miller @ 2012-07-30 21:46 UTC (permalink / raw) To: bhutchings; +Cc: eric.dumazet, netdev, linux-net-drivers From: Ben Hutchings <bhutchings@solarflare.com> Date: Mon, 30 Jul 2012 20:35:52 +0100 > On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote: >> Or you could introduce a new wk->sk_gso_max_segments, that your sfc >> driver sets to whatever limit ? > > Yes, that's another option. This is how I want this handled. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 21:46 ` David Miller @ 2012-07-30 22:20 ` Ben Hutchings 2012-07-30 22:50 ` Stephen Hemminger 0 siblings, 1 reply; 13+ messages in thread From: Ben Hutchings @ 2012-07-30 22:20 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, netdev, linux-net-drivers On Mon, 2012-07-30 at 14:46 -0700, David Miller wrote: > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Mon, 30 Jul 2012 20:35:52 +0100 > > > On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote: > >> Or you could introduce a new wk->sk_gso_max_segments, that your sfc > >> driver sets to whatever limit ? > > > > Yes, that's another option. > > This is how I want this handled. How should that be applied in the GRO-forwarding case? Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb
  2012-07-30 22:20         ` Ben Hutchings
@ 2012-07-30 22:50           ` Stephen Hemminger
  2012-07-30 23:07             ` Ben Hutchings
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2012-07-30 22:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, eric.dumazet, netdev, linux-net-drivers

On Mon, 30 Jul 2012 23:20:57 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Mon, 2012-07-30 at 14:46 -0700, David Miller wrote:
> > From: Ben Hutchings <bhutchings@solarflare.com>
> > Date: Mon, 30 Jul 2012 20:35:52 +0100
> > 
> > > On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote:
> > >> Or you could introduce a new sk->sk_gso_max_segments, that your sfc
> > >> driver sets to whatever limit ?
> > > 
> > > Yes, that's another option.
> > 
> > This is how I want this handled.
> 
> How should that be applied in the GRO-forwarding case?
> 
> Ben.

Why not make max_frags a property of the device?
Something like the following untested idea:

diff --git a/net/core/dev.c b/net/core/dev.c
index 0ebaea1..bfb005b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2159,14 +2159,16 @@ EXPORT_SYMBOL(netif_skb_features);
  *	at least one of fragments is in highmem and device does not
  *	support DMA from it.
  */
-static inline int skb_needs_linearize(struct sk_buff *skb,
-				      int features)
+static inline bool skb_needs_linearize(struct sk_buff *skb,
+				       int features, unsigned int maxfrags)
 {
-	return skb_is_nonlinear(skb) &&
-	       ((skb_has_frag_list(skb) &&
-		 !(features & NETIF_F_FRAGLIST)) ||
-		(skb_shinfo(skb)->nr_frags &&
-		 !(features & NETIF_F_SG)));
+	if (!skb_is_nonlinear(skb))
+		return false;
+
+	if (skb_has_frag_list(skb))
+		return !(features & NETIF_F_FRAGLIST);
+	else
+		return skb_shinfo(skb)->nr_frags > maxfrags;
 }
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
@@ -2206,7 +2208,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (skb->next)
 			goto gso;
 	} else {
-		if (skb_needs_linearize(skb, features) &&
+		if (skb_needs_linearize(skb, features, dev->max_frags) &&
 		    __skb_linearize(skb))
 			goto out_kfree_skb;
 
@@ -5544,6 +5546,20 @@ int register_netdevice(struct net_device *dev)
 	dev->features |= NETIF_F_SOFT_FEATURES;
 	dev->wanted_features = dev->features & dev->hw_features;
 
+	if (dev->max_frags > 0) {
+		if (!(features & NETIF_F_SG)) {
+			netdev_dbg(dev,
+				   "Resetting max fragments since no NETIF_F_SG\n");
+			dev->max_frags = 0;
+		}
+	} else {
+		/* If device has not set maximum number of fragments
+		 * then assume it can take any number of them
+		 */
+		if (features & NETIF_F_SG)
+			dev->max_frags = MAX_SKB_FRAGS;
+	}
+
 	/* Turn on no cache copy if HW is doing checksum */
 	if (!(dev->flags & IFF_LOOPBACK)) {
 		dev->hw_features |= NETIF_F_NOCACHE_COPY;

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb 2012-07-30 22:50 ` Stephen Hemminger @ 2012-07-30 23:07 ` Ben Hutchings 0 siblings, 0 replies; 13+ messages in thread From: Ben Hutchings @ 2012-07-30 23:07 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, eric.dumazet, netdev, linux-net-drivers On Mon, 2012-07-30 at 15:50 -0700, Stephen Hemminger wrote: > On Mon, 30 Jul 2012 23:20:57 +0100 > Ben Hutchings <bhutchings@solarflare.com> wrote: > > > On Mon, 2012-07-30 at 14:46 -0700, David Miller wrote: > > > From: Ben Hutchings <bhutchings@solarflare.com> > > > Date: Mon, 30 Jul 2012 20:35:52 +0100 > > > > > > > On Mon, 2012-07-30 at 19:31 +0200, Eric Dumazet wrote: > > > >> Or you could introduce a new wk->sk_gso_max_segments, that your sfc > > > >> driver sets to whatever limit ? > > > > > > > > Yes, that's another option. > > > > > > This is how I want this handled. > > > > How should that be applied in the GRO-forwarding case? > > > > Ben. > > > Why not make max_frags a property of the device? [...] This has nothing to do with the number of input fragments. But I think you're on the right track - this can be checked in netif_skb_features() or something like that. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net 2/2] sfc: Correct the minimum TX queue size
  2012-07-30 17:14 [PATCH net 0/2] Prevent extreme TSO parameters from stalling TX queues Ben Hutchings
  2012-07-30 17:16 ` [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb Ben Hutchings
@ 2012-07-30 17:17 ` Ben Hutchings
  1 sibling, 0 replies; 13+ messages in thread
From: Ben Hutchings @ 2012-07-30 17:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

Currently an skb requiring TSO may not fit within a minimum-size TX
queue.  The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset).  This issue is designated as CVE-2012-3412.

The preceding change 'tcp: Limit number of segments generated by GSO
per skb' fixed this for the default queue size.  Increase the minimum
to allow for 2 worst-case skbs, such that there will definitely be
space to add an skb after we wake a queue.

To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c     |    5 +++++
 drivers/net/ethernet/sfc/efx.h     |   11 +++++++----
 drivers/net/ethernet/sfc/ethtool.c |   16 +++++++++++-----
 drivers/net/ethernet/sfc/tx.c      |   20 ++++++++++++++++++++
 4 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 70554a1..a5b5a75 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1503,6 +1503,11 @@ static int efx_probe_all(struct efx_nic *efx)
 		goto fail2;
 	}
 
+	BUILD_BUG_ON(EFX_DEFAULT_DMAQ_SIZE < EFX_RXQ_MIN_ENT);
+	if (WARN_ON(EFX_DEFAULT_DMAQ_SIZE < EFX_TXQ_MIN_ENT(efx))) {
+		rc = -EINVAL;
+		goto fail3;
+	}
 	efx->rxq_entries = efx->txq_entries = EFX_DEFAULT_DMAQ_SIZE;
 
 	rc = efx_probe_filters(efx);
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index be8f915..d95309c 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -30,6 +30,7 @@ extern netdev_tx_t
 efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 extern void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
 extern int efx_setup_tc(struct net_device *net_dev, u8 num_tc);
+extern unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 
 /* RX */
 extern int efx_probe_rx_queue(struct efx_rx_queue *rx_queue);
@@ -52,10 +53,12 @@ extern void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
 #define EFX_MAX_EVQ_SIZE 16384UL
 #define EFX_MIN_EVQ_SIZE 512UL
 
-/* The smallest [rt]xq_entries that the driver supports.  Callers of
- * efx_wake_queue() assume that they can subsequently send at least one
- * skb.  Falcon/A1 may require up to three descriptors per skb_frag. */
-#define EFX_MIN_RING_SIZE (roundup_pow_of_two(2 * 3 * MAX_SKB_FRAGS))
+/* The smallest [rt]xq_entries that the driver supports.  RX minimum
+ * is a bit arbitrary.  For TX, we must have space for at least 2
+ * TSO skbs.
+ */
+#define EFX_RXQ_MIN_ENT		128U
+#define EFX_TXQ_MIN_ENT(efx)	(2 * efx_tx_max_skb_descs(efx))
 
 /* Filters */
 extern int efx_probe_filters(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 10536f9..8cba2df 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -680,21 +680,27 @@ static int efx_ethtool_set_ringparam(struct net_device *net_dev,
 				     struct ethtool_ringparam *ring)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
+	u32 txq_entries;
 
 	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
 	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
 	    ring->tx_pending > EFX_MAX_DMAQ_SIZE)
 		return -EINVAL;
 
-	if (ring->rx_pending < EFX_MIN_RING_SIZE ||
-	    ring->tx_pending < EFX_MIN_RING_SIZE) {
+	if (ring->rx_pending < EFX_RXQ_MIN_ENT) {
 		netif_err(efx, drv, efx->net_dev,
-			  "TX and RX queues cannot be smaller than %ld\n",
-			  EFX_MIN_RING_SIZE);
+			  "RX queues cannot be smaller than %u\n",
+			  EFX_RXQ_MIN_ENT);
 		return -EINVAL;
 	}
 
-	return efx_realloc_channels(efx, ring->rx_pending, ring->tx_pending);
+	txq_entries = max(ring->tx_pending, EFX_TXQ_MIN_ENT(efx));
+	if (txq_entries != ring->tx_pending)
+		netif_warn(efx, drv, efx->net_dev,
+			   "increasing TX queue size to minimum of %u\n",
+			   txq_entries);
+
+	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
 }
 
 static int efx_ethtool_set_pauseparam(struct net_device *net_dev,
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 9b225a7..814bd17 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -15,6 +15,7 @@
 #include <linux/ipv6.h>
 #include <linux/slab.h>
 #include <net/ipv6.h>
+#include <net/tcp.h>
 #include <linux/if_ether.h>
 #include <linux/highmem.h>
 #include "net_driver.h"
@@ -119,6 +120,25 @@ efx_max_tx_len(struct efx_nic *efx, dma_addr_t dma_addr)
 	return len;
 }
 
+unsigned int efx_tx_max_skb_descs(struct efx_nic *efx)
+{
+	/* Header and payload descriptor for each output segment, plus
+	 * one for every input fragment boundary within a segment
+	 */
+	unsigned int max_descs = TCP_MAX_GSO_SEGS * 2 + MAX_SKB_FRAGS;
+
+	/* Possibly one more per segment for the alignment workaround */
+	if (EFX_WORKAROUND_5391(efx))
+		max_descs += TCP_MAX_GSO_SEGS;
+
+	/* Possibly more for PCIe page boundaries within input fragments */
+	if (PAGE_SIZE > EFX_PAGE_SIZE)
+		max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
+				   DIV_ROUND_UP(GSO_MAX_SIZE, EFX_PAGE_SIZE));
+
+	return max_descs;
+}
+
 /*
  * Add a socket buffer to a TX queue
  *
-- 
1.7.7.6

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-07-30 23:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2012-07-30 17:14 [PATCH net 0/2] Prevent extreme TSO parameters from stalling TX queues Ben Hutchings
2012-07-30 17:16 ` [PATCH net 1/2] tcp: Limit number of segments generated by GSO per skb Ben Hutchings
2012-07-30 17:23   ` Ben Greear
2012-07-30 19:41     ` Ben Hutchings
2012-07-30 21:00       ` Ben Greear
2012-07-30 17:31   ` Eric Dumazet
2012-07-30 19:35     ` Ben Hutchings
2012-07-30 19:56       ` Ben Hutchings
2012-07-30 21:46       ` David Miller
2012-07-30 22:20         ` Ben Hutchings
2012-07-30 22:50           ` Stephen Hemminger
2012-07-30 23:07             ` Ben Hutchings
2012-07-30 17:17 ` [PATCH net 2/2] sfc: Correct the minimum TX queue size Ben Hutchings