Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/3] net: constify struct net_protocol
From: David Miller @ 2009-09-15  0:07 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20090914222146.GA31565@x200.localdomain>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 15 Sep 2009 02:21:47 +0400

> Remove long removed "inet_protocol_base" declaration.
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/3] net: constify struct inet6_protocol
From: David Miller @ 2009-09-15  0:07 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20090914222228.GB31565@x200.localdomain>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 15 Sep 2009 02:22:28 +0400

> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: constify remaining proto_ops
From: David Miller @ 2009-09-15  0:07 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20090914222323.GC31565@x200.localdomain>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 15 Sep 2009 02:23:23 +0400

> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

I assume this was supposed to be "[PATCH 3/3]"? :-)

Applied.

^ permalink raw reply

* Re: [PATCH] Phonet: Netlink event for autoconfigured addresses
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: remi; +Cc: netdev, remi.denis-courmont
In-Reply-To: <1252933829-12442-1-git-send-email-remi@remlab.net>

From: Rémi Denis-Courmont <remi@remlab.net>
Date: Mon, 14 Sep 2009 16:10:27 +0300

> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

Applied.

^ permalink raw reply

* Re: [PATCH] cdc-phonet: remove noisy debug statement
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: remi; +Cc: netdev, remi.denis-courmont
In-Reply-To: <1252933829-12442-2-git-send-email-remi@remlab.net>

From: Rémi Denis-Courmont <remi@remlab.net>
Date: Mon, 14 Sep 2009 16:10:28 +0300

> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 1/3] ixgbe: Properly disable packet split per-ring when globally disabled
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, peter.p.waskiewicz.jr, donald.c.skidmore
In-Reply-To: <20090914174659.4560.28814.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 Sep 2009 10:47:27 -0700

> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> 
> The packet split feature was recently moved out of the adapter-wide flags
> feature field and into a per-Rx ring feature field.  In the process, packet
> split isn't properly disabled in the Rx ring if the adapter has it globally
> disabled, followed by a device reset.
> 
> This won't impact the driver today, since it's always in packet split mode.
> However, this will prevent any pitfalls if someone disables packet split on
> the adapter in the future and doesn't disable it in each ring.
> 
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 2/3] ixgbe: Add support for 82599-based CX4 adapters
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, peter.p.waskiewicz.jr, donald.c.skidmore
In-Reply-To: <20090914174748.4560.39373.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 Sep 2009 10:47:49 -0700

> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> 
> This patch adds support for CX4 adapters based on 82599.
> 
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 3/3] ixgbe: Create separate media type for CX4 adapters
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, peter.p.waskiewicz.jr, donald.c.skidmore
In-Reply-To: <20090914174809.4560.72843.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 Sep 2009 10:48:10 -0700

> From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> 
> Currently the media type detection for CX4 adapters lumps them into a
> type of fiber.  This causes some strange fallout when firmware verification
> is done on the NIC, and certain fiber NIC rules get enforced incorrectly.
> 
> This patch introduces a new media type for CX4, and puts both 82598 and
> 82599 CX4 adapters into this bucket.
> 
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 1/2] igb: reset sgmii phy at start of init
From: David Miller @ 2009-09-15  0:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck, donald.c.skidmore
In-Reply-To: <20090914182253.4859.11176.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 Sep 2009 11:22:54 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> Our SGMII phy code was incomplete in that it was not actually placing the
> phy in SGMII mode and as a result the PHY was not able to establish a link
> when connected to a non serdes link partner.  This patch updates the code
> to combine the SGMII/serdes PCS init and to add the necessary reset.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 2/2] igb: do not allow phy sw reset code to make calls to null pointers
From: David Miller @ 2009-09-15  0:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck, donald.c.skidmore
In-Reply-To: <20090914182312.4859.21938.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 Sep 2009 11:23:13 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> In the case of fiber and serdes adapters we were seeing issues with ethtool
> -t causing kernel panics due to null function pointers.  To prevent this we
> need to exit out of the phy reset code in the event that we do not have a
> valid phy.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH] pkt_sched: Fix qdisc_graft WRT ingress qdisc
From: David Miller @ 2009-09-15  0:10 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, netdev
In-Reply-To: <20090914083544.GA10444@ff.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 14 Sep 2009 08:35:44 +0000

> After the recent mq change using ingress qdisc overwrites dev->qdisc;
> there is also a wrong old qdisc pointer passed to notify_and_destroy.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied, thanks Jarek.

When I did the original TX multiqueue changes I tried to eliminate as
much as possible all of the special ways in which ingress qdiscs are
handled, but this area of how the ingress qdisc pointers are updated
remains.

^ permalink raw reply

* Re: [PATCH 1/5] First Patch on TFRC-SP. Copy base files from TFRC
From: Ivo Calado @ 2009-09-15  0:38 UTC (permalink / raw)
  To: Gerrit Renker, dccp, netdev
In-Reply-To: <20090912183222.GA5706@gerrit.erg.abdn.ac.uk>

Hi Gerrit and all.
Thank you for your fast reply. My comments follow below

On Sat, Sep 12, 2009 at 15:32, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> Hi Ivo,
>
> you have made a really good job of sticking to code conventions (see other
> posting). There are a few things that needed tending to in the first patch.
>
> (1) Version changes
> ===================
> It seems that you applied something like s/\*\*/*/g to the first instance of
> the patch, in order to remove duplicate asterisks. This caused a problem:
>
> --- tfrc_sp_receiver_01.patch   2009/09/12 08:37:12     1.1
> +++ tfrc_sp_receiver_01.patch   2009/09/08 17:34:40
> @@ -1001,8 +1001,8 @@ Index: dccp_tree_work4/net/dccp/ccids/li
>  +
>  +#endif
>  +
> -+extern int  tfrc_sp_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno);
> -+extern void tfrc_sp_tx_hist_purge(struct tfrc_tx_hist_entry **headp);
> ++extern int  tfrc_sp_tx_hist_add(struct tfrc_tx_hist_entry *headp, u64 seqno);
> ++extern void tfrc_sp_tx_hist_purge(struct tfrc_tx_hist_entry *headp);
>
> I have reverted the bug, also to minimise the difference to the existing (non
> TFRC-SP) files.
>
>
> (2) Other changes that I edited out
> ===================================
> (Other than whitespace changes.)
>
> net/dccp/ccids/lib/loss_interval_sp.c
> -------------------------------------
> I replaced the following dead code
>        if ((tfrc_lh_slab != NULL))
>                return 0;
>
>        if (tfrc_lh_slab != NULL) {
>                kmem_cache_destroy(tfrc_lh_slab);
>                tfrc_lh_slab = NULL;
>        }
>        return -ENOBUFS;
> with
>        return tfrc_lh_slab == NULL ? -ENOBUFS : 0;
>
> Also separated the conditions
> +               if ((len <= 0) ||
> +                   (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
> back into
>                if (len <= 0)
>                        return false;
>
>                if (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))
>                        return false;
>

Thanks!




> I removed the following unnecessary inclusion:
> +#include <linux/random.h>
>

I should not have included that header now, it'll be only necessary in
another patch



>
> The following function pokes a hole in thei so far "abstract" data type;
> the convention has been to access the internals of the struct only via
> get-functions:
>
> static inline struct tfrc_loss_interval
>        *tfrc_lh_get_loss_interval(struct tfrc_loss_hist *lh, const u8 i)
> {
>        BUG_ON(i >= lh->counter);
>        return lh->ring[LIH_INDEX(lh->counter - i - 1)];
> }
>
> (You use it in patch 3/5 to gain access to li_ccval and li_losses.
> Better would be to have two separate accessor functions.)
>

Okay, I will fix this.




>
> net/dccp/ccids/lib/tfrc_equation_sp.c
> -------------------------------------
> This is a prime candidate for removal. After editing out the whitespace
> differences, I found that it is 100% identical with tfrc_equation.c.
>
> The result of this editing has been uploaded to
> http://eden-feed.erg.abdn.ac.uk/tfrc_sp_receiver_01.patch
>
One future patch will need to modify this file, but now it's really an
exact copy.


-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply

* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Ivo Calado @ 2009-09-15  0:39 UTC (permalink / raw)
  To: Gerrit Renker, dccp, netdev
In-Reply-To: <20090913161224.GA5121@gerrit.erg.abdn.ac.uk>

In the same way, my comments follow below


>                s64 len = dccp_delta_seqno(cur->li_seqno, cong_evt_seqno);
>                if ((len <= 0) ||
>                    (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
> +                       cur->li_losses += rh->num_losses;
>                        return false;
>                }
> This has a multiplicative effect, since rh->num_losses is added to cur->li_losses
> each time the condition is evaluated. E.g. if 3 times in a row reordered (earlier)
> sequence numbers arrive, or if the CCvals do not differ (high-speed networks),
> we end up with 3 * rh->num_losses, which can't be correct.
>
>


The following code would be correct then?

              if ((len <= 0) ||
                  (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
+                       cur->li_losses += rh->num_losses;
+                       rh->num_losses  = 0;
                      return false;
With this change I suppose the could be fixed. With that, the
rh->num_losses couldn't added twice. Am I correct?




>
> --- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.c
> +++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c
> @@ -244,6 +244,7 @@
>                h->loss_count = 3;
>                tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
>                                               skb, n3);
> +               h->num_losses = dccp_loss_count(s2, s3, n3);
>                return 1;
>        }
> This only measures the gap between s2 and s3, but the "hole" starts at s0,
> so it would need to be dccp_loss_count(s0, s3, n3). Algorithm is documented at
> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
> ccid3/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
>

<snip>

>  }
> Here it is also between s0 and s2, not between s0 and s3. It is case VI(c.3).
>
> However, the above still is a crude approximation, since it only measures between
> the last sequence number received before the loss and the third sequence number
> after the loss. It would be better to either
>
>  * use the first sequence number after the loss (this can be s1, s2, or s3) or
>  * check if there are more holes between the first/second and the second/third
>   sequence numbers after the loss.
>
> The second option would be the correct one, it should also take the NDP counts
> of each gap into account. And already we have a fairly complicated algorithm.
>



I'll study loss_detection_algorithm_notes.txt and correct the code.
But I have one question, that i don't know if is already answered by
the documentation:
Further holes, between the the first and third packet received after
the hole are accounted only in future calls to the function, right?
Because the receiver needs to receive more packets to confirm loss,
right?
So, it's really necessary to look for other holes after the loss? Will
not this other holes be identified as losses in future?



> Another observation is that this function is only called in packet_history_sp.c,
> and only in __two_after_loss(), so that dccp_loss_count() could be made static,
> and without the need for the WARN_ON (see below), since in all above cases it is
> ensured that the first sequence number argument is "before" the second one.
>

Okay.
>
> --- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.h
> +++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.h
> @@ -113,6 +113,7 @@
>        u32                       packet_size,
>                                  bytes_recvd;
>        ktime_t                   bytes_start;
> +       u8                        num_losses;
>  };
> No more than 255 losses? NDP count option has space for up to 6 bytes, i.e. 2^48-1.
> Suggest u64 for consistency with the other parts.
>

Okay.


>
> --- dccp_tree_work4.orig/net/dccp/dccp.h
> +++ dccp_tree_work4/net/dccp/dccp.h
> @@ -168,6 +168,21 @@
>        return (u64)delta <= ndp + 1;

<snip>

> But then dccp_loss_free reduces to a specialisation of the above:
> bool dccp_loss_free(const u64 s1, const u64 s2, const u64 ndp)
> {
>        return dccp_loss_count(s1, s2, ndp) == 0;
> }
>
> But please see above -- the function needs to be called for each hole in a row.
>

Thanks for correcting the calculation for me!

-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply

* Re: [PATCH 4/5] Adds options DROPPED PACKETS and LOSS INTERVALS to receiver
From: Ivo Calado @ 2009-09-15  0:40 UTC (permalink / raw)
  To: Gerrit Renker, dccp, netdev
In-Reply-To: <20090913184128.GA7165@gerrit.erg.abdn.ac.uk>

The comments follow below


On Sun, Sep 13, 2009 at 15:41, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> | Adds options DROPPED PACKETS and LOSS INTERVALS to receiver. In this patch is added the
> | mechanism of gathering information about loss intervals and storing it, for later
> | construction of these two options.
> This also needs some more work, please see inline.
>

<snip>

>
> +       tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
> +
> I don't understand why 'tfrc_sp_update_li_data' is called twice, one call seems
> to be redundant. What it seems to be wanting to do is

When we are designing the loss count algorithm it seemed to be
necessary, but now that i need to revise this (patch nº 2),  I'll
observe this.


>        bool new_loss = false;
>
>        //...
<snip>
> at the begin of the function, all subsequent 'dccp_data_packet(skb)' are unnecessary.
> Almost every 'if' statement ends in 'return', this seems ad-hoc and could be reduced
> by adding if-else-if-else-if..., which would probably also reduce the code duplication.
>

I'll try to reduce code duplication, thanks.

>
> @@ -244,8 +463,11 @@
>        tfrc_lh_slab = kmem_cache_create("tfrc_sp_li_hist",
>                                         sizeof(struct tfrc_loss_interval), 0,
>                                         SLAB_HWCACHE_ALIGN, NULL);
> +       tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
> +                                        sizeof(struct tfrc_loss_data_entry), 0,
> +                                        SLAB_HWCACHE_ALIGN, NULL);
>
> -       if ((tfrc_lh_slab != NULL))
> +       if ((tfrc_lh_slab != NULL) || (tfrc_ld_slab != NULL))
>                return 0;
>
>        if (tfrc_lh_slab != NULL) {
> @@ -253,6 +475,11 @@
>                tfrc_lh_slab = NULL;
>        }
>
> +       if (tfrc_ld_slab != NULL) {
> +               kmem_cache_destroy(tfrc_ld_slab);
> +               tfrc_ld_slab = NULL;
> +       }
> +
>        return -ENOBUFS;
>  }
> The condition above should be '&&', not '||'. Suggested alternative:
>
> +       if (tfrc_lh_slab == NULL)
> +               goto lh_failed;
> +
> +       tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
> +                                        sizeof(struct tfrc_loss_data_entry), 0,
> +                                        SLAB_HWCACHE_ALIGN, NULL);
> +       if (tfrc_ld_slab != NULL)
> +               return 0;
> +
> +       kmem_cache_destroy(tfrc_lh_slab);
> +       tfrc_lh_slab = NULL;
> +lh_failed:
> +       return -ENOBUFS;
>  }
>

Thanks for revising this. Adding one label for each failure case will
not scale well. In another patch it will be needed to create another
structure, and so, requiring another label.



>
>
>
> --- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h
> +++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
> @@ -70,13 +70,52 @@
>  struct tfrc_rx_hist;
>  #endif
>
> +struct tfrc_loss_data_entry {
> +       struct tfrc_loss_data_entry     *next;
> +       u32                             lossless_length:24;
> +       u8                              ecn_nonce_sum:1;
> +       u32                             loss_length:24;
> +       u32                             data_length:24;
> +       u32                             drop_count:24;
> +};
> According to RFC 4342, 8.6.1, Loss Length is a 23-bit number, i.e.
>        u32                             loss_length:23;
>

Thanks!

>
>
>
> +#define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH     28
> +#define TFRC_DROP_OPT_MAX_LENGTH               84
> +#define TFRC_LI_OPT_SZ \
> +       (2 + TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH*9)
> +#define TFRC_DROPPED_OPT_SZ \
> +       (1 + TFRC_DROP_OPT_MAX_LENGTH*3)
> It would be good to have a reminder where the numbers come from, i.e.
>  * the "28" comes from RFC 4342, 8.6
>  * the "84" comes from RFC 5622, 8.7
>  * the "9" again is from RFC 4342, 8.6
>  * the 1 + TFRC_DROP_OPT_MAX_LENGTH*3 = DCCP_SINGLE_OPT_MAXLEN (linux/dccp.h)
>
Sorry, this documentation was written, but i left it in another future patch.

>
> +struct tfrc_loss_data {
> +       struct tfrc_loss_data_entry     *head;
> +       u16                             counter;
> +       u8                              loss_intervals_opts[TFRC_LI_OPT_SZ];
> +       u8                              drop_opts[TFRC_DROPPED_OPT_SZ];
> +       u8                              last_loss_count;
> +       u8                              sto_ecn;
> +       u8                              sto_is_data;
> +};
>
>
>
> +static inline void tfrc_ld_init(struct tfrc_loss_data *ld)
> +{
> +       memset(ld, 0, sizeof(struct tfrc_loss_data));
> +}
> A tip from CodingStyle - using "sizeof(*ld)" will continue to work if there
> are changes in the interface.
>

Thanks.


>
> --- dccp_tree_work5.orig/net/dccp/dccp.h
> +++ dccp_tree_work5/net/dccp/dccp.h
> @@ -403,6 +403,16 @@
>        return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_CE;
>  }
>
> +static inline bool dccp_skb_is_ecn_ect0(const struct sk_buff *skb)
> +{
> +       return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_0;
> +}
> +
> +static inline bool dccp_skb_is_ecn_ect1(const struct sk_buff *skb)
> +{
> +       return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_0;
> +}
> The routines are not needed, because the dccpd_ecn field is (deliberately) only
> 2 bits wide. In the second case it should have been INET_ECN_ECT_1.
>

And how would be to determine if one packet's ecn is set to ECT 0 or ECT 1?


-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply

* Re: [PATCH 5/5] Updating documentation accordingly
From: Ivo Calado @ 2009-09-15  0:41 UTC (permalink / raw)
  To: Gerrit Renker, dccp, netdev
In-Reply-To: <20090913185050.GA7475@gerrit.erg.abdn.ac.uk>

On Sun, Sep 13, 2009 at 15:50, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> | Updating documentation accordingly
>
> +/* As defined at section 8.6.1. of RFC 4342 */
>  #define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH     28
> +/* Specified on section 8.7. of CCID4 draft */
>  #define TFRC_DROP_OPT_MAX_LENGTH               84
> Need to update the references to CCID4 draft with RFC 5622. See also 4/5.
>
> It would be more helpful to include the documentation for structs with
> the patches that use these structs.
>

Thanks, I'll fix this.

-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply

* Re: L2 switching in igb
From: Alexander Duyck @ 2009-09-15  2:52 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Alexander Duyck, Kirsher, Jeffrey T, Fischer, Anna,
	netdev@vger.kernel.org, David Miller, Stephen Hemminger
In-Reply-To: <4AAE14CC.2000609@voltaire.com>

On Mon, Sep 14, 2009 at 3:02 AM, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> To have VEPA support another bit has to be programmed... its the one that
> doesn't let the PF to forward a packet to a VF whose source mac matches the
> one in the packet (e.g multicast sender).

The bit I was referring to not setting would handle that.  By
disabling the DTXSWC local loopback bit the PF will not send anything
to the VFs or visa versa.

> Yes, lets do that. I'd like to suggest that a "VF programmable from user
> space" context  will contain a <mac, vlan-id, priority-bits, rate> tuple,
> such that in the absence of vlan tag, the VF driver will "sign" the packet
> (skb) with vlan-id and priority-bits assigned by the admin and the PF NIC
> will mandate that the VF originated traffic will not exceed the rate.

Well whenever I can get to it I will try to add that support.  In the
meantime I believe there is a BOF session covering "Virtual Ethernet
switch enhancements and configuration" at the Linux Plumbers Conf that
will cover some of this so hopefully we can come up with a solid plan
on how to address this.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 1/5] First Patch on TFRC-SP. Copy base files from TFRC
From: Ian McDonald @ 2009-09-15  5:25 UTC (permalink / raw)
  To: Ivo Calado; +Cc: Gerrit Renker, dccp, netdev
In-Reply-To: <cb00fa210909141738l60d02cd5p287d990326ce6c3c@mail.gmail.com>

2009/9/15 Ivo Calado <ivocalado@embedded.ufcg.edu.br>
>
> Hi Gerrit and all.
> Thank you for your fast reply. My comments follow below
>
> >
> > net/dccp/ccids/lib/tfrc_equation_sp.c
> > -------------------------------------
> > This is a prime candidate for removal. After editing out the whitespace
> > differences, I found that it is 100% identical with tfrc_equation.c.
> >
> > The result of this editing has been uploaded to
> > http://eden-feed.erg.abdn.ac.uk/tfrc_sp_receiver_01.patch
> >
> One future patch will need to modify this file, but now it's really an
> exact copy.
>
>
Basically the rule with a patch set is that all the patches should
make sense together.

It may well be that it makes sense to make a copy, but if you want to
do this then present it with the patch that then modifies it.

Ian

^ permalink raw reply

* Re: UDP regression with packets rates < 10k per sec
From: Eric Dumazet @ 2009-09-15  5:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev
In-Reply-To: <alpine.DEB.1.10.0909141708150.8051@V090114053VZO-1>

Christoph Lameter a écrit :
> Where are we on this? Definitely a regression?

??

I tried to reproduce your numbers and failed on my machines.

2.6.31 is actually faster than 2.6.22 on the bench you provided.

Must be specific to the hardware I guess ?

As text size presumably is bigger in 2.6.31, fetching code
in cpu caches to handle 10 packets per second is what we call
a cold path anyway.

If you want to make it a fast path, you want to make sure code its
always hot in cpu caches, and find a way to inject packets into
the kernel to make sure cpu keep the path hot.

^ permalink raw reply

* [PATCH] can: fix NOHZ local_softirq_pending 08 warning
From: Oliver Hartkopp @ 2009-09-15  6:57 UTC (permalink / raw)
  To: David Miller, stable; +Cc: Linux Netdev List, Urs Thuermann, Michael Buesch

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

When using nanosleep() in an userspace application we get a ratelimit warning

NOHZ: local_softirq_pending 08

for 10 times.

The echo of CAN frames is done from process context and softirq context only.
Therefore the usage of netif_rx() was wrong (for years).

This patch replaces netif_rx() with netif_rx_ni() which has to be used from
process/softirq context. It also adds a missing comment that can_send() must
no be used from hardirq context.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>

---


[-- Attachment #2: can-NOHZ-local_softirq_pending-08.patch --]
[-- Type: text/x-patch, Size: 1126 bytes --]

diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c
index 6971f6c..80ac563 100644
--- a/drivers/net/can/vcan.c
+++ b/drivers/net/can/vcan.c
@@ -80,7 +80,7 @@ static void vcan_rx(struct sk_buff *skb, struct net_device *dev)
 	skb->dev       = dev;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-	netif_rx(skb);
+	netif_rx_ni(skb);
 }
 
 static netdev_tx_t vcan_tx(struct sk_buff *skb, struct net_device *dev)
diff --git a/net/can/af_can.c b/net/can/af_can.c
index ef1c43a..6068321 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -199,6 +199,8 @@ static int can_create(struct net *net, struct socket *sock, int protocol)
  * @skb: pointer to socket buffer with CAN frame in data section
  * @loop: loopback for listeners on local CAN sockets (recommended default!)
  *
+ * Due to the loopback this routine must not be called from hardirq context.
+ *
  * Return:
  *  0 on success
  *  -ENETDOWN when the selected interface is down
@@ -278,7 +280,7 @@ int can_send(struct sk_buff *skb, int loop)
 	}
 
 	if (newskb)
-		netif_rx(newskb);
+		netif_rx_ni(newskb);
 
 	/* update statistics */
 	can_stats.tx_frames++;

^ permalink raw reply related

* Re: ipv4 regression in 2.6.31 ?
From: Jarek Poplawski @ 2009-09-15  8:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stephan von Krawczynski, Eric Dumazet, linux-kernel, davem,
	Linux Netdev List
In-Reply-To: <20090914093128.4d709ff6@nehalam>

On 14-09-2009 18:31, Stephen Hemminger wrote:
> On Mon, 14 Sep 2009 17:55:05 +0200
> Stephan von Krawczynski <skraw@ithnet.com> wrote:
> 
>> On Mon, 14 Sep 2009 15:57:03 +0200
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> Stephan von Krawczynski a A~(c)crit :
>>>> Hello all,
...
>>> rp_filter - INTEGER
>>>         0 - No source validation.
>>>         1 - Strict mode as defined in RFC3704 Strict Reverse Path
>>>             Each incoming packet is tested against the FIB and if the interface
>>>             is not the best reverse path the packet check will fail.
>>>             By default failed packets are discarded.
>>>         2 - Loose mode as defined in RFC3704 Loose Reverse Path
>>>             Each incoming packet's source address is also tested against the FIB
>>>             and if the source address is not reachable via any interface
>>>             the packet check will fail.
...
> RP filter did not work correctly in 2.6.30. The code added to to the loose
> mode caused a bug; the rp_filter value was being computed as:
>   rp_filter = interface_value & all_value;
> So in order to get reverse path filter both would have to be set.
> 
> In 2.6.31 this was change to:
>    rp_filter = max(interface_value, all_value);
> 
> This was the intended behaviour, if user asks all interfaces to have rp
> filtering turned on, then set /proc/sys/net/ipv4/conf/all/rp_filter = 1
> or to turn on just one interface, set it for just that interface.

Alas this max() formula handles also cases where both values are set
and it doesn't look very natural/"user friendly" to me. Especially
with something like this: all_value = 2; interface_value = 1
Why would anybody care to bother with interface_value in such a case?

"All" suggests "default" in this context, so I'd rather expect
something like:
    rp_filter = interface_value ? : all_value;
which gives "the inteded behaviour" too, plus more...

We'd only need to add e.g.:
 0 - Default ("all") validation. (No source validation if "all" is 0).
 3 - No source validation on this interface.

Jarek P.

^ permalink raw reply

* alloc skb based on a given data buffer
From: Zhu Yi @ 2009-09-15  8:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chatre, Reinette, Frans Pop, Larry Finger, John W. Linville,
	Pekka Enberg, linux-kernel@vger.kernel.org,
	linux-wireless@vger.kernel.org,
	ipw3945-devel@lists.sourceforge.net, Andrew Morton,
	cl@linux-foundation.org, Krauss, Assaf, Johannes Berg,
	Abbas, Mohamed, netdev
In-Reply-To: <20090914130612.GA11778@csn.ul.ie>

[ Cc netdev and change the subject ]

On Mon, 2009-09-14 at 21:06 +0800, Mel Gorman wrote:
> > Essentially, the hardware only requires an order-1 allocation aligned on
> > 256 bytes boundary. But as it is used as an SKB, a trailing struct
> > skb_shared_info is added. This forces us to both increase the order and
> > do alignment ourselves. I believe some improvement could be done here.
> > But it should not be an easy one.
> > 
> 
> Probably not. I can only assume that moving the location of
> skb_shared_info such that it is sometimes located after the buffer and
> sometimes allocated via a separate kmalloc() would be a significant
> undertaking.

Shall I propose below function as a variant to alloc_skb()?

struct sk_buff *alloc_skb_attach_buff(void *data_buff,
				      unsigned int real_size,
				      unsigned int size, gfp_t mask);

If real_size >= size + sizeof(struct skb_shared_info), we can still put
the shinfo at the end of the buffer. Otherwise we can allocate from a
new slab that put sk_buff and shinfo together. Of course, kfree_skbmem()
needs to recognize it.

This way, device drivers can allocate the Rx buffers with their own size
and alignment requirement. i.e. do an order-1 page allocation directly
with free_pages() in the iwlagn driver for a 256 bytes aligned 8K Rx
buffer. After DMA is finished, drivers can use the above function to
assemble an skb based on the Rx buffer. It should resolve the problem
for requiring an order-2 allocation by alloc_skb() in the first place.

Comments are welcome.

Thanks,
-yi

^ permalink raw reply

* Re: [PATCH 2nd try] tcp: fix ssthresh u16 leftover
From: David Miller @ 2009-09-15  8:30 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <alpine.DEB.2.00.0909151111250.7424@wel-95.cs.helsinki.fi>

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Tue, 15 Sep 2009 11:19:11 +0300 (EEST)

> [PATCH 2nd try] tcp: fix ssthresh u16 leftover
> 
> It was once upon time so that snd_sthresh was a 16-bit quantity.
> ...That has not been true for long period of time. I run across
> some ancient compares which still seem to trust such legacy.
> Put all that magic into a single place, I hopefully found all
> of them.
> 
> Compile tested, though linking of allyesconfig is ridiculous
> nowadays it seems.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Much better, applied, thanks!

^ permalink raw reply

* Re: [PATCH] can: fix NOHZ local_softirq_pending 08 warning
From: David Miller @ 2009-09-15  8:31 UTC (permalink / raw)
  To: oliver; +Cc: stable, netdev, urs, mb
In-Reply-To: <4AAF3AF2.60309@hartkopp.net>

From: Oliver Hartkopp <oliver@hartkopp.net>
Date: Tue, 15 Sep 2009 08:57:54 +0200

> When using nanosleep() in an userspace application we get a ratelimit warning
> 
> NOHZ: local_softirq_pending 08
> 
> for 10 times.
> 
> The echo of CAN frames is done from process context and softirq context only.
> Therefore the usage of netif_rx() was wrong (for years).
> 
> This patch replaces netif_rx() with netif_rx_ni() which has to be used from
> process/softirq context. It also adds a missing comment that can_send() must
> no be used from hardirq context.
> 
> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>

Applied, thanks.

^ permalink raw reply

* Re: alloc skb based on a given data buffer
From: David Miller @ 2009-09-15  8:33 UTC (permalink / raw)
  To: yi.zhu-ral2JQCrhuEAvxtiuMwx3w
  Cc: mel-wPRd99KPJ+uzQB+pC5nmwQ,
	reinette.chatre-ral2JQCrhuEAvxtiuMwx3w,
	elendil-EIBgga6/0yRmR6Xm/wNWPw,
	Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	linville-2XuSBdqkA4R54TAoqtyWWQ, penberg-bbCR+/B0CizivPeTLB3BmA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	ipw3945-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	assaf.krauss-ral2JQCrhuEAvxtiuMwx3w,
	johannes-cdvu00un1VgdHxzADdlk8Q,
	mohamed.abbas-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1253003420.7549.51.camel@debian>

From: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Date: Tue, 15 Sep 2009 16:30:20 +0800

> This way, device drivers can allocate the Rx buffers with their own size
> and alignment requirement. i.e. do an order-1 page allocation directly
> with free_pages() in the iwlagn driver for a 256 bytes aligned 8K Rx
> buffer. After DMA is finished, drivers can use the above function to
> assemble an skb based on the Rx buffer. It should resolve the problem
> for requiring an order-2 allocation by alloc_skb() in the first place.

You can create paged RX skbs just like drivers such as niu.c
and others already do, there is no need for special APIs for
this.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Fwd: [RFC v3] net: Introduce recvmmsg socket syscall
From: Nir Tzachar @ 2009-09-15  8:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Linux Networking Development Mailing List, Ziv Ayalon
In-Reply-To: <20090914230902.GC22743@ghostprotocols.net>

On Tue, Sep 15, 2009 at 2:09 AM, Arnaldo Carvalho de
Melo<acme@ghostprotocols.net> wrote:
> Em Thu, Aug 06, 2009 at 10:15:26AM +0300, Nir Tzachar escreveu:
>> Hello.
>>
>> Is there anything new with this patch? What are the plans for merging
>> it upstream?
>
> I'm doing perf runs using a test app using recvmsg, then with the first
> patch, that introduces recvmmsg, then with the second, that locks the
> series of unlocked_recvmmsg calls just once, will try to get this posted
> here soon.
>
> I'd really appreciate if the people interested in this could try it and
> post numbers too, to get this ball rolling again.
>
> As for getting it upstream, well, posting numbers here would definetely
> help with that :-)

Ok, here are some crude results:

Setup:
linux 2.6.29.2 with the third version of the patch, running on an
Intel Xeon X3220 2.4GHz quad core, with 4Gbyte of ram, running Ubuntu
9.04

Application:
A financial application, subscribing to quotes from a stock exchange.
Typical traffic is small (around 50-100 bytes) multicast packets in
large volumes. The application just receives  the quotes and pass them
along.

The test:
Run two version of the application, head to head. one version using
recvmsg and the other recvmmsg. The data is passed to a third
application measuring the latency of the data.

Results:
On general, the recvmmsg beats the pants off the regular recvmsg by a
whole millisecond (which might not sound much, but is _really_ a lot
for us ;). The exact distribution fluctuates between half a milli and
2 millis, but the average is 1 milli.

Conclusions:
We would _really_ like to see this patch go upstream. It gives an
important performance boost in out use cases.

We are willing to perform more accurate tests if needed, and would
appreciate the feedback on how to conduct them.

Cheers,
Nir.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox