Netdev List


* Re: [PATCH v2 10/14] ixgbe: Update ixgbe to use new vlan acceleration.
From: Peter P Waskiewicz Jr @ 2010-10-25 17:50 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Jesse Gross, David Miller, netdev@vger.kernel.org,
	Tantilov, Emil S, Kirsher, Jeffrey T
In-Reply-To: <AANLkTindxJD1a5UPV+6vWbDEeMUTy1QgPs1X8pT-b69D@mail.gmail.com>


On Fri, 2010-10-22 at 06:24 -0700, Michał Mirosław wrote:
> 2010/10/21 Jesse Gross <jesse@nicira.com>:
> > Make the ixgbe driver use the new vlan acceleration model.
> [...]
> > --- a/drivers/net/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ixgbe/ixgbe_main.c
> > @@ -954,17 +954,13 @@ static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector,
> >        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
> >        u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> >
> > -       if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
> > -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
> > -                       vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
> > -               else
> > -                       napi_gro_receive(napi, skb);
> > -       } else {
> > -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
> > -                       vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
> > -               else
> > -                       netif_rx(skb);
> > -       }
> > +       if (is_vlan && (tag & VLAN_VID_MASK))
> > +               __vlan_hwaccel_put_tag(skb, tag);
> 
> I know that this is carried over from the driver, but why is tag == 0
> treated differently here? VID 0 is somewhat special, as normally it
> means an 802.1p packet, but e.g. in the embedded world people use it as
> a normal VID. It would be nice to have this handled consistently in the
> VLAN core: deliver to the base dev (tag stripped) if vlan 0 is not
> configured, and to the vlan dev if it is.

ixgbe handles VLAN 0 differently because that's the tag that's used when
DCB is enabled, and no VLAN is configured.  We have to insert the 802.1p
tag for DCB to work, but the OS won't know about the 802.1q tag, and
ends up dropping the frame.  So we enable VLAN ID 0 in the HW and tell
it to strip the tag, so we can still pass the frame up the stack.

> 
> > +
> > +       if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
> > +               napi_gro_receive(napi, skb);
> > +       else
> > +               netif_rx(skb);
> >  }
> >
> >  /**
> 
> Best Regards,
> Michał Mirosław

-- 
-----------------------------------------------------------
Peter P Waskiewicz Jr                   LAN Access Division
peter.p.waskiewicz.jr@intel.com         Intel Corp.


^ permalink raw reply
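
The VLAN 0 discussion above turns on how the 16-bit 802.1Q tag is split
into a priority field and a VLAN ID. The following is a minimal user-space
sketch of that split, not code from ixgbe; the TCI_* macros and tci_*()
helpers are names made up for this illustration (the mask value matches
the 12-bit VID field that VLAN_VID_MASK covers in the quoted patch).

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q Tag Control Information: PCP (3 bits), DEI (1 bit), VID (12 bits). */
#define TCI_VID_MASK   0x0fffu	/* illustrative name for the 12-bit VID mask */
#define TCI_PRIO_SHIFT 13	/* illustrative name for the PCP position */

/* VLAN ID carried in the low 12 bits of the TCI. */
static inline uint16_t tci_vid(uint16_t tci)
{
	return tci & TCI_VID_MASK;
}

/* 802.1p priority carried in the top 3 bits of the TCI. */
static inline uint8_t tci_priority(uint16_t tci)
{
	return (uint8_t)(tci >> TCI_PRIO_SHIFT);
}

/* A tagged frame with VID 0 carries a priority but no VLAN membership. */
static inline bool tci_is_priority_tagged(uint16_t tci)
{
	return tci_vid(tci) == 0;
}
```

With this split, a check such as (tag & VLAN_VID_MASK) in the quoted hunk
is false for a priority-only tag, which is the case the reply describes
for DCB with no VLAN configured.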

* [PATCH net-2.6] cxgb3: fix device opening error path
From: Divy Le Ray @ 2010-10-25 17:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, swise

From: Divy Le Ray <divy@chelsio.com>

Only a negative return value from bind_qsets() should be considered an
error and propagated.
This fixes an issue reported by IBM on the P Series platform.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
---

 drivers/net/cxgb3/cxgb3_main.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)


diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index a04ce6a..4e3c123 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -1266,11 +1266,13 @@ static int cxgb_up(struct adapter *adap)
 	}
 
 	if (!(adap->flags & QUEUES_BOUND)) {
-		err = bind_qsets(adap);
-		if (err) {
-			CH_ERR(adap, "failed to bind qsets, err %d\n", err);
+		int ret = bind_qsets(adap);
+
+		if (ret < 0) {
+			CH_ERR(adap, "failed to bind qsets, err %d\n", ret);
 			t3_intr_disable(adap);
 			free_irq_resources(adap);
+			err = ret;
 			goto out;
 		}
 		adap->flags |= QUEUES_BOUND;


^ permalink raw reply related
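
The pattern behind the cxgb3 fix above can be sketched in isolation. This
is a hedged illustration only: bind_queues() and device_up() are
hypothetical stand-ins, and the assumption that the helper returns a
positive count on success is mine, not something the patch states.

```c
#include <errno.h>

/* Hypothetical helper: negative errno on failure, a count (> 0) on success. */
static int bind_queues(int nqueues)
{
	if (nqueues <= 0)
		return -EINVAL;
	return nqueues;
}

/* Returns 0 on success or a negative errno, never a positive count. */
static int device_up(int nqueues)
{
	int err = 0;
	int ret = bind_queues(nqueues);

	/* "if (ret)" would wrongly treat a positive count as a failure. */
	if (ret < 0)
		err = ret;
	return err;
}
```

Keeping the helper's result in a separate variable, as the patch does with
"ret", also avoids leaking a positive value into the caller's error code.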

* Re: linux-next: build failure after merge of the final tree (net-current tree related)
From: Stephen Hemminger @ 2010-10-25 17:36 UTC (permalink / raw)
  To: David Miller; +Cc: sfr, netdev, linux-next, linux-kernel, jchapman
In-Reply-To: <20101024.222602.71095343.davem@davemloft.net>

On Sun, 24 Oct 2010 22:26:02 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Mon, 25 Oct 2010 14:19:56 +1100
> 
> > I wish doing that caused a build failure on other architectures ...
> 
> Me too :-/
> 
> > Subject: [PATCH] l2tp: static functions should not be exported
> 
> I'll add this thanks Stephen.

The section mismatch warning on x86 is not shown by default
because there are still so many problems.

-- 

^ permalink raw reply

* [net-next PATCH 2/3] qlge: Add firmware info to ethtool get regs.
From: Ron Mercer @ 2010-10-25 16:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
In-Reply-To: <1288023473-31490-1-git-send-email-ron.mercer@qlogic.com>

By default we add firmware information to ethtool get regs.
Optionally firmware info can instead be sent to log.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h         |    2 ++
 drivers/net/qlge/qlge_dbg.c     |   21 ++++++++++++++++++++-
 drivers/net/qlge/qlge_ethtool.c |   19 ++++++++++++++++---
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index a478786..0474d20 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -2221,6 +2221,7 @@ int ql_write_mpi_reg(struct ql_adapter *qdev, u32 reg, u32 data);
 int ql_unpause_mpi_risc(struct ql_adapter *qdev);
 int ql_pause_mpi_risc(struct ql_adapter *qdev);
 int ql_hard_reset_mpi_risc(struct ql_adapter *qdev);
+int ql_soft_reset_mpi_risc(struct ql_adapter *qdev);
 int ql_dump_risc_ram_area(struct ql_adapter *qdev, void *buf,
 		u32 ram_addr, int word_count);
 int ql_core_dump(struct ql_adapter *qdev,
@@ -2237,6 +2238,7 @@ int ql_mb_set_mgmnt_traffic_ctl(struct ql_adapter *qdev, u32 control);
 int ql_mb_get_port_cfg(struct ql_adapter *qdev);
 int ql_mb_set_port_cfg(struct ql_adapter *qdev);
 int ql_wait_fifo_empty(struct ql_adapter *qdev);
+void ql_get_dump(struct ql_adapter *qdev, void *buff);
 void ql_gen_reg_dump(struct ql_adapter *qdev,
 			struct ql_reg_dump *mpi_coredump);
 netdev_tx_t ql_lb_send(struct sk_buff *skb, struct net_device *ndev);
diff --git a/drivers/net/qlge/qlge_dbg.c b/drivers/net/qlge/qlge_dbg.c
index 4747492..fca804f 100644
--- a/drivers/net/qlge/qlge_dbg.c
+++ b/drivers/net/qlge/qlge_dbg.c
@@ -1317,9 +1317,28 @@ void ql_gen_reg_dump(struct ql_adapter *qdev,
 	status = ql_get_ets_regs(qdev, &mpi_coredump->ets[0]);
 	if (status)
 		return;
+}
+
+void ql_get_dump(struct ql_adapter *qdev, void *buff)
+{
+	/*
+	 * If the dump has already been taken and is stored
+	 * in our internal buffer and if force dump is set then
+	 * just start the spool to dump it to the log file
+	 * and also, take a snapshot of the general regs
+	 * to the user's buffer or else take complete dump
+	 * to the user's buffer if force is not set.
+	 */
 
-	if (test_bit(QL_FRC_COREDUMP, &qdev->flags))
+	if (!test_bit(QL_FRC_COREDUMP, &qdev->flags)) {
+		if (!ql_core_dump(qdev, buff))
+			ql_soft_reset_mpi_risc(qdev);
+		else
+			netif_err(qdev, drv, qdev->ndev, "coredump failed!\n");
+	} else {
+		ql_gen_reg_dump(qdev, buff);
 		ql_get_core_dump(qdev);
+	}
 }
 
 /* Coredump to messages log file using separate worker thread */
diff --git a/drivers/net/qlge/qlge_ethtool.c b/drivers/net/qlge/qlge_ethtool.c
index 4892d64..8149cc9 100644
--- a/drivers/net/qlge/qlge_ethtool.c
+++ b/drivers/net/qlge/qlge_ethtool.c
@@ -375,7 +375,10 @@ static void ql_get_drvinfo(struct net_device *ndev,
 	strncpy(drvinfo->bus_info, pci_name(qdev->pdev), 32);
 	drvinfo->n_stats = 0;
 	drvinfo->testinfo_len = 0;
-	drvinfo->regdump_len = 0;
+	if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+		drvinfo->regdump_len = sizeof(struct ql_mpi_coredump);
+	else
+		drvinfo->regdump_len = sizeof(struct ql_reg_dump);
 	drvinfo->eedump_len = 0;
 }
 
@@ -547,7 +550,12 @@ static void ql_self_test(struct net_device *ndev,
 
 static int ql_get_regs_len(struct net_device *ndev)
 {
-	return sizeof(struct ql_reg_dump);
+	struct ql_adapter *qdev = netdev_priv(ndev);
+
+	if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+		return sizeof(struct ql_mpi_coredump);
+	else
+		return sizeof(struct ql_reg_dump);
 }
 
 static void ql_get_regs(struct net_device *ndev,
@@ -555,7 +563,12 @@ static void ql_get_regs(struct net_device *ndev,
 {
 	struct ql_adapter *qdev = netdev_priv(ndev);
 
-	ql_gen_reg_dump(qdev, p);
+	ql_get_dump(qdev, p);
+	qdev->core_is_dumped = 0;
+	if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+		regs->len = sizeof(struct ql_mpi_coredump);
+	else
+		regs->len = sizeof(struct ql_reg_dump);
 }
 
 static int ql_get_coalesce(struct net_device *dev, struct ethtool_coalesce *c)
-- 
1.6.0.2


^ permalink raw reply related
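
The patch above selects the dump size from QL_FRC_COREDUMP in three places
(drvinfo, get_regs_len and get_regs), and all three have to agree or the
caller's buffer and the reported length diverge. One possible way to keep
them in step is a single helper; the sketch below uses hypothetical,
much smaller stand-in structures and a made-up dump_len() name, so it
shows the idea rather than the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's register and full dump layouts. */
struct reg_dump  { unsigned int regs[16]; };
struct full_dump { struct reg_dump regs; unsigned int fw_ram[64]; };

/*
 * force_log mirrors the meaning of the force-coredump flag in the patch:
 * when set, the full dump goes to the log and the caller only gets the
 * general registers; when clear, the caller gets the complete dump.
 */
static size_t dump_len(bool force_log)
{
	return force_log ? sizeof(struct reg_dump) : sizeof(struct full_dump);
}
```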

* [net-next PATCH 3/3] qlge: Version change to v1.00.00.27
From: Ron Mercer @ 2010-10-25 16:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
In-Reply-To: <1288023473-31490-1-git-send-email-ron.mercer@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 0474d20..69c4780 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -16,7 +16,7 @@
  */
 #define DRV_NAME  	"qlge"
 #define DRV_STRING 	"QLogic 10 Gigabit PCI-E Ethernet Driver "
-#define DRV_VERSION	"v1.00.00.25.00.00-01"
+#define DRV_VERSION	"v1.00.00.27.00.00-01"
 
 #define WQ_ADDR_ALIGN	0x3	/* 4 byte alignment */
 
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 1/3] qlge: Restoring the vlan setting.
From: Ron Mercer @ 2010-10-25 16:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
In-Reply-To: <1288023473-31490-1-git-send-email-ron.mercer@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index ba0053d..ed0c63d 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -2382,6 +2382,20 @@ static void qlge_vlan_rx_kill_vid(struct net_device *ndev, u16 vid)
 
 }
 
+static void qlge_restore_vlan(struct ql_adapter *qdev)
+{
+	qlge_vlan_rx_register(qdev->ndev, qdev->vlgrp);
+
+	if (qdev->vlgrp) {
+		u16 vid;
+			for (vid = 0; vid < VLAN_N_VID; vid++) {
+				if (!vlan_group_get_device(qdev->vlgrp, vid))
+					continue;
+				qlge_vlan_rx_add_vid(qdev->ndev, vid);
+			}
+	}
+}
+
 /* MSI-X Multiple Vector Interrupt Handler for inbound completions. */
 static irqreturn_t qlge_msix_rx_isr(int irq, void *dev_id)
 {
@@ -3957,6 +3971,9 @@ static int ql_adapter_up(struct ql_adapter *qdev)
 	clear_bit(QL_PROMISCUOUS, &qdev->flags);
 	qlge_set_multicast_list(qdev->ndev);
 
+	/* Restore vlan setting. */
+	qlge_restore_vlan(qdev);
+
 	ql_enable_interrupts(qdev);
 	ql_enable_all_completion_interrupts(qdev);
 	netif_tx_start_all_queues(qdev->ndev);
-- 
1.6.0.2


^ permalink raw reply related
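
The restore step in the patch above amounts to replaying software state
into hardware that lost it across a reset. A minimal user-space sketch of
that idea follows; struct dev_model, hw_reset() and restore_vlans() are
hypothetical names for this illustration and do not correspond to qlge
functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_VID 4096	/* size of the 12-bit VLAN ID space */

/*
 * Hypothetical device model: a software copy of the configured VLANs
 * that survives a reset, and a hardware filter table that does not.
 */
struct dev_model {
	bool sw_vlans[N_VID];
	bool hw_filter[N_VID];
};

/* A reset wipes the hardware table but leaves the software copy alone. */
static void hw_reset(struct dev_model *d)
{
	memset(d->hw_filter, 0, sizeof(d->hw_filter));
}

/* Replay every software-configured VID; returns how many were restored. */
static int restore_vlans(struct dev_model *d)
{
	int restored = 0;

	for (uint16_t vid = 0; vid < N_VID; vid++) {
		if (!d->sw_vlans[vid])
			continue;
		d->hw_filter[vid] = true;
		restored++;
	}
	return restored;
}
```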

* qlge changes for net-next
From: Ron Mercer @ 2010-10-25 16:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok

Changes for net-next:

1) Restore VLAN settings after reset.
2) Add firmware info to ethtool reg dump.
3) Version change.


^ permalink raw reply

* Re: [PATCH 1/2 v3] xps: Improvements in TX queue selection
From: Tom Herbert @ 2010-10-25 17:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet
In-Reply-To: <20101024.153206.179940220.davem@davemloft.net>

On Sun, Oct 24, 2010 at 3:32 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Thu, 21 Oct 2010 13:17:08 -0700 (PDT)
>
>> @@ -822,8 +822,10 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>>                                                          &md5);
>>       tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
>>
>> -     if (tcp_packets_in_flight(tp) == 0)
>> +     if (tcp_packets_in_flight(tp) == 0) {
>>               tcp_ca_event(sk, CA_EVENT_TX_START);
>> +             skb->ooo_okay = 1;
>> +     }
>>
>
> You'll need to clear this flag the moment the first transmit of
> this packet happens, otherwise OOO won't be handled correctly in
> the event that fast retransmit is necessary later.
>

Would this be sufficient:

@@ -825,7 +825,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb
        if (tcp_packets_in_flight(tp) == 0) {
                tcp_ca_event(sk, CA_EVENT_TX_START);
                skb->ooo_okay = 1;
-       }
+       } else
+               skb->ooo_okay = 0;

^ permalink raw reply
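
The proposal above recomputes the flag on every pass through the transmit
path, so a later retransmit of the same packet clears it. A reduced model
of that logic, assuming nothing beyond what the quoted hunks show, could
look like this; struct flow, struct pkt and mark_ooo() are hypothetical
and are not the kernel's structures.

```c
#include <stdbool.h>

/* Hypothetical miniature of the state involved. */
struct flow { unsigned int packets_in_flight; };
struct pkt  { bool ooo_okay; };

/*
 * Allow the packet to move to another TX queue only when nothing from
 * this flow is outstanding; otherwise clear any stale flag left over
 * from an earlier transmit of the same packet.
 */
static void mark_ooo(const struct flow *f, struct pkt *p)
{
	p->ooo_okay = (f->packets_in_flight == 0);
}
```

Whether this is sufficient for the fast retransmit case is the question
the message leaves open; the sketch only shows that the flag no longer
stays set once packets are in flight.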

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-25 16:17 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <OF8735A2A4.74B4EE91-ON652577C5.0040D90B-652577C7.0056B99D@in.ibm.com>

On Mon, Oct 25, 2010 at 09:20:38PM +0530, Krishna Kumar2 wrote:
> > Krishna Kumar2/India/IBM@IBMIN wrote on 10/20/2010 02:24:52 PM:
> 
> Any feedback, comments, objections, issues or bugs about the
> patches? Please let me know if something needs to be done.

I am trying to wrap my head around the kernel/user interface here.
E.g., will we need another incompatible change when we add multiple RX
queues? We also need to think about how robust our single stream heuristic
is, e.g. what are the chances it will misdetect a bidirectional
UDP stream as a single TCP stream?

> Some more test results:
> _____________________________________________________
>          Host->Guest BW (numtxqs=2)
> #       BW%     CPU%    RCPU%   SD%     RSD%
> _____________________________________________________
> 1       5.53    .31     .67     -5.88   0
> 2       -2.11   -1.01   -2.08   4.34    0
> 4       13.53   10.77   13.87   -1.96   0
> 8       34.22   22.80   30.53   -8.46   -2.50
> 16      30.89   24.06   35.17   -5.20   3.20
> 24      33.22   26.30   43.39   -5.17   7.58
> 32      30.85   27.27   47.74   -.59    15.51
> 40      33.80   27.33   48.00   -7.42   7.59
> 48      45.93   26.33   45.46   -12.24  1.10
> 64      33.51   27.11   45.00   -3.27   10.30
> 80      39.28   29.21   52.33   -4.88   12.17
> 96      32.05   31.01   57.72   -1.02   19.05
> 128     35.66   32.04   60.00   -.66    20.41
> _____________________________________________________
> BW: 23.5%  CPU/RCPU: 28.6%,51.2%  SD/RSD: -2.6%,15.8%
> 
> ____________________________________________________
> Guest->Host 512 byte (numtxqs=2):
> #       BW%     CPU%    RCPU%   SD%     RSD%
> _____________________________________________________
> 1       3.02    -3.84   -4.76   -12.50  -7.69
> 2       52.77   -15.73  -8.66   -45.31  -40.33
> 4       -23.14  13.84   7.50    50.58   40.81
> 8       -21.44  28.08   16.32   63.06   47.43
> 16      33.53   46.50   27.19   7.61    -6.60
> 24      55.77   42.81   30.49   -8.65   -16.48
> 32      52.59   38.92   29.08   -9.18   -15.63
> 40      50.92   36.11   28.92   -10.59  -15.30
> 48      46.63   34.73   28.17   -7.83   -12.32
> 64      45.56   37.12   28.81   -5.05   -10.80
> 80      44.55   36.60   28.45   -4.95   -10.61
> 96      43.02   35.97   28.89   -.11    -5.31
> 128     38.54   33.88   27.19   -4.79   -9.54
> _____________________________________________________
> BW: 34.4%  CPU/RCPU: 35.9%,27.8%  SD/RSD: -4.1%,-9.3%
> 
> 
> Thanks,
> 
> - KK
> 
> 
> 
> > [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
> >
> > The following set of patches implements transmit MQ in virtio-net.  Also
> > included are the user qemu changes.  MQ is disabled by default unless
> > qemu specifies it.
> >
> >                   Changes from rev2:
> >                   ------------------
> > 1. Define (in virtio_net.h) the maximum send txqs; and use in
> >    virtio-net and vhost-net.
> > 2. vi->sq[i] is allocated individually, resulting in cache line
> >    aligned sq[0] to sq[n].  Another option was to define
> >    'send_queue' as:
> >        struct send_queue {
> >                struct virtqueue *svq;
> >                struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
> >        } ____cacheline_aligned_in_smp;
> >    and to statically allocate 'VIRTIO_MAX_SQ' of those.  I hope
> >    the submitted method is preferable.
> > 3. Changed vhost model such that vhost[0] handles RX and vhost[1-MAX]
> >    handles TX[0-n].
> > 4. Further change TX handling such that vhost[0] handles both RX/TX
> >    for single stream case.
> >
> >                   Enabling MQ on virtio:
> >                   -----------------------
> > When the following options are passed to qemu:
> >         - smp > 1
> >         - vhost=on
> >         - mq=on (new option, default:off)
> > then #txqueues = #cpus.  The #txqueues can be changed by using an
> > optional 'numtxqs' option.  e.g. for a smp=4 guest:
> >         vhost=on                   ->   #txqueues = 1
> >         vhost=on,mq=on             ->   #txqueues = 4
> >         vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2
> >         vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
> >
> >
> >                    Performance (guest -> local host):
> >                    -----------------------------------
> > System configuration:
> >         Host:  8 Intel Xeon, 8 GB memory
> >         Guest: 4 cpus, 2 GB memory
> > Test: Each test case runs for 60 secs, sum over three runs (except
> > when number of netperf sessions is 1, which has 10 runs of 12 secs
> > each).  No tuning (default netperf) other than taskset vhost's to
> > cpus 0-3.  numtxqs=32 gave the best results though the guest had
> > only 4 vcpus (I haven't tried beyond that).
> >
> > ______________ numtxqs=2, vhosts=3  ____________________
> > #sessions  BW%      CPU%    RCPU%    SD%      RSD%
> > ________________________________________________________
> > 1          4.46    -1.96     .19     -12.50   -6.06
> > 2          4.93    -1.16    2.10      0       -2.38
> > 4          46.17    64.77   33.72     19.51   -2.48
> > 8          47.89    70.00   36.23     41.46    13.35
> > 16         48.97    80.44   40.67     21.11   -5.46
> > 24         49.03    78.78   41.22     20.51   -4.78
> > 32         51.11    77.15   42.42     15.81   -6.87
> > 40         51.60    71.65   42.43     9.75    -8.94
> > 48         50.10    69.55   42.85     11.80   -5.81
> > 64         46.24    68.42   42.67     14.18   -3.28
> > 80         46.37    63.13   41.62     7.43    -6.73
> > 96         46.40    63.31   42.20     9.36    -4.78
> > 128        50.43    62.79   42.16     13.11   -1.23
> > ________________________________________________________
> > BW: 37.2%,  CPU/RCPU: 66.3%,41.6%,  SD/RSD: 11.5%,-3.7%
> >
> > ______________ numtxqs=8, vhosts=5  ____________________
> > #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> > ________________________________________________________
> > 1           -.76    -1.56     2.33      0        3.03
> > 2           17.41    11.11    11.41     0       -4.76
> > 4           42.12    55.11    30.20     19.51    .62
> > 8           54.69    80.00    39.22     24.39    -3.88
> > 16          54.77    81.62    40.89     20.34    -6.58
> > 24          54.66    79.68    41.57     15.49    -8.99
> > 32          54.92    76.82    41.79     17.59    -5.70
> > 40          51.79    68.56    40.53     15.31    -3.87
> > 48          51.72    66.40    40.84     9.72     -7.13
> > 64          51.11    63.94    41.10     5.93     -8.82
> > 80          46.51    59.50    39.80     9.33     -4.18
> > 96          47.72    57.75    39.84     4.20     -7.62
> > 128         54.35    58.95    40.66     3.24     -8.63
> > ________________________________________________________
> > BW: 38.9%,  CPU/RCPU: 63.0%,40.1%,  SD/RSD: 6.0%,-7.4%
> >
> > ______________ numtxqs=16, vhosts=5  ___________________
> > #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> > ________________________________________________________
> > 1           -1.43    -3.52    1.55      0          3.03
> > 2           33.09     21.63   20.12    -10.00     -9.52
> > 4           67.17     94.60   44.28     19.51     -11.80
> > 8           75.72     108.14  49.15     25.00     -10.71
> > 16          80.34     101.77  52.94     25.93     -4.49
> > 24          70.84     93.12   43.62     27.63     -5.03
> > 32          69.01     94.16   47.33     29.68     -1.51
> > 40          58.56     63.47   25.91    -3.92      -25.85
> > 48          61.16     74.70   34.88     .89       -22.08
> > 64          54.37     69.09   26.80    -6.68      -30.04
> > 80          36.22     22.73   -2.97    -8.25      -27.23
> > 96          41.51     50.59   13.24     9.84      -16.77
> > 128         48.98     38.15   6.41     -.33       -22.80
> > ________________________________________________________
> > BW: 46.2%,  CPU/RCPU: 55.2%,18.8%,  SD/RSD: 1.2%,-22.0%
> >
> > ______________ numtxqs=32, vhosts=5  ___________________
> > #            BW%       CPU%    RCPU%    SD%     RSD%
> > ________________________________________________________
> > 1            7.62     -38.03   -26.26  -50.00   -33.33
> > 2            28.95     20.46    21.62   0       -7.14
> > 4            84.05     60.79    45.74  -2.43    -12.42
> > 8            86.43     79.57    50.32   15.85   -3.10
> > 16           88.63     99.48    58.17   9.47    -13.10
> > 24           74.65     80.87    41.99  -1.81    -22.89
> > 32           63.86     59.21    23.58  -18.13   -36.37
> > 40           64.79     60.53    22.23  -15.77   -35.84
> > 48           49.68     26.93    .51    -36.40   -49.61
> > 64           54.69     36.50    5.41   -26.59   -43.23
> > 80           45.06     12.72   -13.25  -37.79   -52.08
> > 96           40.21    -3.16    -24.53  -39.92   -52.97
> > 128          36.33    -33.19   -43.66  -5.68    -20.49
> > ________________________________________________________
> > BW: 49.3%,  CPU/RCPU: 15.5%,-8.2%,  SD/RSD: -22.2%,-37.0%
> >
> >
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

^ permalink raw reply

* Re: [PATCH] net: b43legacy: fix compile error
From: Eric Dumazet @ 2010-10-25 15:51 UTC (permalink / raw)
  To: Larry Finger
  Cc: Arnd Hannemann, David S. Miller, netdev, linux-kernel,
	linux-wireless
In-Reply-To: <4CC5A301.1080606@lwfinger.net>

Le lundi 25 octobre 2010 à 10:32 -0500, Larry Finger a écrit :
> On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
> > On today's Linus tree the following compile error happened to me:
> > 
> >   CC [M]  drivers/net/wireless/b43legacy/xmit.o
> > In file included from include/net/dst.h:11,
> >                  from drivers/net/wireless/b43legacy/xmit.c:31:
> > include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
> > include/net/dst_ops.h: In function 'dst_entries_get_fast':
> > include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_get_slow':
> > include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_add':
> > include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_init':
> > include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_destroy':
> > include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
> > make[3]: *** [drivers/net/wireless/b43legacy] Error 2
> > make[2]: *** [drivers/net/wireless] Error 2
> > make[1]: *** [drivers/net] Error 2
> > make: *** [drivers] Error 2
> > 
> > This patch fixes this issue by adding "linux/cache.h" as an include to
> > "include/net/dst_ops.h".
> 
> Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
> v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.

Well, x86_64 must include cache.h, this is probably why I missed it in
my build tests.

I wonder also why #include <net/dst.h> is needed at all in this
driver...

diff --git a/drivers/net/wireless/b43legacy/xmit.c b/drivers/net/wireless/b43legacy/xmit.c
index 7d177d9..a261aec 100644
--- a/drivers/net/wireless/b43legacy/xmit.c
+++ b/drivers/net/wireless/b43legacy/xmit.c
@@ -28,8 +28,6 @@
 
 */
 
-#include <net/dst.h>
-
 #include "xmit.h"
 #include "phy.h"
 #include "dma.h"

^ permalink raw reply related
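
The first compile error quoted above is what a compiler reports when an
alignment macro used after a member declarator has not been defined: it
sees an unknown identifier where only an attribute or punctuation may
follow. The sketch below shows such a macro in use once its definition is
visible. CACHELINE_BYTES and cacheline_aligned are hypothetical names, 64
is an assumed line size, and the attribute syntax is the GCC/Clang one.

```c
#include <stddef.h>

/* Hypothetical stand-in for what a cache header would provide. */
#define CACHELINE_BYTES 64
#define cacheline_aligned __attribute__((aligned(CACHELINE_BYTES)))

struct counters {
	long fast_path;
	/* Starts on its own cache line to avoid sharing one with fast_path. */
	long slow_path cacheline_aligned;
};
```

A header that uses such a macro needs to include the header defining it
directly, rather than rely on an architecture pulling it in indirectly,
which matches the explanation given in the reply.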

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-25 15:50 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, mst, netdev, rusty
In-Reply-To: <20101020085452.15579.76002.sendpatchset@krkumar2.in.ibm.com>

> Krishna Kumar2/India/IBM@IBMIN wrote on 10/20/2010 02:24:52 PM:

Any feedback, comments, objections, issues or bugs about the
patches? Please let me know if something needs to be done.

Some more test results:
_____________________________________________________
         Host->Guest BW (numtxqs=2)
#       BW%     CPU%    RCPU%   SD%     RSD%
_____________________________________________________
1       5.53    .31     .67     -5.88   0
2       -2.11   -1.01   -2.08   4.34    0
4       13.53   10.77   13.87   -1.96   0
8       34.22   22.80   30.53   -8.46   -2.50
16      30.89   24.06   35.17   -5.20   3.20
24      33.22   26.30   43.39   -5.17   7.58
32      30.85   27.27   47.74   -.59    15.51
40      33.80   27.33   48.00   -7.42   7.59
48      45.93   26.33   45.46   -12.24  1.10
64      33.51   27.11   45.00   -3.27   10.30
80      39.28   29.21   52.33   -4.88   12.17
96      32.05   31.01   57.72   -1.02   19.05
128     35.66   32.04   60.00   -.66    20.41
_____________________________________________________
BW: 23.5%  CPU/RCPU: 28.6%,51.2%  SD/RSD: -2.6%,15.8%

____________________________________________________
Guest->Host 512 byte (numtxqs=2):
#       BW%     CPU%    RCPU%   SD%     RSD%
_____________________________________________________
1       3.02    -3.84   -4.76   -12.50  -7.69
2       52.77   -15.73  -8.66   -45.31  -40.33
4       -23.14  13.84   7.50    50.58   40.81
8       -21.44  28.08   16.32   63.06   47.43
16      33.53   46.50   27.19   7.61    -6.60
24      55.77   42.81   30.49   -8.65   -16.48
32      52.59   38.92   29.08   -9.18   -15.63
40      50.92   36.11   28.92   -10.59  -15.30
48      46.63   34.73   28.17   -7.83   -12.32
64      45.56   37.12   28.81   -5.05   -10.80
80      44.55   36.60   28.45   -4.95   -10.61
96      43.02   35.97   28.89   -.11    -5.31
128     38.54   33.88   27.19   -4.79   -9.54
_____________________________________________________
BW: 34.4%  CPU/RCPU: 35.9%,27.8%  SD/RSD: -4.1%,-9.3%


Thanks,

- KK



> [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
>
> The following set of patches implements transmit MQ in virtio-net.  Also
> included are the user qemu changes.  MQ is disabled by default unless
> qemu specifies it.
>
>                   Changes from rev2:
>                   ------------------
> 1. Define (in virtio_net.h) the maximum send txqs; and use in
>    virtio-net and vhost-net.
> 2. vi->sq[i] is allocated individually, resulting in cache line
>    aligned sq[0] to sq[n].  Another option was to define
>    'send_queue' as:
>        struct send_queue {
>                struct virtqueue *svq;
>                struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
>        } ____cacheline_aligned_in_smp;
>    and to statically allocate 'VIRTIO_MAX_SQ' of those.  I hope
>    the submitted method is preferable.
> 3. Changed vhost model such that vhost[0] handles RX and vhost[1-MAX]
>    handles TX[0-n].
> 4. Further change TX handling such that vhost[0] handles both RX/TX
>    for single stream case.
>
>                   Enabling MQ on virtio:
>                   -----------------------
> When the following options are passed to qemu:
>         - smp > 1
>         - vhost=on
>         - mq=on (new option, default:off)
> then #txqueues = #cpus.  The #txqueues can be changed by using an
> optional 'numtxqs' option.  e.g. for a smp=4 guest:
>         vhost=on                   ->   #txqueues = 1
>         vhost=on,mq=on             ->   #txqueues = 4
>         vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2
>         vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
>
>
>                    Performance (guest -> local host):
>                    -----------------------------------
> System configuration:
>         Host:  8 Intel Xeon, 8 GB memory
>         Guest: 4 cpus, 2 GB memory
> Test: Each test case runs for 60 secs, sum over three runs (except
> when number of netperf sessions is 1, which has 10 runs of 12 secs
> each).  No tuning (default netperf) other than taskset vhost's to
> cpus 0-3.  numtxqs=32 gave the best results though the guest had
> only 4 vcpus (I haven't tried beyond that).
>
> ______________ numtxqs=2, vhosts=3  ____________________
> #sessions  BW%      CPU%    RCPU%    SD%      RSD%
> ________________________________________________________
> 1          4.46    -1.96     .19     -12.50   -6.06
> 2          4.93    -1.16    2.10      0       -2.38
> 4          46.17    64.77   33.72     19.51   -2.48
> 8          47.89    70.00   36.23     41.46    13.35
> 16         48.97    80.44   40.67     21.11   -5.46
> 24         49.03    78.78   41.22     20.51   -4.78
> 32         51.11    77.15   42.42     15.81   -6.87
> 40         51.60    71.65   42.43     9.75    -8.94
> 48         50.10    69.55   42.85     11.80   -5.81
> 64         46.24    68.42   42.67     14.18   -3.28
> 80         46.37    63.13   41.62     7.43    -6.73
> 96         46.40    63.31   42.20     9.36    -4.78
> 128        50.43    62.79   42.16     13.11   -1.23
> ________________________________________________________
> BW: 37.2%,  CPU/RCPU: 66.3%,41.6%,  SD/RSD: 11.5%,-3.7%
>
> ______________ numtxqs=8, vhosts=5  ____________________
> #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> ________________________________________________________
> 1           -.76    -1.56     2.33      0        3.03
> 2           17.41    11.11    11.41     0       -4.76
> 4           42.12    55.11    30.20     19.51    .62
> 8           54.69    80.00    39.22     24.39    -3.88
> 16          54.77    81.62    40.89     20.34    -6.58
> 24          54.66    79.68    41.57     15.49    -8.99
> 32          54.92    76.82    41.79     17.59    -5.70
> 40          51.79    68.56    40.53     15.31    -3.87
> 48          51.72    66.40    40.84     9.72     -7.13
> 64          51.11    63.94    41.10     5.93     -8.82
> 80          46.51    59.50    39.80     9.33     -4.18
> 96          47.72    57.75    39.84     4.20     -7.62
> 128         54.35    58.95    40.66     3.24     -8.63
> ________________________________________________________
> BW: 38.9%,  CPU/RCPU: 63.0%,40.1%,  SD/RSD: 6.0%,-7.4%
>
> ______________ numtxqs=16, vhosts=5  ___________________
> #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> ________________________________________________________
> 1           -1.43    -3.52    1.55      0          3.03
> 2           33.09     21.63   20.12    -10.00     -9.52
> 4           67.17     94.60   44.28     19.51     -11.80
> 8           75.72     108.14  49.15     25.00     -10.71
> 16          80.34     101.77  52.94     25.93     -4.49
> 24          70.84     93.12   43.62     27.63     -5.03
> 32          69.01     94.16   47.33     29.68     -1.51
> 40          58.56     63.47   25.91    -3.92      -25.85
> 48          61.16     74.70   34.88     .89       -22.08
> 64          54.37     69.09   26.80    -6.68      -30.04
> 80          36.22     22.73   -2.97    -8.25      -27.23
> 96          41.51     50.59   13.24     9.84      -16.77
> 128         48.98     38.15   6.41     -.33       -22.80
> ________________________________________________________
> BW: 46.2%,  CPU/RCPU: 55.2%,18.8%,  SD/RSD: 1.2%,-22.0%
>
> ______________ numtxqs=32, vhosts=5  ___________________
> #sessions    BW%       CPU%    RCPU%    SD%     RSD%
> ________________________________________________________
> 1            7.62     -38.03   -26.26  -50.00   -33.33
> 2            28.95     20.46    21.62   0       -7.14
> 4            84.05     60.79    45.74  -2.43    -12.42
> 8            86.43     79.57    50.32   15.85   -3.10
> 16           88.63     99.48    58.17   9.47    -13.10
> 24           74.65     80.87    41.99  -1.81    -22.89
> 32           63.86     59.21    23.58  -18.13   -36.37
> 40           64.79     60.53    22.23  -15.77   -35.84
> 48           49.68     26.93    .51    -36.40   -49.61
> 64           54.69     36.50    5.41   -26.59   -43.23
> 80           45.06     12.72   -13.25  -37.79   -52.08
> 96           40.21    -3.16    -24.53  -39.92   -52.97
> 128          36.33    -33.19   -43.66  -5.68    -20.49
> ________________________________________________________
> BW: 49.3%,  CPU/RCPU: 15.5%,-8.2%,  SD/RSD: -22.2%,-37.0%
>
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>


^ permalink raw reply

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Hin-Tak Leung @ 2010-10-25 15:39 UTC (permalink / raw)
  To: John W. Linville
  Cc: Larry Finger, Serafeim Zanikolas, herton, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20101025142232.GC2414@tuxdriver.com>


--- On Mon, 25/10/10, John W. Linville <linville@tuxdriver.com> wrote:

> > I had a quick look for similar constructs and AFAIK only the
> > b43/b43legacy drivers use DMA buffers. Seems to be a rare practice.
> > Is that something we should or should not do?
> 
> It doesn't mean what you think it means.  It is a relic of the past,
> used to indicate memory below 16MB so that ISA devices could do DMA.

Okay, sorry about the confusion. I was grepping for GFP_DMA, and only b43/b43legacy have it, so it is relatively rare. AFAIK none of the rtl8187 devices are non-USB, so this is probably a NACK, but I should ask Serafeim whether there was a reason for submitting this patch (other than "it says dma").

Hin-tak


      

^ permalink raw reply

* Re: [PATCH] net: b43legacy: fix compile error
From: Larry Finger @ 2010-10-25 15:32 UTC (permalink / raw)
  To: Arnd Hannemann; +Cc: David S. Miller, netdev, linux-kernel, linux-wireless
In-Reply-To: <1288017690-31248-1-git-send-email-arnd@arndnet.de>

On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
> On today's Linus tree I hit the following compile error:
> 
>   CC [M]  drivers/net/wireless/b43legacy/xmit.o
> In file included from include/net/dst.h:11,
>                  from drivers/net/wireless/b43legacy/xmit.c:31:
> include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
> include/net/dst_ops.h: In function 'dst_entries_get_fast':
> include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_get_slow':
> include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_add':
> include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_init':
> include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_destroy':
> include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
> make[3]: *** [drivers/net/wireless/b43legacy] Error 2
> make[2]: *** [drivers/net/wireless] Error 2
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2
> 
> This patch fixes this issue by adding "linux/cache.h" as an include to
> "include/net/dst_ops.h".

Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.

Larry


^ permalink raw reply

* [PATCH] net: b43legacy: fix compile error
From: Arnd Hannemann @ 2010-10-25 14:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, linux-wireless, Arnd Hannemann

On today's Linus tree I hit the following compile error:

  CC [M]  drivers/net/wireless/b43legacy/xmit.o
In file included from include/net/dst.h:11,
                 from drivers/net/wireless/b43legacy/xmit.c:31:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

This patch fixes this issue by adding "linux/cache.h" as an include to
"include/net/dst_ops.h".

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
---
 include/net/dst_ops.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 1fa5306..51665b3 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -2,6 +2,7 @@
 #define _NET_DST_OPS_H
 #include <linux/types.h>
 #include <linux/percpu_counter.h>
+#include <linux/cache.h>
 
 struct dst_entry;
 struct kmem_cachep;
-- 
1.7.0.4
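
A note on why a missing include produces this particular error: judging from the messages above, ____cacheline_aligned_in_smp is a macro that expands to an alignment attribute, and if no header has defined it yet the compiler sees a stray identifier after a struct member. The userspace sketch below uses a stand-in macro (cacheline_aligned and the 64-byte line size are assumptions for illustration, not the kernel definitions) to show what such an attribute does to a struct layout with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; real hardware and kernel configs vary. */
#define CACHELINE_BYTES 64

/*
 * Stand-in for an alignment macro of the kind linux/cache.h provides.
 * If this #define were missing, the member declaration below would not
 * parse, which matches the shape of the error in the build log.
 */
#define cacheline_aligned __attribute__((__aligned__(CACHELINE_BYTES)))

struct counters {
    long fast_path;                     /* starts at offset 0 */
    long slow_path cacheline_aligned;   /* pushed to the next cache line */
};

/* Offset of the aligned member, so callers can inspect the layout. */
static size_t slow_path_offset(void)
{
    assert(sizeof(long) <= CACHELINE_BYTES);
    return offsetof(struct counters, slow_path);
}
```

With the macro defined, slow_path starts on its own cache line and the struct itself inherits the 64-byte alignment.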

^ permalink raw reply related

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: John W. Linville @ 2010-10-25 14:22 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Larry Finger, Serafeim Zanikolas, herton, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <4CC5900A.2050209@users.sourceforge.net>

On Mon, Oct 25, 2010 at 03:11:22PM +0100, Hin-Tak Leung wrote:
> 
> 
> Larry Finger wrote:
> >On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
> >>Despite the indicated intention in comment, the kmalloc() call was not
> >>explicitly requesting memory from ZONE_DMA.
> >>
> >>Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> >>---
> >> drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
> >> 1 files changed, 2 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>index 38fa824..771794d 100644
> >>--- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>+++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>@@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
> >> 	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
> >> 	/* allocate "DMA aware" buffer for register accesses */
> >>-	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> >>+	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> >>+				  GFP_DMA | GFP_KERNEL);
> >> 	if (!priv->io_dmabuf) {
> >> 		err = -ENOMEM;
> >> 		goto err_free_dev;
> >
> >ACK.
> >
> >Larry
> 
> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
> 
> I had a quick look for similar constructs and AFAIK only the
> b43/b43legacy drivers use DMA buffers. Seems to be a rare practice.
> Is that something we should or should not do?

It doesn't mean what you think it means.  It is a relic of the past,
used to indicate memory below 16MB so that ISA devices could do DMA.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Johannes Berg @ 2010-10-25 14:23 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Larry Finger, Serafeim Zanikolas, herton, linville, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <4CC5900A.2050209@users.sourceforge.net>

On Mon, 2010-10-25 at 15:11 +0100, Hin-Tak Leung wrote:

> >> Despite the indicated intention in comment, the kmalloc() call was not
> >> explicitly requesting memory from ZONE_DMA.

> I had a quick look for similar constructs and AFAIK only the b43/b43legacy
> drivers use DMA buffers. Seems to be a rare practice. Is that something we
> should or should not do?

I think there's some confusion here about ZONE_DMA vs. DMA-able memory.
All memory you get with kmalloc can be used for DMA, GFP_DMA means using
ZONE_DMA which is a hack for ISA (and in b43 maybe PCMCIA/Cardbus)
devices to put memory into something they can address. I don't think the
latter is necessary for USB devices.
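
To make the 16MB figure concrete: classic ISA DMA can drive 24 address lines, so the highest reachable byte is 2^24 - 1. The sketch below is plain userspace C, not kernel code; dma_limit() and dma_reachable() are hypothetical helper names used only to work through the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: classic ISA DMA drives 24 address lines. */
#define ISA_DMA_ADDR_BITS 24

/* Highest byte address a device with `bits` address lines can reach. */
static uint64_t dma_limit(unsigned bits)
{
    assert(bits > 0 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if the whole buffer [phys, phys + len) lies within the limit. */
static bool dma_reachable(uint64_t phys, uint64_t len, unsigned bits)
{
    uint64_t limit = dma_limit(bits);

    /* Written to avoid overflow in phys + len. */
    return len > 0 && phys <= limit && len - 1 <= limit - phys;
}
```

A page at physical address 0x01000000 (exactly 16MB) fails the 24-bit check but passes a 32-bit one, which is the distinction between ZONE_DMA and ordinary kmalloc memory being drawn here.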

johannes

^ permalink raw reply

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Hin-Tak Leung @ 2010-10-25 14:11 UTC (permalink / raw)
  To: Larry Finger
  Cc: Serafeim Zanikolas, herton, linville, joe, davem, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <4CC5851D.1040204@lwfinger.net>



Larry Finger wrote:
> On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
>> Despite the indicated intention in comment, the kmalloc() call was not
>> explicitly requesting memory from ZONE_DMA.
>>
>> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
>> ---
>>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> index 38fa824..771794d 100644
>> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>>  
>>  	/* allocate "DMA aware" buffer for register accesses */
>> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
>> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
>> +				  GFP_DMA | GFP_KERNEL);
>>  	if (!priv->io_dmabuf) {
>>  		err = -ENOMEM;
>>  		goto err_free_dev;
> 
> ACK.
> 
> Larry

Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>

I had a quick look for similar constructs and AFAIK only the b43/b43legacy
drivers use DMA buffers. Seems to be a rare practice. Is that something we
should or should not do?

Hin-Tak

^ permalink raw reply

* Re: VLAN packets silently dropped in promiscuous mode
From: Guillaume Gaudonville @ 2010-10-25 13:48 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <AANLkTi=MYxSUzVUF2sf1G32Z2EcjhfyOJ4EZyv5ePGWM@mail.gmail.com>

Jesse Gross wrote:
> On Fri, Oct 15, 2010 at 2:16 AM, Guillaume Gaudonville
> <guillaume.gaudonville@6wind.com> wrote:
>   
>> Jesse Gross wrote:
>>     
>>> On Thu, Sep 30, 2010 at 1:07 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>
>>>       
>>>> On Wed, 29 Sep 2010 10:44:26 -0700, Jesse Gross wrote:
>>>>
>>>>         
>>>>> On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>>>
>>>>>           
>>>>>> I noticed packets for unknown VLANs getting silently dropped even in
>>>>>> promiscuous mode (this is true only for the hardware accelerated path).
>>>>>> netif_nit_deliver was introduced specifically to prevent that, but the
>>>>>> function gets called only _after_ packets from unknown VLANs have been
>>>>>> dropped.
>>>>>>
>>>>>>             
>>>>> Some drivers are fixing this on a case by case basis by disabling
>>>>> hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8
>>>>>
>>>>> However, at this point it is more or less random which drivers do
>>>>> this.  It would obviously be much better if it were consistent.
>>>>>
>>>>>           
>>>> My understanding is this. Hardware VLAN tagging and stripping can always
>>>> be enabled. The kernel passes 802.1Q information along with the stripped
>>>> header to libpcap which reassembles the original header where necessary.
>>>> Works for me.
>>>>
>>>>         
>>> Sorry, I misread your original post as saying that the VLAN header
>>> gets dropped, rather than the entire packet.  I agree that this is how
>>> it should work but not necessarily how it does work (again, depending
>>> on the driver).  Here's the problem that I was talking about:
>>>
>>> Most drivers have a snippet of code that looks something like this
>>> (taken from ixgbe):
>>>
>>> if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>>>        vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>>> else
>>>        napi_gro_receive(napi, skb);
>>>
>>> At this point the VLAN has already been stripped in hardware.  If
>>> there is no VLAN group configured on the device then we hit the second
>>> case.  The VLAN header was removed from the SKB and the tag variable
>>> is unused.  It is no longer possible for libpcap to reconstruct the
>>> header because the information was thrown away (even the fact that
>>> there was a VLAN tag at all).
>>>
>>> There are a couple ways to fix this:
>>>
>>> * Turn off VLAN stripping when in promiscuous mode (as done by the
>>>   ixgbe driver)
>>>
>>>       
>> This is not totally true: when the MTU is changed, ixgbe_change_mtu calls:
>> ixgbe_reinit_locked --> ixgbe_up --> ixgbe_configure:
>>                --> ixgbe_set_rx_mode: flag IFF_PROMISC is tested, so
>>                     ixgbe_vlan_filter_enable is not called
>>                --> ixgbe_restore_vlan --> ixgbe_vlan_rx_register: flag
>>                     IFF_PROMISC is not tested, so ixgbe_vlan_filter_enable
>>                     will be called.
>>
>> In fact this should happen each time we configure something that needs a
>> reset of the device. Why not add a test for the promiscuous flag directly
>> in ixgbe_vlan_filter_enable? Or do it on each call, if we want to allow a
>> device in promiscuous mode to enable this feature.
>>
>> What do you think?
>>     
>
> I can believe that there are paths that lead to this not working
> correctly.  That was actually my larger point: this is something that
> is commonly not implemented correctly in drivers.  Rather than try to
> study every driver my goal is to just avoid the problem completely by
> handling vlan acceleration centrally in the networking core.  I sent
> out an RFC patch series a few days ago that should solve this problem:
>
> http://marc.info/?l=linux-netdev&m=128700022614170&w=3
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   
Thank you, I'm going to check these patches and try to apply them in our 
kernel.
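
For reference, the reconstruction step discussed in the quoted thread (rebuilding the 802.1Q header from a tag that the hardware stripped) can be sketched in userspace C as follows. The constants mirror the usual kernel names and values; vlan_vid() and vlan_reinsert_tag() are hypothetical helpers for illustration, not libpcap or kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN       6        /* bytes in a MAC address */
#define VLAN_HLEN      4        /* bytes in an 802.1Q tag: TPID + TCI */
#define ETH_P_8021Q    0x8100   /* 802.1Q tag protocol identifier */
#define VLAN_VID_MASK  0x0fff   /* low 12 bits of the TCI hold the VLAN ID */

/* Extract the VLAN ID from a tag control information (TCI) value. */
static uint16_t vlan_vid(uint16_t tci)
{
    return tci & VLAN_VID_MASK;
}

/*
 * Rebuild a tagged frame from an untagged one plus the stripped TCI.
 * Returns the new length, or 0 if the frame is shorter than two MAC
 * addresses or the output buffer is too small.
 */
static size_t vlan_reinsert_tag(const uint8_t *frame, size_t len,
                                uint16_t tci, uint8_t *out, size_t out_cap)
{
    const size_t mac_len = 2 * ETH_ALEN;

    assert(frame != NULL && out != NULL);
    if (len < mac_len || out_cap < len + VLAN_HLEN)
        return 0;

    memcpy(out, frame, mac_len);                      /* dst + src MAC */
    out[mac_len + 0] = (uint8_t)(ETH_P_8021Q >> 8);   /* TPID, big endian */
    out[mac_len + 1] = (uint8_t)(ETH_P_8021Q & 0xff);
    out[mac_len + 2] = (uint8_t)(tci >> 8);           /* TCI, big endian */
    out[mac_len + 3] = (uint8_t)(tci & 0xff);
    /* The original EtherType and payload follow the inserted tag. */
    memcpy(out + mac_len + VLAN_HLEN, frame + mac_len, len - mac_len);
    return len + VLAN_HLEN;
}
```

The sketch only works if the TCI is still available when the frame is captured. A TCI whose VID bits are zero is a priority-tagged frame, which is the case the (tag & VLAN_VID_MASK) test in the quoted ixgbe snippet filters out.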

Best Regards,

-- 
Guillaume Gaudonville
6WIND
Software Engineer

Tel: +33 1 39 30 92 63
Mob: +33 6 47 85 34 33
Fax: +33 1 39 30 92 11
guillaume.gaudonville@6wind.com
www.6wind.com
Join the Multicore Packet Processing Forum: www.multicorepacketprocessing.com

This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and contains information that is confidential and proprietary to 6WIND. All unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.


^ permalink raw reply

* RE: Question w.r.t debugfs / netdevice pass-through IOCTL
From: Shyam_Iyer @ 2010-10-25 13:48 UTC (permalink / raw)
  To: john.r.fastabend, shemminger; +Cc: ddutt, netdev
In-Reply-To: <4CC0A0E9.9090903@intel.com>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of John Fastabend
> Sent: Thursday, October 21, 2010 4:22 PM
> To: Stephen Hemminger
> Cc: Debashis Dutt; netdev@vger.kernel.org
> Subject: Re: Question w.r.t debugfs / netdevice pass-through IOCTL
> 
> On 10/20/2010 9:19 PM, Stephen Hemminger wrote:
> > On Wed, 20 Oct 2010 20:26:50 -0700
> > Debashis Dutt <ddutt@Brocade.COM> wrote:
> >
> >> Hi,
> >>
> >> For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not
> >> supported by current tools like ethtool.
> >>
> >> Examples of such operations would be
> >>        a) Queries related to CEE, if the link is CEE.
> 
> Assuming CEE is Converged Enhanced Ethernet here.
> 
> For CEE queries please consider using the dcbnl interface in net/dcb/dcbnl.c. If
> it is missing an interface that would be useful to all DCB devices we could
> entertain adding it. Also this way DCB queries will work with existing tools that
> query these things (lldpad/dcbtool).
> 
> The things you would want to know about a CEE device should be about the same
> regardless of the hardware in use, so let's try to use a single interface and avoid
> private interfaces.

John - I agree on this. On a side note, I would like the interface not to be netdev specific.
A CEE device is not just an Ethernet controller but a SCSI controller as well, so it would help if this interface were generally accessible to other subsystems.

> 
> Thanks,
> John.
> 
> >>        b) Get traces from firmware.
> >
> >>
> >> I was wondering what would be right approach to take here:
> >>                 a) use debugfs (like the Chelsio cxgb4 driver)
> > Works as long as they are really debug operations. The debugfs isn't always
> > available, and support should be a config option for your driver.
> >
> >>                 b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
> >>                     struct net_device_ops{}
> >
> > The problem with ioctl is that it doesn't work for 32 bit user space
> > compatibility. The ioctl compat layer does not have enough context
> > to translate SIOCDEVPRIVATE
> >
> >>                     As per comments in the header file, b) should not be used
> >>                     since this IOCTL is supposed to be deprecated.
> >>                 c) use procfs / sysfs (these may not scale, in our opinion)
> >
> > Although less common, there were drivers putting things in /proc/net/xxx/ethX
> >
> >
> >
> 
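
On the SIOCDEVPRIVATE point quoted above: the payload of a private ioctl is a driver-defined structure, so generic code has no way of knowing which fields are pointers that change size between a 32-bit process and a 64-bit kernel. The sketch below illustrates the mismatch with a made-up payload; struct priv_req, struct priv_req32 and priv_req_from32() are hypothetical names, not part of any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private ioctl payload as a driver might declare it. */
struct priv_req {
    uint32_t cmd;
    void    *buf;   /* 4 bytes in a 32-bit process, 8 in a 64-bit kernel */
    uint32_t len;
};

/* The same payload as a 32-bit process lays it out. */
struct priv_req32 {
    uint32_t cmd;
    uint32_t buf;   /* user pointer carried as a 32-bit integer */
    uint32_t len;
};

/*
 * Field-by-field translation. Only code that knows the layout can do
 * this, which is why a generic compat layer cannot handle it.
 */
static struct priv_req priv_req_from32(const struct priv_req32 *in)
{
    struct priv_req out;

    assert(in != NULL);
    out.cmd = in->cmd;
    out.buf = (void *)(uintptr_t)in->buf;
    out.len = in->len;
    return out;
}
```

On a typical 64-bit build the native structure is larger than the 32-bit one and its len field sits at a different offset, so passing the 32-bit bytes through untranslated would misread every field after cmd.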

^ permalink raw reply

* [PATCH] net: add __rcu annotation to sk_filter
From: Eric Dumazet @ 2010-10-25 13:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |    2 +-
 net/core/filter.c  |    4 ++--
 net/core/sock.c    |    2 +-
 net/ipv4/udp.c     |    2 +-
 net/ipv6/raw.c     |    2 +-
 net/ipv6/udp.c     |    2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 73a4f97..c7a7362 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -301,7 +301,7 @@ struct sock {
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
 	long			sk_sndtimeo;
-	struct sk_filter      	*sk_filter;
+	struct sk_filter __rcu	*sk_filter;
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
diff --git a/net/core/filter.c b/net/core/filter.c
index 7adf503..7beaec3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -89,8 +89,8 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
 	rcu_read_lock_bh();
 	filter = rcu_dereference_bh(sk->sk_filter);
 	if (filter) {
-		unsigned int pkt_len = sk_run_filter(skb, filter->insns,
-				filter->len);
+		unsigned int pkt_len = sk_run_filter(skb, filter->insns, filter->len);
+
 		err = pkt_len ? pskb_trim(skb, pkt_len) : -EPERM;
 	}
 	rcu_read_unlock_bh();
diff --git a/net/core/sock.c b/net/core/sock.c
index 11db436..3eed542 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1225,7 +1225,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_reset_flag(newsk, SOCK_DONE);
 		skb_queue_head_init(&newsk->sk_error_queue);
 
-		filter = newsk->sk_filter;
+		filter = rcu_dereference_protected(newsk->sk_filter, 1);
 		if (filter != NULL)
 			sk_filter_charge(newsk, filter);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b3f7e8c..28cb2d7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1413,7 +1413,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
-	if (sk->sk_filter) {
+	if (rcu_dereference_raw(sk->sk_filter)) {
 		if (udp_lib_checksum_complete(skb))
 			goto drop;
 	}
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 45e6efb..86c3952 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -373,7 +373,7 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
 
 static inline int rawv6_rcv_skb(struct sock * sk, struct sk_buff * skb)
 {
-	if ((raw6_sk(sk)->checksum || sk->sk_filter) &&
+	if ((raw6_sk(sk)->checksum || rcu_dereference_raw(sk->sk_filter)) &&
 	    skb_checksum_complete(skb)) {
 		atomic_inc(&sk->sk_drops);
 		kfree_skb(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c84dad4..91def93 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -527,7 +527,7 @@ int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
 		}
 	}
 
-	if (sk->sk_filter) {
+	if (rcu_dereference_raw(sk->sk_filter)) {
 		if (udp_lib_checksum_complete(skb))
 			goto drop;
 	}
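
As background for the primitives used in this patch, the publish/read pattern behind an RCU-protected pointer can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions: struct sock_sketch and the three helpers are made-up names, acquire ordering stands in conservatively for the dependency ordering the kernel relies on, and grace periods and reclamation are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct filter {
    unsigned int len;
};

struct sock_sketch {
    _Atomic(struct filter *) filter;   /* plays the role of an __rcu pointer */
};

/* Writer: publish a fully initialised filter (cf. rcu_assign_pointer). */
static void publish_filter(struct sock_sketch *sk, struct filter *f)
{
    assert(sk != NULL);
    atomic_store_explicit(&sk->filter, f, memory_order_release);
}

/* Reader: load the pointer once, then use that copy (cf. rcu_dereference). */
static struct filter *read_filter(struct sock_sketch *sk)
{
    return atomic_load_explicit(&sk->filter, memory_order_acquire);
}

/* Presence test only: the pointer is compared with NULL, never followed. */
static int has_filter(struct sock_sketch *sk)
{
    return atomic_load_explicit(&sk->filter, memory_order_relaxed) != NULL;
}
```

The third helper mirrors the places in the patch that only check whether a filter is attached, such as the udp and raw receive paths, where the pointer value itself is never dereferenced.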



^ permalink raw reply related

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: John W. Linville @ 2010-10-25 13:35 UTC (permalink / raw)
  To: Serafeim Zanikolas
  Cc: herton, htl10, Larry.Finger, joe, davem, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1287952327-9924-1-git-send-email-sez@debian.org>

On Sun, Oct 24, 2010 at 10:32:07PM +0200, Serafeim Zanikolas wrote:
> Despite the indicated intention in comment, the kmalloc() call was not
> explicitly requesting memory from ZONE_DMA.
> 
> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> ---
>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> index 38fa824..771794d 100644
> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>  
>  	/* allocate "DMA aware" buffer for register accesses */
> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> +				  GFP_DMA | GFP_KERNEL);
>  	if (!priv->io_dmabuf) {
>  		err = -ENOMEM;
>  		goto err_free_dev;

Are you sure about this?  Are there USB controllers out there with
the ISA 16MB limitation for DMA?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* [PATCH] ipv4: add __rcu annotations to ip_ra_chain
From: Eric Dumazet @ 2010-10-25 13:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to :
        (struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h       |    4 ++--
 net/ipv4/ip_sockglue.c |   10 +++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index dbee3fe..86e2b18 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -59,7 +59,7 @@ struct ipcm_cookie {
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
 
 struct ip_ra_chain {
-	struct ip_ra_chain	*next;
+	struct ip_ra_chain __rcu *next;
 	struct sock		*sk;
 	union {
 		void			(*destructor)(struct sock *);
@@ -68,7 +68,7 @@ struct ip_ra_chain {
 	struct rcu_head		rcu;
 };
 
-extern struct ip_ra_chain *ip_ra_chain;
+extern struct ip_ra_chain __rcu *ip_ra_chain;
 
 /* IP flags. */
 #define IP_CE		0x8000		/* Flag: "Congestion"		*/
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 64b70ad..3948c86 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -238,7 +238,7 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
    but receiver should be enough clever f.e. to forward mtrace requests,
    sent to multicast group to reach destination designated router.
  */
-struct ip_ra_chain *ip_ra_chain;
+struct ip_ra_chain __rcu *ip_ra_chain;
 static DEFINE_SPINLOCK(ip_ra_lock);
 
 
@@ -253,7 +253,8 @@ static void ip_ra_destroy_rcu(struct rcu_head *head)
 int ip_ra_control(struct sock *sk, unsigned char on,
 		  void (*destructor)(struct sock *))
 {
-	struct ip_ra_chain *ra, *new_ra, **rap;
+	struct ip_ra_chain *ra, *new_ra;
+	struct ip_ra_chain __rcu **rap;
 
 	if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num == IPPROTO_RAW)
 		return -EINVAL;
@@ -261,7 +262,10 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
 	spin_lock_bh(&ip_ra_lock);
-	for (rap = &ip_ra_chain; (ra = *rap) != NULL; rap = &ra->next) {
+	for (rap = &ip_ra_chain;
+	     (ra = rcu_dereference_protected(*rap,
+			lockdep_is_held(&ip_ra_lock))) != NULL;
+	     rap = &ra->next) {
 		if (ra->sk == sk) {
 			if (on) {
 				spin_unlock_bh(&ip_ra_lock);
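
The loop in ip_ra_control() walks the chain through a pointer to the link rather than a pointer to the node, which is why rap needs its own __rcu annotation. A plain userspace version of that idiom, without any locking or RCU, might look like the sketch below (struct node and list_unlink() are illustrative names, not kernel code).

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    int key;
};

/*
 * Unlink and return the first node whose key matches, or NULL.
 * `link` always points at the pointer that refers to the current node:
 * first the list head, then each node's `next` field. Removing a node
 * is therefore one store, with no special case for the head.
 */
static struct node *list_unlink(struct node **head, int key)
{
    struct node **link, *cur;

    assert(head != NULL);
    for (link = head; (cur = *link) != NULL; link = &cur->next) {
        if (cur->key == key) {
            *link = cur->next;   /* bypass the node */
            cur->next = NULL;
            return cur;
        }
    }
    return NULL;
}
```

In the patch the read of *rap is wrapped in rcu_dereference_protected() with lockdep_is_held(&ip_ra_lock), which documents that the spinlock, not an RCU read-side section, is what makes the walk safe on the update side.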



^ permalink raw reply related

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Larry Finger @ 2010-10-25 13:24 UTC (permalink / raw)
  To: Serafeim Zanikolas
  Cc: herton, htl10, linville, joe, davem, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1287952327-9924-1-git-send-email-sez@debian.org>

On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
> Despite the indicated intention in comment, the kmalloc() call was not
> explicitly requesting memory from ZONE_DMA.
> 
> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> ---
>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> index 38fa824..771794d 100644
> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>  
>  	/* allocate "DMA aware" buffer for register accesses */
> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> +				  GFP_DMA | GFP_KERNEL);
>  	if (!priv->io_dmabuf) {
>  		err = -ENOMEM;
>  		goto err_free_dev;

ACK.

Larry


^ permalink raw reply

* [PATCH] net_ns: add __rcu annotations
From: Eric Dumazet @ 2010-10-25 13:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/net_namespace.h |    2 +-
 net/core/net_namespace.c    |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 65af9a0..1bf812b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -88,7 +88,7 @@ struct net {
 #ifdef CONFIG_WEXT_CORE
 	struct sk_buff_head	wext_nlevents;
 #endif
-	struct net_generic	*gen;
+	struct net_generic __rcu	*gen;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index c988e68..3f86026 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -42,7 +42,9 @@ static int net_assign_generic(struct net *net, int id, void *data)
 	BUG_ON(!mutex_is_locked(&net_mutex));
 	BUG_ON(id == 0);
 
-	ng = old_ng = net->gen;
+	old_ng = rcu_dereference_protected(net->gen,
+					   lockdep_is_held(&net_mutex));
+	ng = old_ng;
 	if (old_ng->len >= id)
 		goto assign;
 



^ permalink raw reply related

* [PATCH] rps: add __rcu annotations
From: Eric Dumazet @ 2010-10-25 13:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to :
	(struct netdev_rx_queue)->rps_map
	(struct netdev_rx_queue)->rps_flow_table
	struct rps_sock_flow_table *rps_sock_flow_table;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h  |   12 ++++++------
 net/core/dev.c             |   12 ++++++------
 net/core/net-sysfs.c       |   20 +++++++++++++-------
 net/core/sysctl_net_core.c |    3 ++-
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcd3dda..2475206 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -585,15 +585,15 @@ static inline void rps_reset_sock_flow(struct rps_sock_flow_table *table,
 		table->ents[hash & table->mask] = RPS_NO_CPU;
 }
 
-extern struct rps_sock_flow_table *rps_sock_flow_table;
+extern struct rps_sock_flow_table __rcu *rps_sock_flow_table;
 
 /* This structure contains an instance of an RX queue. */
 struct netdev_rx_queue {
-	struct rps_map *rps_map;
-	struct rps_dev_flow_table *rps_flow_table;
-	struct kobject kobj;
-	struct netdev_rx_queue *first;
-	atomic_t count;
+	struct rps_map __rcu		*rps_map;
+	struct rps_dev_flow_table __rcu	*rps_flow_table;
+	struct kobject			kobj;
+	struct netdev_rx_queue		*first;
+	atomic_t			count;
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 78b5a89..625fde2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2413,7 +2413,7 @@ EXPORT_SYMBOL(__skb_get_rxhash);
 #ifdef CONFIG_RPS
 
 /* One global table that all flow-based protocols share. */
-struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
+struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 
 /*
@@ -2425,7 +2425,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		       struct rps_dev_flow **rflowp)
 {
 	struct netdev_rx_queue *rxqueue;
-	struct rps_map *map = NULL;
+	struct rps_map *map;
 	struct rps_dev_flow_table *flow_table;
 	struct rps_sock_flow_table *sock_flow_table;
 	int cpu = -1;
@@ -2444,15 +2444,15 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	} else
 		rxqueue = dev->_rx;
 
-	if (rxqueue->rps_map) {
-		map = rcu_dereference(rxqueue->rps_map);
-		if (map && map->len == 1) {
+	map = rcu_dereference(rxqueue->rps_map);
+	if (map) {
+		if (map->len == 1) {
 			tcpu = map->cpus[0];
 			if (cpu_online(tcpu))
 				cpu = tcpu;
 			goto done;
 		}
-	} else if (!rxqueue->rps_flow_table) {
+	} else if (!rcu_dereference_raw(rxqueue->rps_flow_table)) {
 		goto done;
 	}
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b143173..a5ff5a8 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -598,7 +598,8 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	}
 
 	spin_lock(&rps_map_lock);
-	old_map = queue->rps_map;
+	old_map = rcu_dereference_protected(queue->rps_map,
+					    lockdep_is_held(&rps_map_lock));
 	rcu_assign_pointer(queue->rps_map, map);
 	spin_unlock(&rps_map_lock);
 
@@ -677,7 +678,8 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
 		table = NULL;
 
 	spin_lock(&rps_dev_flow_lock);
-	old_table = queue->rps_flow_table;
+	old_table = rcu_dereference_protected(queue->rps_flow_table,
+					      lockdep_is_held(&rps_dev_flow_lock));
 	rcu_assign_pointer(queue->rps_flow_table, table);
 	spin_unlock(&rps_dev_flow_lock);
 
@@ -705,13 +707,17 @@ static void rx_queue_release(struct kobject *kobj)
 {
 	struct netdev_rx_queue *queue = to_rx_queue(kobj);
 	struct netdev_rx_queue *first = queue->first;
+	struct rps_map *map;
+	struct rps_dev_flow_table *flow_table;
 
-	if (queue->rps_map)
-		call_rcu(&queue->rps_map->rcu, rps_map_release);
 
-	if (queue->rps_flow_table)
-		call_rcu(&queue->rps_flow_table->rcu,
-		    rps_dev_flow_table_release);
+	map = rcu_dereference_raw(queue->rps_map);
+	if (map)
+		call_rcu(&map->rcu, rps_map_release);
+
+	flow_table = rcu_dereference_raw(queue->rps_flow_table);
+	if (flow_table)
+		call_rcu(&flow_table->rcu, rps_dev_flow_table_release);
 
 	if (atomic_dec_and_test(&first->count))
 		kfree(first);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 01eee5d..385b609 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -34,7 +34,8 @@ static int rps_sock_flow_sysctl(ctl_table *table, int write,
 
 	mutex_lock(&sock_flow_mutex);
 
-	orig_sock_table = rps_sock_flow_table;
+	orig_sock_table = rcu_dereference_protected(rps_sock_flow_table,
+					lockdep_is_held(&sock_flow_mutex));
 	size = orig_size = orig_sock_table ? orig_sock_table->mask + 1 : 0;
 
 	ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);


