* Re: [PATCH] gianfar: Fix misleading indentation in startup_gfar()
From: David Miller @ 2011-01-19 0:16 UTC
To: cbouatmailru; +Cc: netdev
In-Reply-To: <20110118123602.GA26997@oksana.dev.rtsoft.ru>
From: Anton Vorontsov <cbouatmailru@gmail.com>
Date: Tue, 18 Jan 2011 15:36:02 +0300
> Just stumbled upon the issue while looking for another bug.
>
> The code looks correct, the indentation is not.
>
> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Applied, thank you.
* net-next-2.6 open for business...
From: David Miller @ 2011-01-19 0:29 UTC
To: netdev; +Cc: netfilter-devel, linux-wireless
It is currently sync'd with net-2.6 and I will start adding
feature and cleanup patches to that tree.
Just FYI...
* Re: [PATCH] vhost: rcu annotation fixup
From: Mel Gorman @ 2011-01-19 0:40 UTC
To: Michael S. Tsirkin; +Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110118110845.GA11555@redhat.com>
On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
> When built with rcu checks enabled, vhost triggers
> bogus warnings as vhost features are read without
> dev->mutex sometimes.
> Fixing it properly is not trivial as vhost.h does not
> know which lockdep classes it will be used under.
> Disable the warning by stubbing out the check for now.
>
What is the harm in leaving the bogus warnings until the difficult fix
happens? RCU checking does not seem like something that is enabled
in production. If this patch is applied, there is always the risk that
the proper fix will simply be forgotten about.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* [PATCH 1/4] vxge: cleanup probe error paths
From: Jon Mason @ 2011-01-19 1:02 UTC
To: David S. Miller; +Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa
Reorder the cleanup calls to be in the inverse order of their allocations
(instead of the random order they appear to be in), propagate the return
code on errors from pci_request_region and register_netdev, reduce the
config_dev_cnt and total_dev_cnt counters on remove, and return the
correct error code for vdev->vpaths kzalloc failures. Also, prevent
leaking of vdev->vpaths memory and netdev in the vxge_probe error path by
freeing both in vxge_device_unregister.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
---
drivers/net/vxge/vxge-main.c | 55 +++++++++++++++++++-----------------------
1 files changed, 25 insertions(+), 30 deletions(-)
diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
index 1ac9b56..cd0698c 100644
--- a/drivers/net/vxge/vxge-main.c
+++ b/drivers/net/vxge/vxge-main.c
@@ -3348,7 +3348,7 @@ static int __devinit vxge_device_register(struct __vxge_hw_device *hldev,
vxge_debug_init(VXGE_ERR,
"%s: vpath memory allocation failed",
vdev->ndev->name);
- ret = -ENODEV;
+ ret = -ENOMEM;
goto _out1;
}
@@ -3369,11 +3369,11 @@ static int __devinit vxge_device_register(struct __vxge_hw_device *hldev,
if (vdev->config.gro_enable)
ndev->features |= NETIF_F_GRO;
- if (register_netdev(ndev)) {
+ ret = register_netdev(ndev);
+ if (ret) {
vxge_debug_init(vxge_hw_device_trace_level_get(hldev),
"%s: %s : device registration failed!",
ndev->name, __func__);
- ret = -ENODEV;
goto _out2;
}
@@ -3444,6 +3444,11 @@ static void vxge_device_unregister(struct __vxge_hw_device *hldev)
/* in 2.6 will call stop() if device is up */
unregister_netdev(dev);
+ kfree(vdev->vpaths);
+
+ /* we are safe to free it now */
+ free_netdev(dev);
+
vxge_debug_init(vdev->level_trace, "%s: ethernet device unregistered",
buf);
vxge_debug_entryexit(vdev->level_trace, "%s: %s:%d Exiting...", buf,
@@ -4334,10 +4339,10 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
goto _exit1;
}
- if (pci_request_region(pdev, 0, VXGE_DRIVER_NAME)) {
+ ret = pci_request_region(pdev, 0, VXGE_DRIVER_NAME);
+ if (ret) {
vxge_debug_init(VXGE_ERR,
"%s : request regions failed", __func__);
- ret = -ENODEV;
goto _exit1;
}
@@ -4642,8 +4647,9 @@ _exit6:
_exit5:
vxge_device_unregister(hldev);
_exit4:
- pci_disable_sriov(pdev);
+ pci_set_drvdata(pdev, NULL);
vxge_hw_device_terminate(hldev);
+ pci_disable_sriov(pdev);
_exit3:
iounmap(attr.bar0);
_exit2:
@@ -4654,7 +4660,7 @@ _exit0:
kfree(ll_config);
kfree(device_config);
driver_config->config_dev_cnt--;
- pci_set_drvdata(pdev, NULL);
+ driver_config->total_dev_cnt--;
return ret;
}
@@ -4667,45 +4673,34 @@ _exit0:
static void __devexit vxge_remove(struct pci_dev *pdev)
{
struct __vxge_hw_device *hldev;
- struct vxgedev *vdev = NULL;
- struct net_device *dev;
- int i = 0;
+ struct vxgedev *vdev;
+ int i;
hldev = pci_get_drvdata(pdev);
-
if (hldev == NULL)
return;
- dev = hldev->ndev;
- vdev = netdev_priv(dev);
+ vdev = netdev_priv(hldev->ndev);
vxge_debug_entryexit(vdev->level_trace, "%s:%d", __func__, __LINE__);
-
vxge_debug_init(vdev->level_trace, "%s : removing PCI device...",
__func__);
- vxge_device_unregister(hldev);
- for (i = 0; i < vdev->no_of_vpath; i++) {
+ for (i = 0; i < vdev->no_of_vpath; i++)
vxge_free_mac_add_list(&vdev->vpaths[i]);
- vdev->vpaths[i].mcast_addr_cnt = 0;
- vdev->vpaths[i].mac_addr_cnt = 0;
- }
-
- kfree(vdev->vpaths);
+ vxge_device_unregister(hldev);
+ pci_set_drvdata(pdev, NULL);
+ /* Do not call pci_disable_sriov here, as it will break child devices */
+ vxge_hw_device_terminate(hldev);
iounmap(vdev->bar0);
-
- /* we are safe to free it now */
- free_netdev(dev);
+ pci_release_region(pdev, 0);
+ pci_disable_device(pdev);
+ driver_config->config_dev_cnt--;
+ driver_config->total_dev_cnt--;
vxge_debug_init(vdev->level_trace, "%s:%d Device unregistered",
__func__, __LINE__);
-
- vxge_hw_device_terminate(hldev);
-
- pci_disable_device(pdev);
- pci_release_region(pdev, 0);
- pci_set_drvdata(pdev, NULL);
vxge_debug_entryexit(vdev->level_trace, "%s:%d Exiting...", __func__,
__LINE__);
}
--
1.7.0.4
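As a rough illustration of the unwind ordering that the commit message for
patch 1/4 describes (not the driver's actual code), the goto-based error path
can be sketched in user-space C. All demo_* names are hypothetical. The point
is only that each label undoes one setup step, in the reverse order of setup,
and that a callee's error code is passed through instead of being overwritten.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the vxge driver. */
struct demo_dev {
	void *config;
	void *paths;
	int registered;
};

int demo_dev_count;	/* plays the role of a per-driver device counter */

/* Allocation helper that can be told to fail, to exercise the error paths. */
void *demo_alloc(size_t n, int fail)
{
	return fail ? NULL : malloc(n);
}

int demo_register(struct demo_dev *d, int fail)
{
	if (fail)
		return -EBUSY;
	d->registered = 1;
	return 0;
}

int demo_probe(struct demo_dev *d, int fail_step)
{
	int ret;

	demo_dev_count++;				/* step 0 */

	d->config = demo_alloc(64, fail_step == 1);	/* step 1 */
	if (!d->config) {
		ret = -ENOMEM;	/* allocation failure, not -ENODEV */
		goto err_count;
	}

	d->paths = demo_alloc(128, fail_step == 2);	/* step 2 */
	if (!d->paths) {
		ret = -ENOMEM;
		goto err_config;
	}

	ret = demo_register(d, fail_step == 3);		/* step 3 */
	if (ret)
		goto err_paths;	/* propagate the callee's error code */

	return 0;

	/* Labels run top to bottom, undoing steps 2, 1, 0 in that order. */
err_paths:
	free(d->paths);
	d->paths = NULL;
err_config:
	free(d->config);
	d->config = NULL;
err_count:
	demo_dev_count--;
	return ret;
}

/* Removal mirrors a successful probe, again in reverse order. */
void demo_remove(struct demo_dev *d)
{
	d->registered = 0;
	free(d->paths);
	d->paths = NULL;
	free(d->config);
	d->config = NULL;
	demo_dev_count--;
}
```

Calling demo_probe() with fail_step set to 1, 2 or 3 exercises each label in
turn; with fail_step 0 the probe succeeds and demo_remove() releases
everything again.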
* [PATCH 2/4] vxge: correct eprom version detection
From: Jon Mason @ 2011-01-19 1:02 UTC
To: David S. Miller; +Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa
In-Reply-To: <1295398942-4131-1-git-send-email-jon.mason@exar.com>
The firmware PXE EPROM version detection is failing because the
parameters are passed to the firmware query function in the wrong order.
Also, the version printing function has an extraneous newline.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
---
drivers/net/vxge/vxge-config.c | 2 +-
drivers/net/vxge/vxge-main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxge/vxge-config.c b/drivers/net/vxge/vxge-config.c
index 01c05f5..da35562 100644
--- a/drivers/net/vxge/vxge-config.c
+++ b/drivers/net/vxge/vxge-config.c
@@ -387,8 +387,8 @@ vxge_hw_vpath_eprom_img_ver_get(struct __vxge_hw_device *hldev,
data1 = steer_ctrl = 0;
status = vxge_hw_vpath_fw_api(vpath,
- VXGE_HW_RTS_ACCESS_STEER_CTRL_DATA_STRUCT_SEL_FW_MEMO,
VXGE_HW_FW_API_GET_EPROM_REV,
+ VXGE_HW_RTS_ACCESS_STEER_CTRL_DATA_STRUCT_SEL_FW_MEMO,
0, &data0, &data1, &steer_ctrl);
if (status != VXGE_HW_OK)
break;
diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
index cd0698c..9d4b0e8 100644
--- a/drivers/net/vxge/vxge-main.c
+++ b/drivers/net/vxge/vxge-main.c
@@ -4450,7 +4450,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
if (!img[i].is_valid)
break;
vxge_debug_init(VXGE_TRACE, "%s: EPROM %d, version "
- "%d.%d.%d.%d\n", VXGE_DRIVER_NAME, i,
+ "%d.%d.%d.%d", VXGE_DRIVER_NAME, i,
VXGE_EPROM_IMG_MAJOR(img[i].version),
VXGE_EPROM_IMG_MINOR(img[i].version),
VXGE_EPROM_IMG_FIX(img[i].version),
--
1.7.0.4
* [PATCH 3/4] vxge: MSIX one shot mode
From: Jon Mason @ 2011-01-19 1:02 UTC
To: David S. Miller
Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa,
Masroor Vettuparambil
In-Reply-To: <1295398942-4131-1-git-send-email-jon.mason@exar.com>
To reduce the possibility of losing an interrupt in the handler due to a
race between interrupt processing and the disabling/enabling of interrupts,
enable MSIX one shot.
Also, add support for adaptive interrupt coalescing.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Masroor Vettuparambil <masroor.vettuparambil@exar.com>
---
drivers/net/vxge/vxge-config.c | 30 +++-----
drivers/net/vxge/vxge-config.h | 10 +++
drivers/net/vxge/vxge-main.c | 159 ++++++++++++++++++++++++++++++++++-----
drivers/net/vxge/vxge-main.h | 23 +++++-
drivers/net/vxge/vxge-traffic.c | 116 ++++++++++++++++++++++++++--
drivers/net/vxge/vxge-traffic.h | 14 +++-
6 files changed, 302 insertions(+), 50 deletions(-)
diff --git a/drivers/net/vxge/vxge-config.c b/drivers/net/vxge/vxge-config.c
index da35562..77097e3 100644
--- a/drivers/net/vxge/vxge-config.c
+++ b/drivers/net/vxge/vxge-config.c
@@ -2868,6 +2868,8 @@ __vxge_hw_ring_create(struct __vxge_hw_vpath_handle *vp,
ring->rxd_init = attr->rxd_init;
ring->rxd_term = attr->rxd_term;
ring->buffer_mode = config->buffer_mode;
+ ring->tim_rti_cfg1_saved = vp->vpath->tim_rti_cfg1_saved;
+ ring->tim_rti_cfg3_saved = vp->vpath->tim_rti_cfg3_saved;
ring->rxds_limit = config->rxds_limit;
ring->rxd_size = vxge_hw_ring_rxd_size_get(config->buffer_mode);
@@ -3511,6 +3513,8 @@ __vxge_hw_fifo_create(struct __vxge_hw_vpath_handle *vp,
/* apply "interrupts per txdl" attribute */
fifo->interrupt_type = VXGE_HW_FIFO_TXD_INT_TYPE_UTILZ;
+ fifo->tim_tti_cfg1_saved = vpath->tim_tti_cfg1_saved;
+ fifo->tim_tti_cfg3_saved = vpath->tim_tti_cfg3_saved;
if (fifo->config->intr)
fifo->interrupt_type = VXGE_HW_FIFO_TXD_INT_TYPE_PER_LIST;
@@ -4377,6 +4381,8 @@ __vxge_hw_vpath_tim_configure(struct __vxge_hw_device *hldev, u32 vp_id)
}
writeq(val64, &vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_TX]);
+ vpath->tim_tti_cfg1_saved = val64;
+
val64 = readq(&vp_reg->tim_cfg2_int_num[VXGE_HW_VPATH_INTR_TX]);
if (config->tti.uec_a != VXGE_HW_USE_FLASH_DEFAULT) {
@@ -4433,6 +4439,7 @@ __vxge_hw_vpath_tim_configure(struct __vxge_hw_device *hldev, u32 vp_id)
}
writeq(val64, &vp_reg->tim_cfg3_int_num[VXGE_HW_VPATH_INTR_TX]);
+ vpath->tim_tti_cfg3_saved = val64;
}
if (config->ring.enable == VXGE_HW_RING_ENABLE) {
@@ -4481,6 +4488,8 @@ __vxge_hw_vpath_tim_configure(struct __vxge_hw_device *hldev, u32 vp_id)
}
writeq(val64, &vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_RX]);
+ vpath->tim_rti_cfg1_saved = val64;
+
val64 = readq(&vp_reg->tim_cfg2_int_num[VXGE_HW_VPATH_INTR_RX]);
if (config->rti.uec_a != VXGE_HW_USE_FLASH_DEFAULT) {
@@ -4537,6 +4546,7 @@ __vxge_hw_vpath_tim_configure(struct __vxge_hw_device *hldev, u32 vp_id)
}
writeq(val64, &vp_reg->tim_cfg3_int_num[VXGE_HW_VPATH_INTR_RX]);
+ vpath->tim_rti_cfg3_saved = val64;
}
val64 = 0;
@@ -4555,26 +4565,6 @@ __vxge_hw_vpath_tim_configure(struct __vxge_hw_device *hldev, u32 vp_id)
return status;
}
-void vxge_hw_vpath_tti_ci_set(struct __vxge_hw_device *hldev, u32 vp_id)
-{
- struct __vxge_hw_virtualpath *vpath;
- struct vxge_hw_vpath_reg __iomem *vp_reg;
- struct vxge_hw_vp_config *config;
- u64 val64;
-
- vpath = &hldev->virtual_paths[vp_id];
- vp_reg = vpath->vp_reg;
- config = vpath->vp_config;
-
- if (config->fifo.enable == VXGE_HW_FIFO_ENABLE &&
- config->tti.timer_ci_en != VXGE_HW_TIM_TIMER_CI_ENABLE) {
- config->tti.timer_ci_en = VXGE_HW_TIM_TIMER_CI_ENABLE;
- val64 = readq(&vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_TX]);
- val64 |= VXGE_HW_TIM_CFG1_INT_NUM_TIMER_CI;
- writeq(val64, &vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_TX]);
- }
-}
-
/*
* __vxge_hw_vpath_initialize
* This routine is the final phase of init which initializes the
diff --git a/drivers/net/vxge/vxge-config.h b/drivers/net/vxge/vxge-config.h
index e249e28..3c53aa7 100644
--- a/drivers/net/vxge/vxge-config.h
+++ b/drivers/net/vxge/vxge-config.h
@@ -682,6 +682,10 @@ struct __vxge_hw_virtualpath {
u32 vsport_number;
u32 max_kdfc_db;
u32 max_nofl_db;
+ u64 tim_tti_cfg1_saved;
+ u64 tim_tti_cfg3_saved;
+ u64 tim_rti_cfg1_saved;
+ u64 tim_rti_cfg3_saved;
struct __vxge_hw_ring *____cacheline_aligned ringh;
struct __vxge_hw_fifo *____cacheline_aligned fifoh;
@@ -921,6 +925,9 @@ struct __vxge_hw_ring {
u32 doorbell_cnt;
u32 total_db_cnt;
u64 rxds_limit;
+ u32 rtimer;
+ u64 tim_rti_cfg1_saved;
+ u64 tim_rti_cfg3_saved;
enum vxge_hw_status (*callback)(
struct __vxge_hw_ring *ringh,
@@ -1000,6 +1007,9 @@ struct __vxge_hw_fifo {
u32 per_txdl_space;
u32 vp_id;
u32 tx_intr_num;
+ u32 rtimer;
+ u64 tim_tti_cfg1_saved;
+ u64 tim_tti_cfg3_saved;
enum vxge_hw_status (*callback)(
struct __vxge_hw_fifo *fifo_handle,
diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
index 9d4b0e8..6c33eab 100644
--- a/drivers/net/vxge/vxge-main.c
+++ b/drivers/net/vxge/vxge-main.c
@@ -371,9 +371,6 @@ vxge_rx_1b_compl(struct __vxge_hw_ring *ringh, void *dtr,
struct vxge_hw_ring_rxd_info ext_info;
vxge_debug_entryexit(VXGE_TRACE, "%s: %s:%d",
ring->ndev->name, __func__, __LINE__);
- ring->pkts_processed = 0;
-
- vxge_hw_ring_replenish(ringh);
do {
prefetch((char *)dtr + L1_CACHE_BYTES);
@@ -1588,6 +1585,36 @@ static int vxge_reset_vpath(struct vxgedev *vdev, int vp_id)
return ret;
}
+/* Configure CI */
+static void vxge_config_ci_for_tti_rti(struct vxgedev *vdev)
+{
+ int i = 0;
+
+ /* Enable CI for RTI */
+ if (vdev->config.intr_type == MSI_X) {
+ for (i = 0; i < vdev->no_of_vpath; i++) {
+ struct __vxge_hw_ring *hw_ring;
+
+ hw_ring = vdev->vpaths[i].ring.handle;
+ vxge_hw_vpath_dynamic_rti_ci_set(hw_ring);
+ }
+ }
+
+ /* Enable CI for TTI */
+ for (i = 0; i < vdev->no_of_vpath; i++) {
+ struct __vxge_hw_fifo *hw_fifo = vdev->vpaths[i].fifo.handle;
+ vxge_hw_vpath_tti_ci_set(hw_fifo);
+ /*
+ * For Inta (with or without napi), Set CI ON for only one
+ * vpath. (Have only one free running timer).
+ */
+ if ((vdev->config.intr_type == INTA) && (i == 0))
+ break;
+ }
+
+ return;
+}
+
static int do_vxge_reset(struct vxgedev *vdev, int event)
{
enum vxge_hw_status status;
@@ -1753,6 +1780,9 @@ static int do_vxge_reset(struct vxgedev *vdev, int event)
netif_tx_wake_all_queues(vdev->ndev);
}
+ /* configure CI */
+ vxge_config_ci_for_tti_rti(vdev);
+
out:
vxge_debug_entryexit(VXGE_TRACE,
"%s:%d Exiting...", __func__, __LINE__);
@@ -1793,22 +1823,29 @@ static void vxge_reset(struct work_struct *work)
*/
static int vxge_poll_msix(struct napi_struct *napi, int budget)
{
- struct vxge_ring *ring =
- container_of(napi, struct vxge_ring, napi);
+ struct vxge_ring *ring = container_of(napi, struct vxge_ring, napi);
+ int pkts_processed;
int budget_org = budget;
- ring->budget = budget;
+ ring->budget = budget;
+ ring->pkts_processed = 0;
vxge_hw_vpath_poll_rx(ring->handle);
+ pkts_processed = ring->pkts_processed;
if (ring->pkts_processed < budget_org) {
napi_complete(napi);
+
/* Re enable the Rx interrupts for the vpath */
vxge_hw_channel_msix_unmask(
(struct __vxge_hw_channel *)ring->handle,
ring->rx_vector_no);
+ mmiowb();
}
- return ring->pkts_processed;
+ /* Return the local copy: once the MSI-X vector is unmasked above,
+ * the interrupt can fire right away and preempt this NAPI thread,
+ * which may then modify ring->pkts_processed */
+ return pkts_processed;
}
static int vxge_poll_inta(struct napi_struct *napi, int budget)
@@ -1824,6 +1861,7 @@ static int vxge_poll_inta(struct napi_struct *napi, int budget)
for (i = 0; i < vdev->no_of_vpath; i++) {
ring = &vdev->vpaths[i].ring;
ring->budget = budget;
+ ring->pkts_processed = 0;
vxge_hw_vpath_poll_rx(ring->handle);
pkts_processed += ring->pkts_processed;
budget -= ring->pkts_processed;
@@ -2054,6 +2092,7 @@ static int vxge_open_vpaths(struct vxgedev *vdev)
netdev_get_tx_queue(vdev->ndev, 0);
vpath->fifo.indicate_max_pkts =
vdev->config.fifo_indicate_max_pkts;
+ vpath->fifo.tx_vector_no = 0;
vpath->ring.rx_vector_no = 0;
vpath->ring.rx_csum = vdev->rx_csum;
vpath->ring.rx_hwts = vdev->rx_hwts;
@@ -2079,6 +2118,61 @@ static int vxge_open_vpaths(struct vxgedev *vdev)
return VXGE_HW_OK;
}
+/**
+ * adaptive_coalesce_tx_interrupts - Changes the interrupt coalescing
+ * if the interrupts are not within a range
+ * @fifo: pointer to transmit fifo structure
+ * Description: The function changes boundary timer and restriction timer
+ * value depends on the traffic
+ * Return Value: None
+ */
+static void adaptive_coalesce_tx_interrupts(struct vxge_fifo *fifo)
+{
+ fifo->interrupt_count++;
+ if (jiffies > fifo->jiffies + HZ / 100) {
+ struct __vxge_hw_fifo *hw_fifo = fifo->handle;
+
+ fifo->jiffies = jiffies;
+ if (fifo->interrupt_count > VXGE_T1A_MAX_TX_INTERRUPT_COUNT &&
+ hw_fifo->rtimer != VXGE_TTI_RTIMER_ADAPT_VAL) {
+ hw_fifo->rtimer = VXGE_TTI_RTIMER_ADAPT_VAL;
+ vxge_hw_vpath_dynamic_tti_rtimer_set(hw_fifo);
+ } else if (hw_fifo->rtimer != 0) {
+ hw_fifo->rtimer = 0;
+ vxge_hw_vpath_dynamic_tti_rtimer_set(hw_fifo);
+ }
+ fifo->interrupt_count = 0;
+ }
+}
+
+/**
+ * adaptive_coalesce_rx_interrupts - Changes the interrupt coalescing
+ * if the interrupts are not within a range
+ * @ring: pointer to receive ring structure
+ * Description: The function increases or decreases the packet counts within
+ * the ranges of traffic utilization, if the interrupts due to this ring are
+ * not within a fixed range.
+ * Return Value: Nothing
+ */
+static void adaptive_coalesce_rx_interrupts(struct vxge_ring *ring)
+{
+ ring->interrupt_count++;
+ if (jiffies > ring->jiffies + HZ / 100) {
+ struct __vxge_hw_ring *hw_ring = ring->handle;
+
+ ring->jiffies = jiffies;
+ if (ring->interrupt_count > VXGE_T1A_MAX_INTERRUPT_COUNT &&
+ hw_ring->rtimer != VXGE_RTI_RTIMER_ADAPT_VAL) {
+ hw_ring->rtimer = VXGE_RTI_RTIMER_ADAPT_VAL;
+ vxge_hw_vpath_dynamic_rti_rtimer_set(hw_ring);
+ } else if (hw_ring->rtimer != 0) {
+ hw_ring->rtimer = 0;
+ vxge_hw_vpath_dynamic_rti_rtimer_set(hw_ring);
+ }
+ ring->interrupt_count = 0;
+ }
+}
+
/*
* vxge_isr_napi
* @irq: the irq of the device.
@@ -2139,24 +2233,39 @@ static irqreturn_t vxge_isr_napi(int irq, void *dev_id)
#ifdef CONFIG_PCI_MSI
-static irqreturn_t
-vxge_tx_msix_handle(int irq, void *dev_id)
+static irqreturn_t vxge_tx_msix_handle(int irq, void *dev_id)
{
struct vxge_fifo *fifo = (struct vxge_fifo *)dev_id;
+ adaptive_coalesce_tx_interrupts(fifo);
+
+ vxge_hw_channel_msix_mask((struct __vxge_hw_channel *)fifo->handle,
+ fifo->tx_vector_no);
+
+ vxge_hw_channel_msix_clear((struct __vxge_hw_channel *)fifo->handle,
+ fifo->tx_vector_no);
+
VXGE_COMPLETE_VPATH_TX(fifo);
+ vxge_hw_channel_msix_unmask((struct __vxge_hw_channel *)fifo->handle,
+ fifo->tx_vector_no);
+
+ mmiowb();
+
return IRQ_HANDLED;
}
-static irqreturn_t
-vxge_rx_msix_napi_handle(int irq, void *dev_id)
+static irqreturn_t vxge_rx_msix_napi_handle(int irq, void *dev_id)
{
struct vxge_ring *ring = (struct vxge_ring *)dev_id;
- /* MSIX_IDX for Rx is 1 */
+ adaptive_coalesce_rx_interrupts(ring);
+
vxge_hw_channel_msix_mask((struct __vxge_hw_channel *)ring->handle,
- ring->rx_vector_no);
+ ring->rx_vector_no);
+
+ vxge_hw_channel_msix_clear((struct __vxge_hw_channel *)ring->handle,
+ ring->rx_vector_no);
napi_schedule(&ring->napi);
return IRQ_HANDLED;
@@ -2173,14 +2282,20 @@ vxge_alarm_msix_handle(int irq, void *dev_id)
VXGE_HW_VPATH_MSIX_ACTIVE) + VXGE_ALARM_MSIX_ID;
for (i = 0; i < vdev->no_of_vpath; i++) {
+ /* Reduce the chance of losing alarm interrupts by masking
+ * the vector. A pending bit will be set if an alarm is
+ * generated and on unmask the interrupt will be fired.
+ */
vxge_hw_vpath_msix_mask(vdev->vpaths[i].handle, msix_id);
+ vxge_hw_vpath_msix_clear(vdev->vpaths[i].handle, msix_id);
+ mmiowb();
status = vxge_hw_vpath_alarm_process(vdev->vpaths[i].handle,
vdev->exec_mode);
if (status == VXGE_HW_OK) {
-
vxge_hw_vpath_msix_unmask(vdev->vpaths[i].handle,
- msix_id);
+ msix_id);
+ mmiowb();
continue;
}
vxge_debug_intr(VXGE_ERR,
@@ -2299,6 +2414,9 @@ static int vxge_enable_msix(struct vxgedev *vdev)
vpath->ring.rx_vector_no = (vpath->device_id *
VXGE_HW_VPATH_MSIX_ACTIVE) + 1;
+ vpath->fifo.tx_vector_no = (vpath->device_id *
+ VXGE_HW_VPATH_MSIX_ACTIVE);
+
vxge_hw_vpath_msix_set(vpath->handle, tim_msix_id,
VXGE_ALARM_MSIX_ID);
}
@@ -2474,8 +2592,9 @@ INTA_MODE:
"%s:vxge:INTA", vdev->ndev->name);
vxge_hw_device_set_intr_type(vdev->devh,
VXGE_HW_INTR_MODE_IRQLINE);
- vxge_hw_vpath_tti_ci_set(vdev->devh,
- vdev->vpaths[0].device_id);
+
+ vxge_hw_vpath_tti_ci_set(vdev->vpaths[0].fifo.handle);
+
ret = request_irq((int) vdev->pdev->irq,
vxge_isr_napi,
IRQF_SHARED, vdev->desc[0], vdev);
@@ -2745,6 +2864,10 @@ static int vxge_open(struct net_device *dev)
}
netif_tx_start_all_queues(vdev->ndev);
+
+ /* configure CI */
+ vxge_config_ci_for_tti_rti(vdev);
+
goto out0;
out2:
@@ -3804,7 +3927,7 @@ static void __devinit vxge_device_config_init(
break;
case MSI_X:
- device_config->intr_mode = VXGE_HW_INTR_MODE_MSIX;
+ device_config->intr_mode = VXGE_HW_INTR_MODE_MSIX_ONE_SHOT;
break;
}
diff --git a/drivers/net/vxge/vxge-main.h b/drivers/net/vxge/vxge-main.h
index 5746fed..40474f0 100644
--- a/drivers/net/vxge/vxge-main.h
+++ b/drivers/net/vxge/vxge-main.h
@@ -59,11 +59,13 @@
#define VXGE_TTI_LTIMER_VAL 1000
#define VXGE_T1A_TTI_LTIMER_VAL 80
#define VXGE_TTI_RTIMER_VAL 0
+#define VXGE_TTI_RTIMER_ADAPT_VAL 10
#define VXGE_T1A_TTI_RTIMER_VAL 400
#define VXGE_RTI_BTIMER_VAL 250
#define VXGE_RTI_LTIMER_VAL 100
#define VXGE_RTI_RTIMER_VAL 0
-#define VXGE_FIFO_INDICATE_MAX_PKTS VXGE_DEF_FIFO_LENGTH
+#define VXGE_RTI_RTIMER_ADAPT_VAL 15
+#define VXGE_FIFO_INDICATE_MAX_PKTS VXGE_DEF_FIFO_LENGTH
#define VXGE_ISR_POLLING_CNT 8
#define VXGE_MAX_CONFIG_DEV 0xFF
#define VXGE_EXEC_MODE_DISABLE 0
@@ -107,6 +109,14 @@
#define RTI_T1A_RX_UFC_C 50
#define RTI_T1A_RX_UFC_D 60
+/*
+ * The interrupt rate is maintained at 3k per second with the moderation
+ * parameters for most traffic but not all. This is the maximum interrupt
+ * count allowed per function with INTA or per vector in the case of
+ * MSI-X in a 10 millisecond time period. Enabled only for Titan 1A.
+ */
+#define VXGE_T1A_MAX_INTERRUPT_COUNT 100
+#define VXGE_T1A_MAX_TX_INTERRUPT_COUNT 200
/* Milli secs timer period */
#define VXGE_TIMER_DELAY 10000
@@ -247,6 +257,11 @@ struct vxge_fifo {
int tx_steering_type;
int indicate_max_pkts;
+ /* Adaptive interrupt moderation parameters used in T1A */
+ unsigned long interrupt_count;
+ unsigned long jiffies;
+
+ u32 tx_vector_no;
/* Tx stats */
struct vxge_fifo_stats stats;
} ____cacheline_aligned;
@@ -271,6 +286,10 @@ struct vxge_ring {
*/
int driver_id;
+ /* Adaptive interrupt moderation parameters used in T1A */
+ unsigned long interrupt_count;
+ unsigned long jiffies;
+
/* copy of the flag indicating whether rx_csum is to be used */
u32 rx_csum:1,
rx_hwts:1;
@@ -286,7 +305,7 @@ struct vxge_ring {
int vlan_tag_strip;
struct vlan_group *vlgrp;
- int rx_vector_no;
+ u32 rx_vector_no;
enum vxge_hw_status last_status;
/* Rx stats */
diff --git a/drivers/net/vxge/vxge-traffic.c b/drivers/net/vxge/vxge-traffic.c
index 4c10d6c..8674f33 100644
--- a/drivers/net/vxge/vxge-traffic.c
+++ b/drivers/net/vxge/vxge-traffic.c
@@ -218,6 +218,68 @@ exit:
return status;
}
+void vxge_hw_vpath_tti_ci_set(struct __vxge_hw_fifo *fifo)
+{
+ struct vxge_hw_vpath_reg __iomem *vp_reg;
+ struct vxge_hw_vp_config *config;
+ u64 val64;
+
+ if (fifo->config->enable != VXGE_HW_FIFO_ENABLE)
+ return;
+
+ vp_reg = fifo->vp_reg;
+ config = container_of(fifo->config, struct vxge_hw_vp_config, fifo);
+
+ if (config->tti.timer_ci_en != VXGE_HW_TIM_TIMER_CI_ENABLE) {
+ config->tti.timer_ci_en = VXGE_HW_TIM_TIMER_CI_ENABLE;
+ val64 = readq(&vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_TX]);
+ val64 |= VXGE_HW_TIM_CFG1_INT_NUM_TIMER_CI;
+ fifo->tim_tti_cfg1_saved = val64;
+ writeq(val64, &vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_TX]);
+ }
+}
+
+void vxge_hw_vpath_dynamic_rti_ci_set(struct __vxge_hw_ring *ring)
+{
+ u64 val64 = ring->tim_rti_cfg1_saved;
+
+ val64 |= VXGE_HW_TIM_CFG1_INT_NUM_TIMER_CI;
+ ring->tim_rti_cfg1_saved = val64;
+ writeq(val64, &ring->vp_reg->tim_cfg1_int_num[VXGE_HW_VPATH_INTR_RX]);
+}
+
+void vxge_hw_vpath_dynamic_tti_rtimer_set(struct __vxge_hw_fifo *fifo)
+{
+ u64 val64 = fifo->tim_tti_cfg3_saved;
+ u64 timer = (fifo->rtimer * 1000) / 272;
+
+ val64 &= ~VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_VAL(0x3ffffff);
+ if (timer)
+ val64 |= VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_VAL(timer) |
+ VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_EVENT_SF(5);
+
+ writeq(val64, &fifo->vp_reg->tim_cfg3_int_num[VXGE_HW_VPATH_INTR_TX]);
+ /* tti_cfg3_saved is not updated again because it is
+ * initialized at one place only - init time.
+ */
+}
+
+void vxge_hw_vpath_dynamic_rti_rtimer_set(struct __vxge_hw_ring *ring)
+{
+ u64 val64 = ring->tim_rti_cfg3_saved;
+ u64 timer = (ring->rtimer * 1000) / 272;
+
+ val64 &= ~VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_VAL(0x3ffffff);
+ if (timer)
+ val64 |= VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_VAL(timer) |
+ VXGE_HW_TIM_CFG3_INT_NUM_RTIMER_EVENT_SF(4);
+
+ writeq(val64, &ring->vp_reg->tim_cfg3_int_num[VXGE_HW_VPATH_INTR_RX]);
+ /* rti_cfg3_saved is not updated again because it is
+ * initialized at one place only - init time.
+ */
+}
+
/**
* vxge_hw_channel_msix_mask - Mask MSIX Vector.
* @channeh: Channel for rx or tx handle
@@ -254,6 +316,23 @@ vxge_hw_channel_msix_unmask(struct __vxge_hw_channel *channel, int msix_id)
}
/**
+ * vxge_hw_channel_msix_clear - Clear the MSIX Vector.
+ * @channel: Channel for rx or tx handle
+ * @msix_id: MSI ID
+ *
+ * The function clears the msix interrupt for the given msix_id
+ * if configured in MSIX oneshot mode
+ *
+ * Returns: 0
+ */
+void vxge_hw_channel_msix_clear(struct __vxge_hw_channel *channel, int msix_id)
+{
+ __vxge_hw_pio_mem_write32_upper(
+ (u32) vxge_bVALn(vxge_mBIT(msix_id >> 2), 0, 32),
+ &channel->common_reg->clr_msix_one_shot_vec[msix_id % 4]);
+}
+
+/**
* vxge_hw_device_set_intr_type - Updates the configuration
* with new interrupt type.
* @hldev: HW device handle.
@@ -2191,19 +2270,14 @@ vxge_hw_vpath_msix_set(struct __vxge_hw_vpath_handle *vp, int *tim_msix_id,
if (vpath->hldev->config.intr_mode ==
VXGE_HW_INTR_MODE_MSIX_ONE_SHOT) {
__vxge_hw_pio_mem_write32_upper((u32)vxge_bVALn(
+ VXGE_HW_ONE_SHOT_VECT0_EN_ONE_SHOT_VECT0_EN,
+ 0, 32), &vp_reg->one_shot_vect0_en);
+ __vxge_hw_pio_mem_write32_upper((u32)vxge_bVALn(
VXGE_HW_ONE_SHOT_VECT1_EN_ONE_SHOT_VECT1_EN,
0, 32), &vp_reg->one_shot_vect1_en);
- }
-
- if (vpath->hldev->config.intr_mode ==
- VXGE_HW_INTR_MODE_MSIX_ONE_SHOT) {
__vxge_hw_pio_mem_write32_upper((u32)vxge_bVALn(
VXGE_HW_ONE_SHOT_VECT2_EN_ONE_SHOT_VECT2_EN,
0, 32), &vp_reg->one_shot_vect2_en);
-
- __vxge_hw_pio_mem_write32_upper((u32)vxge_bVALn(
- VXGE_HW_ONE_SHOT_VECT3_EN_ONE_SHOT_VECT3_EN,
- 0, 32), &vp_reg->one_shot_vect3_en);
}
}
@@ -2229,6 +2303,32 @@ vxge_hw_vpath_msix_mask(struct __vxge_hw_vpath_handle *vp, int msix_id)
}
/**
+ * vxge_hw_vpath_msix_clear - Clear MSIX Vector.
+ * @vp: Virtual Path handle.
+ * @msix_id: MSI ID
+ *
+ * The function clears the msix interrupt for the given msix_id
+ *
+ * Returns: 0,
+ * Otherwise, VXGE_HW_ERR_WRONG_IRQ if the msix index is out of range
+ * status.
+ * See also:
+ */
+void vxge_hw_vpath_msix_clear(struct __vxge_hw_vpath_handle *vp, int msix_id)
+{
+ struct __vxge_hw_device *hldev = vp->vpath->hldev;
+
+ if ((hldev->config.intr_mode == VXGE_HW_INTR_MODE_MSIX_ONE_SHOT))
+ __vxge_hw_pio_mem_write32_upper(
+ (u32) vxge_bVALn(vxge_mBIT((msix_id >> 2)), 0, 32),
+ &hldev->common_reg->clr_msix_one_shot_vec[msix_id % 4]);
+ else
+ __vxge_hw_pio_mem_write32_upper(
+ (u32) vxge_bVALn(vxge_mBIT((msix_id >> 2)), 0, 32),
+ &hldev->common_reg->clear_msix_mask_vect[msix_id % 4]);
+}
+
+/**
* vxge_hw_vpath_msix_unmask - Unmask the MSIX Vector.
* @vp: Virtual Path handle.
* @msix_id: MSI ID
diff --git a/drivers/net/vxge/vxge-traffic.h b/drivers/net/vxge/vxge-traffic.h
index 8c3103f..760c319 100644
--- a/drivers/net/vxge/vxge-traffic.h
+++ b/drivers/net/vxge/vxge-traffic.h
@@ -2142,6 +2142,10 @@ void vxge_hw_device_clear_tx_rx(
* Virtual Paths
*/
+void vxge_hw_vpath_dynamic_rti_rtimer_set(struct __vxge_hw_ring *ring);
+
+void vxge_hw_vpath_dynamic_tti_rtimer_set(struct __vxge_hw_fifo *fifo);
+
u32 vxge_hw_vpath_id(
struct __vxge_hw_vpath_handle *vpath_handle);
@@ -2245,6 +2249,8 @@ void
vxge_hw_vpath_msix_mask(struct __vxge_hw_vpath_handle *vpath_handle,
int msix_id);
+void vxge_hw_vpath_msix_clear(struct __vxge_hw_vpath_handle *vp, int msix_id);
+
void vxge_hw_device_flush_io(struct __vxge_hw_device *devh);
void
@@ -2270,6 +2276,9 @@ void
vxge_hw_channel_msix_unmask(struct __vxge_hw_channel *channelh, int msix_id);
void
+vxge_hw_channel_msix_clear(struct __vxge_hw_channel *channelh, int msix_id);
+
+void
vxge_hw_channel_dtr_try_complete(struct __vxge_hw_channel *channel,
void **dtrh);
@@ -2282,7 +2291,8 @@ vxge_hw_channel_dtr_free(struct __vxge_hw_channel *channel, void *dtrh);
int
vxge_hw_channel_dtr_count(struct __vxge_hw_channel *channel);
-void
-vxge_hw_vpath_tti_ci_set(struct __vxge_hw_device *hldev, u32 vp_id);
+void vxge_hw_vpath_tti_ci_set(struct __vxge_hw_fifo *fifo);
+
+void vxge_hw_vpath_dynamic_rti_ci_set(struct __vxge_hw_ring *ring);
#endif
--
1.7.0.4
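The adaptive coalescing decision added in patch 3/4 can be modelled outside
the kernel to see how it behaves. The sketch below mirrors the branch
structure of adaptive_coalesce_rx_interrupts() and the (rtimer * 1000) / 272
scaling from the patch. The constants are copied from the posted diff;
everything prefixed demo_ is hypothetical, a plain tick counter stands in for
jiffies, and a counter stands in for the register write.

```c
#include <stdint.h>

/* Values copied from the patch; the demo_ names themselves are made up. */
#define DEMO_MAX_INTERRUPT_COUNT 100	/* VXGE_T1A_MAX_INTERRUPT_COUNT */
#define DEMO_RTIMER_ADAPT_VAL    15	/* VXGE_RTI_RTIMER_ADAPT_VAL */
#define DEMO_WINDOW_TICKS        1	/* stands in for HZ / 100 */

struct demo_ring {
	unsigned long interrupt_count;
	unsigned long last_tick;	/* plays the role of ring->jiffies */
	uint32_t rtimer;
	unsigned int hw_writes;		/* times the "register" was rewritten */
};

/* Same scaling the patch applies before filling the RTIMER field
 * (widened to 64 bits before the multiply in this sketch). */
uint64_t demo_rtimer_to_reg(uint32_t rtimer)
{
	return ((uint64_t)rtimer * 1000) / 272;
}

/* Called once per interrupt, like the handler in the patch. */
void demo_adapt(struct demo_ring *r, unsigned long now)
{
	r->interrupt_count++;
	if (now > r->last_tick + DEMO_WINDOW_TICKS) {
		r->last_tick = now;
		if (r->interrupt_count > DEMO_MAX_INTERRUPT_COUNT &&
		    r->rtimer != DEMO_RTIMER_ADAPT_VAL) {
			/* Too many interrupts in the window: raise rtimer. */
			r->rtimer = DEMO_RTIMER_ADAPT_VAL;
			r->hw_writes++;
		} else if (r->rtimer != 0) {
			/* Otherwise drop back to the default of 0. */
			r->rtimer = 0;
			r->hw_writes++;
		}
		r->interrupt_count = 0;
	}
}
```

One property worth noting in this model: because the second branch is a plain
else, a window that is over the threshold while rtimer is already at the
adapt value also takes it and resets rtimer to 0. Whether that matters on
real hardware is not something this sketch can show.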
* [PATCH 4/4] vxge: update driver version
From: Jon Mason @ 2011-01-19 1:02 UTC
To: David S. Miller; +Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa
In-Reply-To: <1295398942-4131-1-git-send-email-jon.mason@exar.com>
Update vxge driver version to 2.5.2
Signed-off-by: Jon Mason <jon.mason@exar.com>
---
drivers/net/vxge/vxge-version.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxge/vxge-version.h b/drivers/net/vxge/vxge-version.h
index ad2f99b..581e215 100644
--- a/drivers/net/vxge/vxge-version.h
+++ b/drivers/net/vxge/vxge-version.h
@@ -16,8 +16,8 @@
#define VXGE_VERSION_MAJOR "2"
#define VXGE_VERSION_MINOR "5"
-#define VXGE_VERSION_FIX "1"
-#define VXGE_VERSION_BUILD "22082"
+#define VXGE_VERSION_FIX "2"
+#define VXGE_VERSION_BUILD "22259"
#define VXGE_VERSION_FOR "k"
#define VXGE_FW_VER(maj, min, bld) (((maj) << 16) + ((min) << 8) + (bld))
--
1.7.0.4
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Jay Vosburgh @ 2011-01-19 1:45 UTC
To: Nicolas de Pesloüan
Cc: Oleg V. Ukhno, John Fastabend, David S. Miller,
netdev@vger.kernel.org, Sébastien Barré,
Christophe Paasch
In-Reply-To: <4D360408.1080104@gmail.com>
Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
>On 18/01/2011 21:24, Jay Vosburgh wrote:
>> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com> wrote:
>
>>>>> - it is possible to detect path failure using arp monitoring instead of
>>>>> miimon.
>>
>> I don't think this is true, at least not for the case of
>> balance-rr. Using ARP monitoring with any sort of load balance scheme
>> is problematic, because the replies may be balanced to a different slave
>> than the sender.
>
>Can't we achieve the expected ARP monitoring by using the exact same
>artifice that Oleg suggested: using a different source MAC per slave for
>ARP monitoring, so that the return path matches the sending path?
It's not as simple with ARP, because it's a control protocol
that has side effects.
First, the MAC level broadcast ARP probes from bonding would
have to be round robined in such a manner that they regularly arrive at
every possible slave. A single broadcast won't be sent to more than one
member of the channel group by the switch. We can't do multiple unicast
ARPs with different destination MAC addresses, because we'd have to
track all of those MACs somewhere (keep track of the MAC of every slave
on each peer we're monitoring). I suspect that snooping switches will
get all whiny about port flapping and the like.
We could have a separate IP address per slave, used only for
link monitoring, but that's a huge headache. Actually, it's a lot like
the multi-link stuff I've been working on (and posted RFC of in
December), but that doesn't use ARP (it segregates slaves by IP subnet,
and balances at the IP layer). Basically, you need an overlaying active
protocol to handle the map of which slave goes where (which multi-link
has).
So, maybe we have the ARP replies massaged such that the
Ethernet header source and ARP target hardware address don't match.
So the probes from bonding currently look like this:
MAC-A > ff:ff:ff:ff:ff:ff Request who-has 10.0.4.2 tell 10.0.1.1
Where MAC-A is the bond's MAC address. And the replies now look
like this:
MAC-B > MAC-A, Reply 10.0.4.2 is-at MAC-B
Where MAC-B is the MAC of the peer's bond. The massaged replies
would be of the form:
MAC-C > MAC-A, Reply 10.0.4.2 is-at MAC-B
where MAC-C is the slave "permanent" address (which is really a
fake address to manipulate the switch's hash), and MAC-B is whatever the
real MAC of the bond is. I don't think we can mess with MAC-B in the
reply (the "is-at" part), because that would update ARP tables and such.
If we change MAC-A in the reply, they're liable to be filtered out. I
really don't know if putting MAC-C in there as the source would confuse
snooping switches or not.
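As a rough illustration of the "massaged" reply described above (not code from bonding or from any posted patch), the sketch below fills in only the fields that matter for the argument. The struct is a simplified, hypothetical layout rather than the wire format, and all names are made up for the example. Note that the "is-at" address is what RFC 826 calls the sender hardware address of the reply.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Simplified view of an ARP reply: only the fields discussed here. */
struct arp_frame {
	uint8_t  eth_dst[ETH_ALEN];	/* Ethernet destination (MAC-A) */
	uint8_t  eth_src[ETH_ALEN];	/* Ethernet source (MAC-B or MAC-C) */
	uint16_t eth_type;		/* 0x0806 = ARP, host order in this sketch */
	uint16_t oper;			/* 2 = reply */
	uint8_t  sha[ETH_ALEN];		/* sender hardware address, the "is-at" MAC */
	uint8_t  spa[4];		/* sender protocol address */
	uint8_t  tha[ETH_ALEN];		/* target hardware address */
	uint8_t  tpa[4];		/* target protocol address */
};

/*
 * Build "MAC-C > MAC-A, Reply <ip> is-at MAC-B": the Ethernet source is the
 * per-slave address, while the ARP payload still advertises the bond's MAC.
 */
static void build_massaged_reply(struct arp_frame *f,
				 const uint8_t *slave_mac,	/* MAC-C */
				 const uint8_t *bond_mac,	/* MAC-B */
				 const uint8_t *prober_mac,	/* MAC-A */
				 const uint8_t *replier_ip,
				 const uint8_t *prober_ip)
{
	memset(f, 0, sizeof(*f));
	memcpy(f->eth_dst, prober_mac, ETH_ALEN);
	memcpy(f->eth_src, slave_mac, ETH_ALEN);
	f->eth_type = 0x0806;
	f->oper = 2;
	memcpy(f->sha, bond_mac, ETH_ALEN);
	memcpy(f->spa, replier_ip, 4);
	memcpy(f->tha, prober_mac, ETH_ALEN);
	memcpy(f->tpa, prober_ip, 4);
}

/* True when the Ethernet source differs from the advertised "is-at" MAC. */
static int reply_is_massaged(const struct arp_frame *f)
{
	return memcmp(f->eth_src, f->sha, ETH_ALEN) != 0;
}
```

A switch that compares the Ethernet source with the ARP payload would see exactly the mismatch that `reply_is_massaged()` reports, which is the open question raised above.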
One other thought I had while chewing on this is to run the LACP
protocol exchange between the bonding peers directly, instead of between
each bond and each switch. I have no idea if this would work or not,
but the theory would look something like the "VLAN tunnel" topology for
the switches, but the bonds at the ends are configured for 802.3ad. To
make this work, bonding would have to be able to run multiple LACP
instances (one for each bonding peer on the network) over a single
aggregator (or permit slaves to belong to multiple active aggregators).
This would basically be the same as the multi-link business, except
using LACP for the active protocol to build the map.
A distinguished correspondent (who may confess if he so chooses)
also suggested 802.2 LLC XID or TEST frames, which have been discussed
in the past. Those don't have side effects, but I'm not sure if either
is technically feasible, or if we really want bonding to have a
dependency on llc. They would also only interop with hosts that respond
to the XID or TEST. I haven't thought about this in detail for a number
of years, but I think the LLC DSAP / SSAP space is pretty small.
>>>>> - changing the destination MAC address of egress packets is not
>>>>> necessary, because egress path selection forces ingress path selection
>>>>> due to the VLAN.
>>
>> This is true, with one comment: Oleg's proposal we're discussing
>> changes the source MAC address of outgoing packets, not the destination.
>> The purpose being to manipulate the src-mac balancing algorithm on the
>> switch when the packets are hashed at the egress port channel group.
>> The packets (for a particular destination) all bear the same destination
>> MAC, but (as I understand it) are manually assigned tailored source MAC
>> addresses that hash to sequential values.
>
>Yes, you're right.
>
>> That's true. The big problem with the "VLAN tunnel" approach is
>> that it's not tolerant of link failures.
>
>Yes, except if we find a way to make arp monitoring reliable in load balancing situation.
>
>[snip]
>
>> This is essentially the same thing as the diagram I pasted in up
>> above, except with VLANs and an additional layer of switches between the
>> hosts. The multiple VLANs take the place of multiple discrete switches.
>>
>> This could also be accomplished via bridge groups (in
>> Cisco-speak). For example, instead of VLAN 100, that could be bridge
>> group X, VLAN 200 is bridge group Y, and so on.
>>
>> Neither the VLAN nor the bridge group methods handle link
>> failures very well; if, in the above diagram, the link from "switch 2
>> vlan 100" to "host B" fails, there's no way for host A to know to stop
>> sending to "switch 1 vlan 100," and there's no backup path for VLAN 100
>> to "host B."
>
>Can't we imagine "arp monitoring" the destination MAC address of host B
>on both paths? That way, host A would know that a given path is down,
>because the return path would be the same. The target host should send the
>reply on the slave on which it received the request, which is the normal
>way to reply to an arp request.
I think you can only get away with this if each slave set (where
a "set" is one slave from each bond that's attending our little load
balancing party) is on a separate switch domain, and the switch domains
are not bridged together. Otherwise the switches will flap their MAC
tables as they update from each probe that they see.
As for the reply going out the same slave, to do that, bonding
would have to intercept the ARP traffic (because ARPs arriving on slaves
are normally assigned to the bond itself, not the slave) and track and
tweak them.
Lastly, bonding would again have to maintain a map, showing
which destinations are reachable via which set of slaves. All peer
systems (needing to have per-slave link monitoring) would have to be ARP
targets.
>> One item I'd like to see some more data on is the level of
>> reordering at the receiver in Oleg's system.
>
>This is exactly the reason why I asked Oleg to do some test with
>balance-rr. I cannot find a good reason for a possibly new
>xmit_hash_policy to provide better throughput than current balance-rr. If
>the throughput increases by, let's say, less than 20%, whatever the
>tcp_reordering value, then it is probably a dead end.
Well, the point of making a round robin xmit_hash_policy isn't
that the throughput will be better than the existing round robin, it's
to make round-robin accessible to the 802.3ad mode.
>> One of the reasons round robin isn't as useful as it once was is
>> due to the rise of NAPI and interrupt coalescing, both of which will
>> tend to increase the reordering of packets at the receiver when the
>> packets are evenly striped. In the old days, it was one interrupt, one
>> packet. Now, it's one interrupt or NAPI poll, many packets. With the
>> packets striped across interfaces, this will tend to increase
>> reordering. E.g.,
>>
>>     slave 1     slave 2     slave 3
>>     Packet 1    P2          P3
>>     P4          P5          P6
>>     P7          P8          P9
>>
>> and so on. A poll of slave 1 will get packets 1, 4 and 7 (and
>> probably several more), then a poll of slave 2 will get 2, 5 and 8, etc.
>
>Any chance to receive P1, P2, P3 on slave 1, P4, P5, P6 on slave 2 and P7,
>P8, P9 on slave 3, possibly by sending grouped packets, changing the
>sending slave every N packets instead of every packet? I think we already
>discussed this possibility a few months or years ago on the bonding-devel
>ML. As far as I remember, the idea was not developed because it was
>not easy to find the number of packets to send through the same
>slave. Anyway, this might help reduce out-of-order delivery.
Yes, this came up several years ago, and, basically, there's no
way to do it perfectly. An interesting experiment would be to see if
sending groups (perhaps close to the NAPI weight of the receiver) would
reduce reordering.
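To make the striping argument concrete, here is a toy model, not a measurement and not bonding code. It assumes every packet is already queued on its slave before the receiver starts polling, and that the receiver visits the slaves in a fixed order, taking at most `budget` packets per visit. The function name and parameters are invented for this sketch.

```c
#define MAX_SLAVES 8
#define MAX_PKTS   1024

/*
 * Stripe `total` packets over `slaves` links, sending `group` consecutive
 * packets on a link before moving to the next one, then drain the links in
 * round-robin polls of at most `budget` packets each.
 * Returns the number of packets delivered after a higher-numbered packet,
 * or -1 if the arguments are out of range.
 */
static int count_reordered(int total, int slaves, int group, int budget)
{
	static int queue[MAX_SLAVES][MAX_PKTS];
	int len[MAX_SLAVES] = {0};
	int head[MAX_SLAVES] = {0};
	int delivered = 0, highest = -1, late = 0;
	int i, s, n;

	if (total < 0 || total > MAX_PKTS || slaves < 1 ||
	    slaves > MAX_SLAVES || group < 1 || budget < 1)
		return -1;

	/* Sender side: packet i goes to slave (i / group) % slaves. */
	for (i = 0; i < total; i++) {
		s = (i / group) % slaves;
		queue[s][len[s]++] = i;
	}

	/* Receiver side: poll each slave in turn, up to `budget` packets. */
	while (delivered < total) {
		for (s = 0; s < slaves; s++) {
			for (n = 0; n < budget && head[s] < len[s]; n++) {
				int seq = queue[s][head[s]++];

				if (seq < highest)
					late++;
				else
					highest = seq;
				delivered++;
			}
		}
	}
	return late;
}
```

Under these assumptions, `count_reordered(9, 3, 1, 3)` reproduces the three-slave table quoted above and reports 4 late packets, while `count_reordered(9, 3, 3, 3)` (groups matched to the poll budget) and `count_reordered(9, 3, 1, 1)` (one packet per poll, the "old days" case) both report 0. Real arrival timing is of course less tidy than this model.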
>> Barring evidence to the contrary, I presume that Oleg's system
>> delivers out of order at the receiver. That's not automatically a
>> reason to reject it, but this entire proposal is sufficiently complex to
>> configure that very explicit documentation will be necessary.
>
>Yes, and this is already true for some bonding modes and in particular for balance-rr.
I don't think any modes other than balance-rr will deliver out
of order normally. It can happen during edge cases, e.g., alb
rebalance, or the layer3+4 hash with IP fragments, but I'd expect those
to be at a much lower rate than what round robin causes.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH v4 08/10] ARM: mxs: add ocotp read function
From: Shawn Guo @ 2011-01-19 10:48 UTC (permalink / raw)
To: Sascha Hauer
Cc: gerg, B32542, netdev, bryan.wu, jamie, baruch, w.sang, r64343,
eric, Uwe Kleine-König, jamie, davem, linux-arm-kernel, lw
In-Reply-To: <20110114131655.GB27583@freescale.com>
Hi Sascha,
On Fri, Jan 14, 2011 at 09:16:56PM +0800, Shawn Guo wrote:
> On Fri, Jan 14, 2011 at 09:33:20AM +0100, Sascha Hauer wrote:
> > On Thu, Jan 13, 2011 at 04:19:39PM +0100, Uwe Kleine-König wrote:
> > > On Thu, Jan 06, 2011 at 03:13:16PM +0800, Shawn Guo wrote:
> > > > Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
> > > > ---
> > > > Changes for v4:
> > > > - Call cpu_relax() during polling
> > > >
> > > > Changes for v2:
> > > > - Add mutex locking for mxs_read_ocotp()
> > > > - Use type size_t for count and i
> > > > - Add comment for clk_enable/disable skipping
> > > > - Add ERROR bit clearing and polling step
> > > >
> > > > arch/arm/mach-mxs/Makefile | 2 +-
> > > > arch/arm/mach-mxs/include/mach/common.h | 1 +
> > > > arch/arm/mach-mxs/ocotp.c | 79 +++++++++++++++++++++++++++++++
> > > > 3 files changed, 81 insertions(+), 1 deletions(-)
> > > > create mode 100644 arch/arm/mach-mxs/ocotp.c
> > > >
> > > > diff --git a/arch/arm/mach-mxs/Makefile b/arch/arm/mach-mxs/Makefile
> > > > index 39d3f9c..f23ebbd 100644
> > > > --- a/arch/arm/mach-mxs/Makefile
> > > > +++ b/arch/arm/mach-mxs/Makefile
> > > > @@ -1,5 +1,5 @@
> > > > # Common support
> > > > -obj-y := clock.o devices.o gpio.o icoll.o iomux.o system.o timer.o
> > > > +obj-y := clock.o devices.o gpio.o icoll.o iomux.o ocotp.o system.o timer.o
> > > is it worth to make ocotp optional? (and let evk select
> > > CONFIG_MXS_OCOTP)
> >
> > I think not.
> >
> I thought it depends on how we think about it. If we think that
> ocotp is a block of the soc and should be on every system based on
> the soc, we should build it unconditionally. However, if we think
> ocotp is a function that may or may not be used on a machine,
> it should be built conditionally, so that a machine that does not use
> the ocotp function can skip the ocotp build and save the memory of
> u32 ocotp_words[0x20].
>
You've not merged patch #8, #9, #10, so you still think
CONFIG_MXS_OCOTP should not be added?
--
Regards,
Shawn
* Re: [PATCH] bluetooth: Fix failure to release lock in read_index_list() when mem alloc fails.
From: Gustavo F. Padovan @ 2011-01-19 3:38 UTC (permalink / raw)
To: Jesper Juhl
Cc: linux-bluetooth, netdev, linux-kernel, David S. Miller,
Marcel Holtmann
In-Reply-To: <alpine.LNX.2.00.1101140000541.11347@swampdragon.chaosbits.net>
Hi Jesper,
* Jesper Juhl <jj@chaosbits.net> [2011-01-14 00:18:49 +0100]:
> If alloc_skb() fails in read_index_list() we'll return -ENOMEM without
> releasing 'hci_dev_list_lock'.
>
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
> mgmt.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
Patch has been applied. Thanks.
--
Gustavo F. Padovan
http://profusion.mobi
* Re: [PATCH] vhost: rcu annotation fixup
From: Michael S. Tsirkin @ 2011-01-19 5:18 UTC (permalink / raw)
To: Mel Gorman; +Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20110119004040.GE18984@csn.ul.ie>
On Wed, Jan 19, 2011 at 12:40:40AM +0000, Mel Gorman wrote:
> On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
> > When built with rcu checks enabled, vhost triggers
> > bogus warnings as vhost features are read without
> > dev->mutex sometimes.
> > Fixing it properly is not trivial as vhost.h does not
> > know which lockdep classes it will be used under.
> > Disable the warning by stubbing out the check for now.
> >
>
> What is the harm in leaving the bogus warnings until the difficult fix
> happens? RCU checks enabled does not seem like something that is enabled
> in production.
I would like to run with rcu checks enabled sometimes to debug kvm,
which has an elaborate rcu strategy. Bogus warnings in the log
make it easy to overlook the real ones. Further, the rcu macros
used are a form of documentation. If we have
- rcu_dereference_index_check(dev->acked_features,
- lockdep_is_held(&dev->mutex));
this means 'taken in rcu read side critical section or under mutex',
+ acked_features = rcu_dereference_index_check(dev->acked_features, 1);
means 'not checked'.
> If this patch is applied, there is always the risk that
> it'll be simply forgotten about.
Well, that's why I put in a TODO.
If there's a demand for that, I can add a Kconfig option to
trigger a warning at each unchecked rcu call in vhost-net
but I doubt it'll get a lot of use :)
> --
> Mel Gorman
> Part-time Phd Student Linux Technology Center
> University of Limerick IBM Dublin Software Lab
* Re: [PATCH] af_unix: implement socket filter
From: David Miller @ 2011-01-19 5:33 UTC (permalink / raw)
To: ian.molton
Cc: netdev, linux-kernel, eric.dumazet, ebiederm, xemul, davidel,
alban.crequy
In-Reply-To: <1295368755-20931-1-git-send-email-ian.molton@collabora.co.uk>
From: Ian Molton <ian.molton@collabora.co.uk>
Date: Tue, 18 Jan 2011 16:39:15 +0000
> From: Alban Crequy <alban.crequy@collabora.co.uk>
>
> Linux Socket Filters can already be successfully attached and detached on unix
> sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
> See: Documentation/networking/filter.txt
>
> But the filter was never used in the unix socket code so it did not work. This
> patch uses sk_filter() to filter buffers before delivery.
>
> This short program demonstrates the problem on SOCK_DGRAM.
...
> Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
> Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Applied.
* Re: [PATCH] net: filter: dont block softirqs in sk_run_filter()
From: David Miller @ 2011-01-19 5:33 UTC (permalink / raw)
To: eric.dumazet
Cc: netdev, linux-kernel, ebiederm, xemul, davidel, alban.crequy,
ian.molton
In-Reply-To: <1295372812.3290.35.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 18 Jan 2011 18:46:52 +0100
> Packet filter (BPF) doesnt need to disable softirqs, being fully
> re-entrant and lock-less.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
* [PATCH] sctp: user perfect name for Delayed SACK Timer option
From: Shan Wei @ 2011-01-19 7:33 UTC (permalink / raw)
To: David Miller, Vlad Yasevich, 魏勇军,
Network-Maillist
The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.
Reference:
8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
(http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
include/net/sctp/user.h | 2 +-
net/sctp/socket.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 2b2769c..74e10d6 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -77,7 +77,7 @@ typedef __s32 sctp_assoc_t;
#define SCTP_STATUS 14
#define SCTP_GET_PEER_ADDR_INFO 15
#define SCTP_DELAYED_ACK_TIME 16
-#define SCTP_DELAYED_ACK SCTP_DELAYED_ACK_TIME
+#define SCTP_DELAYED_SACK SCTP_DELAYED_ACK_TIME
#define SCTP_CONTEXT 17
#define SCTP_FRAGMENT_INTERLEAVE 18
#define SCTP_PARTIAL_DELIVERY_POINT 19 /* Set/Get partial delivery point */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a09b0dd..8e02550 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3428,7 +3428,7 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
retval = sctp_setsockopt_peer_addr_params(sk, optval, optlen);
break;
- case SCTP_DELAYED_ACK:
+ case SCTP_DELAYED_SACK:
retval = sctp_setsockopt_delayed_ack(sk, optval, optlen);
break;
case SCTP_PARTIAL_DELIVERY_POINT:
@@ -5333,7 +5333,7 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
retval = sctp_getsockopt_peer_addr_params(sk, len, optval,
optlen);
break;
- case SCTP_DELAYED_ACK:
+ case SCTP_DELAYED_SACK:
retval = sctp_getsockopt_delayed_ack(sk, len, optval,
optlen);
break;
--
1.6.3.3
* Re: [PATCH] sctp: user perfect name for Delayed SACK Timer option
From: David Miller @ 2011-01-19 7:39 UTC (permalink / raw)
To: shanwei; +Cc: vladislav.yasevich, yjwei, netdev, linux-sctp
In-Reply-To: <4D3693CA.40508@cn.fujitsu.com>
From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Wed, 19 Jan 2011 15:33:30 +0800
> The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
> not SCTP_DELAYED_ACK.
>
> Reference:
> 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
> (http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
>
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
You can't make this change, you'll break applications using the
existing definition.
* Re: [PATCH] sctp: user perfect name for Delayed SACK Timer option
From: Shan Wei @ 2011-01-19 8:17 UTC (permalink / raw)
To: David Miller; +Cc: vladislav.yasevich, yjwei, netdev, linux-sctp
In-Reply-To: <20110118.233932.91327830.davem@davemloft.net>
David Miller wrote, at 01/19/2011 03:39 PM:
> From: Shan Wei <shanwei@cn.fujitsu.com>
> Date: Wed, 19 Jan 2011 15:33:30 +0800
>
>> The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
>> not SCTP_DELAYED_ACK.
>>
>> Reference:
>> 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
>> (http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
>>
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>
> You can't make this change, you'll break applications using the
> existing definition.
No documents guide users to use the SCTP_DELAYED_ACK option.
I doubt that any applications use this option.
If there are, how about keeping it alongside SCTP_DELAYED_SACK?
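A minimal sketch of what keeping both names could look like, assuming both stay plain aliases of SCTP_DELAYED_ACK_TIME (value 16, as in the quoted user.h hunk). The helper `handles_delayed_sack()` is hypothetical and only stands in for the switch statements in net/sctp/socket.c.

```c
/* Values as in the quoted include/net/sctp/user.h hunk. */
#define SCTP_DELAYED_ACK_TIME	16
#define SCTP_DELAYED_ACK	SCTP_DELAYED_ACK_TIME	/* old spelling, kept */
#define SCTP_DELAYED_SACK	SCTP_DELAYED_ACK_TIME	/* name used by the draft */

/* Hypothetical stand-in for the optname switch in sctp_setsockopt(). */
static int handles_delayed_sack(int optname)
{
	switch (optname) {
	case SCTP_DELAYED_SACK:	/* same value as SCTP_DELAYED_ACK */
		return 1;
	default:
		return 0;
	}
}
```

Because both macros expand to the same constant, a single case label covers old and new callers, and existing sources that spell the option SCTP_DELAYED_ACK keep compiling.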
--
Best Regards
-----
Shan Wei
* Re: [PATCH] sctp: user perfect name for Delayed SACK Timer option
From: Wei Yongjun @ 2011-01-19 8:22 UTC (permalink / raw)
To: Shan Wei; +Cc: David Miller, vladislav.yasevich, netdev, linux-sctp
In-Reply-To: <4D369E03.2000109@cn.fujitsu.com>
> David Miller wrote, at 01/19/2011 03:39 PM:
>> From: Shan Wei <shanwei@cn.fujitsu.com>
>> Date: Wed, 19 Jan 2011 15:33:30 +0800
>>
>>> The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
>>> not SCTP_DELAYED_ACK.
>>>
>>> Reference:
>>> 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
>>> (http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
>>>
>>>
>>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>> You can't make this change, you'll break applications using the
>> existing definition.
> No documents guide users to use the SCTP_DELAYED_ACK option.
> I doubt that any applications use this option.
>
> If there are, how about keeping it alongside SCTP_DELAYED_SACK?
You should do the same as the following patch, which fixed the same
problem in lksctp-tools, but not in lksctp:
http://git.kernel.org/?p=linux/kernel/git/vxy/lksctp-tools.git;a=commit;h=ecdc6c0d3c3e40843916427923a45a93d748c1c6
* [PATCH-v2] sctp: user perfect name for Delayed SACK Timer option
From: Shan Wei @ 2011-01-19 8:39 UTC (permalink / raw)
To: David Miller, Vlad Yasevich, 魏勇军,
Network-Maillist
In-Reply-To: <4D3693CA.40508@cn.fujitsu.com>
The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.
Keep SCTP_DELAYED_ACK alongside SCTP_DELAYED_SACK
for compatibility with existing applications.
Reference:
8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
(http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
include/net/sctp/user.h | 1 +
net/sctp/socket.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 2b2769c..92eedc0 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -78,6 +78,7 @@ typedef __s32 sctp_assoc_t;
#define SCTP_GET_PEER_ADDR_INFO 15
#define SCTP_DELAYED_ACK_TIME 16
#define SCTP_DELAYED_ACK SCTP_DELAYED_ACK_TIME
+#define SCTP_DELAYED_SACK SCTP_DELAYED_ACK_TIME
#define SCTP_CONTEXT 17
#define SCTP_FRAGMENT_INTERLEAVE 18
#define SCTP_PARTIAL_DELIVERY_POINT 19 /* Set/Get partial delivery point */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a09b0dd..8e02550 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3428,7 +3428,7 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
retval = sctp_setsockopt_peer_addr_params(sk, optval, optlen);
break;
- case SCTP_DELAYED_ACK:
+ case SCTP_DELAYED_SACK:
retval = sctp_setsockopt_delayed_ack(sk, optval, optlen);
break;
case SCTP_PARTIAL_DELIVERY_POINT:
@@ -5333,7 +5333,7 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
retval = sctp_getsockopt_peer_addr_params(sk, len, optval,
optlen);
break;
- case SCTP_DELAYED_ACK:
+ case SCTP_DELAYED_SACK:
retval = sctp_getsockopt_delayed_ack(sk, len, optval,
optlen);
break;
--
1.6.3.3
* Re: [PATCH-v2] sctp: user perfect name for Delayed SACK Timer option
From: Wei Yongjun @ 2011-01-19 8:48 UTC (permalink / raw)
To: Shan Wei; +Cc: David Miller, Vlad Yasevich, Network-Maillist, SCTP-Maillist
In-Reply-To: <4D36A324.4040307@cn.fujitsu.com>
> The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
> not SCTP_DELAYED_ACK.
>
> Keep SCTP_DELAYED_ACK alongside SCTP_DELAYED_SACK
> for compatibility with existing applications.
SCTP_DELAYED_ACK was introduced by my typo when I changed it for
API compliance two years ago. So:
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
* Re: [PATCH-v2] sctp: user perfect name for Delayed SACK Timer option
From: Shan Wei @ 2011-01-19 8:54 UTC (permalink / raw)
To: Wei Yongjun; +Cc: David Miller, Vlad Yasevich, Network-Maillist, SCTP-Maillist
In-Reply-To: <4D36A557.6080708@cn.fujitsu.com>
Wei Yongjun wrote, at 01/19/2011 04:48 PM:
>
>> The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
>> not SCTP_DELAYED_ACK.
>>
>> Keep SCTP_DELAYED_ACK alongside SCTP_DELAYED_SACK
>> for compatibility with existing applications.
>
> SCTP_DELAYED_ACK was introduced by my typo when I changed it for
> API compliance two years ago. So:
>
> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Thanks for your work.
--
Best Regards
-----
Shan Wei
* [PATCH] e1000: add support for Marvell Alaska M88E1118R PHY
From: Florian Fainelli @ 2011-01-19 9:09 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Dirk Brandewie, Jeff Kirsher
From: Florian Fainelli <ffainelli@freebox.fr>
This patch adds support for Marvell Alaska M88E1118R PHY chips. Support for
other M88* PHYs is already there, so there is nothing more to add than its
PHY id.
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
CC: Dirk Brandewie <dirk.j.brandewie@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index aed223b..7501d97 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -124,6 +124,7 @@ static s32 e1000_set_phy_type(struct e1000_hw *hw)
case M88E1000_I_PHY_ID:
case M88E1011_I_PHY_ID:
case M88E1111_I_PHY_ID:
+ case M88E1118_E_PHY_ID:
hw->phy_type = e1000_phy_m88;
break;
case IGP01E1000_I_PHY_ID:
@@ -3222,7 +3223,8 @@ static s32 e1000_detect_gig_phy(struct e1000_hw *hw)
break;
case e1000_ce4100:
if ((hw->phy_id == RTL8211B_PHY_ID) ||
- (hw->phy_id == RTL8201N_PHY_ID))
+ (hw->phy_id == RTL8201N_PHY_ID) ||
+ (hw->phy_id == M88E1118_E_PHY_ID))
match = true;
break;
case e1000_82541:
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index 196eeda..c70b23d 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -2917,6 +2917,7 @@ struct e1000_host_command_info {
#define M88E1000_14_PHY_ID M88E1000_E_PHY_ID
#define M88E1011_I_REV_4 0x04
#define M88E1111_I_PHY_ID 0x01410CC0
+#define M88E1118_E_PHY_ID 0x01410E40
#define L1LXT971A_PHY_ID 0x001378E0
#define RTL8211B_PHY_ID 0x001CC910
--
1.7.1
* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-19 9:11 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Rick Jones, Jesse Gross, Eric Dumazet, Rusty Russell,
virtualization, dev, virtualization, netdev, kvm
In-Reply-To: <20110118201333.GD18760@redhat.com>
On Tue, Jan 18, 2011 at 10:13:33PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > >So it won't be all that simple to implement well, and before we try,
> > >I'd like to know whether there are applications that are helped
> > >by it. For example, we could try to measure latency at various
> > >pps and see whether the backpressure helps. netperf has -b, -w
> > >flags which might help these measurements.
> >
> > Those options are enabled when one adds --enable-burst to the
> > pre-compilation ./configure of netperf (one doesn't have to
> > recompile netserver). However, if one is also looking at latency
> > statistics via the -j option in the top-of-trunk, or simply at the
> > histogram with --enable-histogram on the ./configure and a verbosity
> > level of 2 (global -v 2) then one wants the very top of trunk
> > netperf from:
> >
> > http://www.netperf.org/svn/netperf2/trunk
> >
> > to get the recently added support for accurate (netperf level) RTT
> > measurements on burst-mode request/response tests.
> >
> > happy benchmarking,
> >
> > rick jones
Thanks Rick, that is really helpful.
> > PS - the enhanced latency statistics from -j are only available in
> > the "omni" version of the TCP_RR test. To get that add a
> > --enable-omni to the ./configure - and in this case both netperf and
> > netserver have to be recompiled.
>
>
> Is this TCP only? I would love to get latency data from UDP as well.
At a glance, -- -T UDP is what you are after.
* Re: [PATCH v2] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-01-19 9:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, linux-kernel, ratbert, bhutchings, joe, dilinger
In-Reply-To: <1295288462.3335.55.camel@edumazet-laptop>
Dear Eric,
On Tue, Jan 18, 2011 at 2:21 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday 17 January 2011 at 17:21 +0800, Po-Yu Chuang wrote:
>> + spin_lock_irqsave(&priv->tx_lock, flags);
>> + ftmac100_txdes_set_skb(txdes, skb);
>> + ftmac100_txdes_set_dma_addr(txdes, map);
>> +
>> + ftmac100_txdes_set_first_segment(txdes);
>> + ftmac100_txdes_set_last_segment(txdes);
>> + ftmac100_txdes_set_txint(txdes);
>> + ftmac100_txdes_set_buffer_size(txdes, len);
>
> I wonder if its not too expensive to read/modify/write txdes->txdes1
>
> Maybe you should use a temporary u32 var and perform one final write on
> txdes->txdes1 (with the set_dma_own)
That's OK: since txdes is not volatile, the compiler combines these bit
updates and writes txdes1 once. (I checked the object file.)
It's not expensive.
BTW, the DMA_OWN bit is in txdes0.
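For readers following along, the explicit form of Eric's suggestion could look roughly like the userspace sketch below: accumulate the txdes1 bits in a local variable, store them once, and only then set the own bit in txdes0. The bit positions and names here are placeholders, not the real FTMAC100 definitions, and `atomic_thread_fence()` merely stands in for the kernel's `wmb()`. Endianness conversion (cpu_to_le32() and friends) is left out.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Placeholder bit layout, for illustration only. */
#define TXDES0_DMA_OWN		(1u << 31)
#define TXDES1_BUF_SIZE_MASK	0x7ffu
#define TXDES1_LAST_SEG		(1u << 28)
#define TXDES1_FIRST_SEG	(1u << 29)
#define TXDES1_TXINT		(1u << 31)

struct txdes {
	uint32_t txdes0;
	uint32_t txdes1;
};

/* Fill txdes1 with one store, then hand the descriptor to the hardware. */
static void txdes_fill_and_hand_over(struct txdes *d, unsigned int len)
{
	uint32_t txdes1 = d->txdes1;	/* keep bits we do not own */

	txdes1 |= TXDES1_FIRST_SEG | TXDES1_LAST_SEG | TXDES1_TXINT;
	txdes1 &= ~TXDES1_BUF_SIZE_MASK;
	txdes1 |= len & TXDES1_BUF_SIZE_MASK;
	d->txdes1 = txdes1;		/* single write of the combined value */

	/* The own bit must become visible after all other fields. */
	atomic_thread_fence(memory_order_release);
	d->txdes0 |= TXDES0_DMA_OWN;
}
```

Whether this is any faster than the helper-per-bit version depends on what the compiler already does, which is the point made above.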
>> + priv->tx_pending++;
>> + if (priv->tx_pending == TX_QUEUE_ENTRIES) {
>> + if (net_ratelimit())
>> + netdev_info(netdev, "tx queue full\n");
>> +
>> + netif_stop_queue(netdev);
>> + }
>> +
>> + /* start transmit */
>> + ftmac100_txdes_set_dma_own(txdes);
>
> txdes->txdes1 = txdes1;
>
> BTW, shouldnt you use cpu_to_be32() or cpu_to_le32(), if this driver is
> multi platform ?
OK, but I am not sure if I use them the right way.
Please check my v3 patch later.
Thanks,
Po-Yu Chuang
* Re: [PATCH v2] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-01-19 9:40 UTC (permalink / raw)
To: Joe Perches
Cc: netdev, linux-kernel, ratbert, bhutchings, eric.dumazet, dilinger
In-Reply-To: <1295284788.21277.65.camel@Joe-Laptop>
Dear Joe,
On Tue, Jan 18, 2011 at 1:19 AM, Joe Perches <joe@perches.com> wrote:
> On Mon, 2011-01-17 at 17:21 +0800, Po-Yu Chuang wrote:
>
>> + * priveate data
>
> private
Fixed.
>> +static void ftmac100_enable_all_int(struct ftmac100 *priv)
>> +{
>> + unsigned int imr;
>> +
>> + imr = FTMAC100_INT_RPKT_FINISH | FTMAC100_INT_NORXBUF
>> + | FTMAC100_INT_XPKT_OK | FTMAC100_INT_XPKT_LOST
>> + | FTMAC100_INT_RPKT_LOST | FTMAC100_INT_AHB_ERR
>> + | FTMAC100_INT_PHYSTS_CHG;
>
> This could be a #define.
OK, done.
>> + maccr = FTMAC100_MACCR_XMT_EN |
>> + FTMAC100_MACCR_RCV_EN |
>> + FTMAC100_MACCR_XDMA_EN |
>> + FTMAC100_MACCR_RDMA_EN |
>> + FTMAC100_MACCR_CRC_APD |
>> + FTMAC100_MACCR_FULLDUP |
>> + FTMAC100_MACCR_RX_RUNT |
>> + FTMAC100_MACCR_RX_BROADPKT;
>
> Here too.
OK, done.
>> +static int ftmac100_rx_packet_error(struct ftmac100 *priv,
>> + struct ftmac100_rxdes *rxdes)
> []
>> + if (unlikely(ftmac100_rxdes_frame_too_long(rxdes))) {
>> + if (net_ratelimit())
>> + netdev_info(netdev, "rx frame too long\n");
>> +
>> + netdev->stats.rx_length_errors++;
>> + error = 1;
>> + }
>> +
>> + if (unlikely(ftmac100_rxdes_runt(rxdes))) {
>
> else if ?
OK, fixed.
>> +static int ftmac100_rx_packet(struct ftmac100 *priv, int *processed)
>> +{
>> + struct net_device *netdev = priv->netdev;
>> + struct ftmac100_rxdes *rxdes;
>> + struct sk_buff *skb;
>> + int length;
>> + int copied = 0;
>> + int done = 0;
>
> You could use bool/true/false here for copied and done
> and all the other uses of an int for a logical bool.
OK, fixed.
>> +static void ftmac100_txdes_set_dma_own(struct ftmac100_txdes *txdes)
>> +{
>> + /*
>> + * Make sure dma own bit will not be set before any other
>> + * descriptor fiels.
>
> field/fields
Fixed.
>> +static int ftmac100_mdio_read(struct net_device *netdev, int phy_id, int reg)
>> +{
>> + struct ftmac100 *priv = netdev_priv(netdev);
>> + int phycr;
>> + int i;
>> +
>> + phycr = FTMAC100_PHYCR_PHYAD(phy_id) |
>> + FTMAC100_PHYCR_REGAD(reg) |
>> + FTMAC100_PHYCR_MIIRD;
>> +
>> + iowrite32(phycr, priv->base + FTMAC100_OFFSET_PHYCR);
>> + for (i = 0; i < 10; i++) {
>> + phycr = ioread32(priv->base + FTMAC100_OFFSET_PHYCR);
>> +
>> + if ((phycr & FTMAC100_PHYCR_MIIRD) == 0)
>> + return phycr & FTMAC100_PHYCR_MIIRDATA;
>> +
>> + usleep_range(100, 1000);
>> + }
>> +
>> + netdev_err(netdev, "mdio read timed out\n");
>> + return 0xffff;
>
> 0xffff is a rather odd return, perhaps a #define?
After a little digging in drivers/net/mii.c, it seems that the mii lib does
not check the return value for errors. So I guess I should return 0 on error.
>> +/******************************************************************************
>> + * initialization / finalization
>> + *****************************************************************************/
>> +static int __init ftmac100_init(void)
>> +{
>> + printk(KERN_INFO "Loading " DRV_NAME ": version " DRV_VERSION " ...\n");
>
> You could use
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> before any #include and
> pr_info("Loading version " DRV_VERSION " ...\n");
OK
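The pr_fmt() trick relies on adjacent string literals being concatenated at compile time. A userspace imitation, with MODNAME standing in for KBUILD_MODNAME (which kbuild normally supplies) and a made-up `pr_info_to()` that formats into a buffer instead of the kernel log, might look like this:

```c
#include <stdio.h>

#define MODNAME "ftmac100"		/* stand-in for KBUILD_MODNAME */
#define pr_fmt(fmt) MODNAME ": " fmt

/* Userspace stand-in for pr_info(): writes into buf, not the kernel log. */
#define pr_info_to(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* Formats the module banner with the prefix added by pr_fmt(). */
static int format_banner(char *buf, size_t size, const char *version)
{
	return pr_info_to(buf, size, "Loading version %s ...\n", version);
}
```

The call site only spells out the message; the prefix comes from the single pr_fmt() definition, so every pr_*() call in the file picks it up. The `##__VA_ARGS__` form is a GNU extension, as used throughout the kernel.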
> One last comment on split long line indentation style
> and long function declarations.
>
> There's no required style so you can use what you are
> most comfortable doing.
>
> Most of drivers/net uses an alignment to open parenthesis
> using maximal tabs and minimal necessary spaces instead of
> an extra tabstop.
>
> Like:
>
> static int some_long_function(type var1, type var2...
>                               type varN)
> and
>         some_long_function(var1, var2, ...
>                            varN);
>
> not
> static int some_long_function(type var1, type var2...
>                 type varN)
> and
>         some_long_function(var1, var2, ...
>                 varN);
Well, TBH, I don't like this style because if I changed the
function name, the indentation might need to be adjusted.
Even worse, I got an infeasible case :-(
static struct ftmac100_rxdes *ftmac100_rx_locate_first_segment(
struct ftmac100 *priv)
I know my function names are quite long, but I like them to be descriptive.
Do you really insist on it?
Thanks,
Po-Yu Chuang
* Re: tool to send unsolicited neighbour advertisements?
From: Martin Volf @ 2011-01-19 9:42 UTC (permalink / raw)
To: Chris Friesen; +Cc: netdev, Linux Kernel Mailing List
In-Reply-To: <4D35FB33.1080305@genband.com>
On 18 January 2011 21:42, Chris Friesen <chris.friesen@genband.com> wrote:
> We're transitioning stuff to IPv6 and I've been trying (without much
> luck) to find a standard tool for sending out unsolicited neighbour
> advertisements for failover purposes.
> Is there such a thing? In ipv4 arping works fine.
Hello,
probably http://www.remlab.net/ndisc6/
--
Martin Volf