* [rfc] add io barriers, remove mmiowb
@ 2008-05-21 15:28 Nick Piggin
From: Nick Piggin @ 2008-05-21 15:28 UTC (permalink / raw)
To: linux-arch, jbarnes, Linus Torvalds
With mmiowb, ia64 unfortunately introduced a completely new type of barrier
and subtly redefined the existing ones. I still hold a faint hope of fixing
this up some day. This is how I'm proposing to do it.
Traditionally, mb() is thought of as "provide a total ordering of all
RAM and device memory accesses at this point", and wmb() as "provide a
total ordering of all RAM and device memory stores at this point".
However, on sn2, this is rather expensive because that platform can
reorder stores to device memory at the chipset, so a mb or wmb would
have to interact with chipset registers. Apparently this penalty was
deemed too high, because mb and wmb were weakened, and mmiowb was
introduced to solve the problem.
mb and wmb are now no longer guaranteed to order system memory operations
with device memory stores. mmiowb has been introduced to provide this
ordering (when combined with a mb, wmb, or spin_unlock). Unfortunately,
it appears to be rather less well understood among both users and
implementors than even the old memory barrier scheme. It also subtly
breaks existing code that uses mb or wmb (if only on sn2). I really think
it is not a good solution.
The alternative I propose is to restore mb and wmb to their full strength.
This does mean that sn2 has to do the equivalent of mb+mmiowb, wmb+mmiowb
respectively, but that's the price you pay for weak memory ordering.
A large number of wmb callers actually appear to want only io/io ordering,
so if we introduce io_ barrier variants we can avoid the full-strength wmb
in many cases; the remainder genuinely require io/ram ordering, so sn2
really needs the mmiowb there anyway.
io_*mb variants can be optimised on other platforms too, which is a plus.
This patch sketches an implementation for x86, powerpc and ia64, and attempts
to convert all mmiowb() calls in drivers to the appropriate wmb/io_wmb barrier.
---
Index: linux-2.6/include/asm-ia64/system.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/system.h
+++ linux-2.6/include/asm-ia64/system.h
@@ -82,15 +82,19 @@ extern struct ia64_boot_param {
* it's (presumably) much slower than mf and (b) mf.a is supported for
* sequential memory pages only.
*/
-#define mb() ia64_mf()
-#define rmb() mb()
+#define mb() do { ia64_mf(); mmiowb(); } while (0)
+#define rmb() ia64_mf()
#define wmb() mb()
#define read_barrier_depends() do { } while(0)
+#define io_mb() ia64_mf()
+#define io_rmb() io_mb()
+#define io_wmb() io_mb()
+
#ifdef CONFIG_SMP
-# define smp_mb() mb()
-# define smp_rmb() rmb()
-# define smp_wmb() wmb()
+# define smp_mb() ia64_mf()
+# define smp_rmb() smp_mb()
+# define smp_wmb() smp_mb()
# define smp_read_barrier_depends() read_barrier_depends()
#else
# define smp_mb() barrier()
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -38,6 +38,10 @@
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#define read_barrier_depends() do { } while(0)
+#define io_mb() __asm__ __volatile__ ("eieio" : : : "memory")
+#define io_rmb() __asm__ __volatile__ ("eieio" : : : "memory")
+#define io_wmb() barrier()
+
#define set_mb(var, value) do { var = value; mb(); } while (0)
#ifdef __KERNEL__
Index: linux-2.6/include/asm-x86/system.h
===================================================================
--- linux-2.6.orig/include/asm-x86/system.h
+++ linux-2.6/include/asm-x86/system.h
@@ -330,6 +330,10 @@ void default_idle(void);
#define wmb() asm volatile("sfence" ::: "memory")
#endif
+#define io_mb() mb()
+#define io_rmb() rmb()
+#define io_wmb() wmb()
+
/**
* read_barrier_depends - Flush all pending reads that subsequents reads
* depend on.
Index: linux-2.6/drivers/net/wireless/b43/main.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43/main.c
+++ linux-2.6/drivers/net/wireless/b43/main.c
@@ -357,7 +357,7 @@ static void b43_ram_write(struct b43_wld
val = swab32(val);
b43_write32(dev, B43_MMIO_RAM_CONTROL, offset);
- mmiowb();
+ io_wmb();
b43_write32(dev, B43_MMIO_RAM_DATA, val);
}
@@ -583,9 +583,9 @@ static void b43_tsf_write_locked(struct
u32 hi = (tsf & 0xFFFFFFFF00000000ULL) >> 32;
b43_write32(dev, B43_MMIO_REV3PLUS_TSF_LOW, 0);
- mmiowb();
+ io_wmb();
b43_write32(dev, B43_MMIO_REV3PLUS_TSF_HIGH, hi);
- mmiowb();
+ io_wmb();
b43_write32(dev, B43_MMIO_REV3PLUS_TSF_LOW, lo);
} else {
u16 v0 = (tsf & 0x000000000000FFFFULL);
@@ -594,13 +594,13 @@ static void b43_tsf_write_locked(struct
u16 v3 = (tsf & 0xFFFF000000000000ULL) >> 48;
b43_write16(dev, B43_MMIO_TSF_0, 0);
- mmiowb();
+ io_wmb();
b43_write16(dev, B43_MMIO_TSF_3, v3);
- mmiowb();
+ io_wmb();
b43_write16(dev, B43_MMIO_TSF_2, v2);
- mmiowb();
+ io_wmb();
b43_write16(dev, B43_MMIO_TSF_1, v1);
- mmiowb();
+ io_wmb();
b43_write16(dev, B43_MMIO_TSF_0, v0);
}
}
@@ -1605,7 +1605,7 @@ static void b43_beacon_update_trigger_wo
/* The handler might have updated the IRQ mask. */
b43_write32(dev, B43_MMIO_GEN_IRQ_MASK,
dev->irq_savedstate);
- mmiowb();
+ wmb();
spin_unlock_irq(&wl->irq_lock);
}
mutex_unlock(&wl->mutex);
@@ -1713,7 +1713,7 @@ static void b43_interrupt_tasklet(struct
dma_reason[2], dma_reason[3],
dma_reason[4], dma_reason[5]);
b43_controller_restart(dev, "DMA error");
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
return;
}
@@ -1759,7 +1759,7 @@ static void b43_interrupt_tasklet(struct
handle_irq_transmit_status(dev);
b43_interrupt_enable(dev, dev->irq_savedstate);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
}
@@ -1817,7 +1817,7 @@ static irqreturn_t b43_interrupt_handler
dev->irq_reason = reason;
tasklet_schedule(&dev->isr_tasklet);
out:
- mmiowb();
+ wmb();
spin_unlock(&dev->wl->irq_lock);
return ret;
@@ -3295,7 +3295,7 @@ static int b43_op_config(struct ieee8021
spin_lock_irqsave(&wl->irq_lock, flags);
b43_interrupt_enable(dev, savedirqs);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&wl->irq_lock, flags);
out_unlock_mutex:
mutex_unlock(&wl->mutex);
Index: linux-2.6/drivers/net/wireless/b43/phy.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43/phy.c
+++ linux-2.6/drivers/net/wireless/b43/phy.c
@@ -2428,7 +2428,7 @@ u8 b43_radio_aci_scan(struct b43_wldev *
void b43_nrssi_hw_write(struct b43_wldev *dev, u16 offset, s16 val)
{
b43_phy_write(dev, B43_PHY_NRSSILT_CTRL, offset);
- mmiowb();
+ io_wmb();
b43_phy_write(dev, B43_PHY_NRSSILT_DATA, (u16) val);
}
Index: linux-2.6/drivers/net/wireless/b43/sysfs.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43/sysfs.c
+++ linux-2.6/drivers/net/wireless/b43/sysfs.c
@@ -122,7 +122,7 @@ static ssize_t b43_attr_interfmode_store
b43err(wldev->wl, "Interference Mitigation not "
"supported by device\n");
}
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&wldev->wl->irq_lock, flags);
mutex_unlock(&wldev->wl->mutex);
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_iba6120.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -1351,7 +1351,7 @@ static void ipath_pe_put_tid(struct ipat
ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xfeeddeaf);
writel(pa, tidp32);
ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xdeadbeef);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(tidlockp, flags);
}
@@ -1395,7 +1395,7 @@ static void ipath_pe_put_tid_2(struct ip
}
tidx = tidptr - dd->ipath_egrtidbase;
writel(pa, tidp32);
- mmiowb();
+ wmb();
}
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_iba7220.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_iba7220.c
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_iba7220.c
@@ -1733,7 +1733,7 @@ static void ipath_7220_put_tid(struct ip
writeq(chippa, tidptr);
} else
writeq(pa, tidptr);
- mmiowb();
+ wmb();
}
/**
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_sd7220.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_sd7220.c
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_sd7220.c
@@ -1036,14 +1036,14 @@ static int ipath_sd_setvals(struct ipath
for (idx = 0; idx < NUM_DDS_REGS; ++idx) {
data = ((dds_reg_map & 0xF) << 4) | TX_FAST_ELT;
writeq(data, iaddr + idx);
- mmiowb();
+ wmb();
ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
dds_reg_map >>= 4;
for (midx = 0; midx < DDS_ROWS; ++midx) {
u64 __iomem *daddr = taddr + ((midx << 4) + idx);
data = dds_init_vals[midx].reg_vals[idx];
writeq(data, daddr);
- mmiowb();
+ wmb();
ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
} /* End inner for (vals for this reg, each row) */
} /* end outer for (regs to be stored) */
@@ -1065,13 +1065,13 @@ static int ipath_sd_setvals(struct ipath
didx = idx + min_idx;
/* Store the next RXEQ register address */
writeq(rxeq_init_vals[idx].rdesc, iaddr + didx);
- mmiowb();
+ wmb();
ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
/* Iterate through RXEQ values */
for (vidx = 0; vidx < 4; vidx++) {
data = rxeq_init_vals[idx].rdata[vidx];
writeq(data, taddr + (vidx << 6) + idx);
- mmiowb();
+ wmb();
ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
}
} /* end outer for (Reg-writes for RXEQ) */
Index: linux-2.6/drivers/infiniband/hw/mlx4/qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mlx4/qp.c
+++ linux-2.6/drivers/infiniband/hw/mlx4/qp.c
@@ -1638,7 +1638,7 @@ out:
* Make sure doorbells don't leak out of SQ spinlock
* and reach the HCA out of order.
*/
- mmiowb();
+ wmb();
stamp_send_wqe(qp, stamp, size * 16);
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -294,7 +294,7 @@ static int mthca_cmd_post(struct mthca_d
* Make sure that our HCR writes don't get mixed in with
* writes from another CPU starting a FW command.
*/
- mmiowb();
+ wmb();
mutex_unlock(&dev->cmd.hcr_mutex);
return err;
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_cq.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_cq.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -216,7 +216,7 @@ static inline void update_cons_index(str
* Make sure doorbells don't leak out of CQ spinlock
* and reach the HCA out of order:
*/
- mmiowb();
+ wmb();
}
}
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_qp.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1825,7 +1825,7 @@ out:
* Make sure doorbells don't leak out of SQ spinlock
* and reach the HCA out of order:
*/
- mmiowb();
+ wmb();
}
qp->sq.next_ind = ind;
@@ -1940,7 +1940,7 @@ out:
* Make sure doorbells don't leak out of RQ spinlock and reach
* the HCA out of order:
*/
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&qp->rq.lock, flags);
return err;
@@ -2180,7 +2180,7 @@ out:
* Make sure doorbells don't leak out of SQ spinlock and reach
* the HCA out of order:
*/
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&qp->sq.lock, flags);
return err;
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_srq.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_srq.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -587,7 +587,7 @@ int mthca_tavor_post_srq_recv(struct ib_
* Make sure doorbells don't leak out of SRQ spinlock and
* reach the HCA out of order:
*/
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&srq->lock, flags);
return err;
Index: linux-2.6/drivers/memstick/host/jmb38x_ms.c
===================================================================
--- linux-2.6.orig/drivers/memstick/host/jmb38x_ms.c
+++ linux-2.6/drivers/memstick/host/jmb38x_ms.c
@@ -622,7 +622,7 @@ static void jmb38x_ms_reset(struct jmb38
}
writel(HOST_CONTROL_RESET, host->addr + HOST_CONTROL);
- mmiowb();
+ io_wmb();
writel(INT_STATUS_ALL, host->addr + INT_SIGNAL_ENABLE);
writel(INT_STATUS_ALL, host->addr + INT_STATUS_ENABLE);
}
@@ -899,7 +899,7 @@ static void jmb38x_ms_remove(struct pci_
writel(0, host->addr + INT_SIGNAL_ENABLE);
writel(0, host->addr + INT_STATUS_ENABLE);
- mmiowb();
+ wmb();
dev_dbg(&jm->pdev->dev, "interrupts off\n");
spin_lock_irqsave(&host->lock, flags);
if (host->req) {
Index: linux-2.6/drivers/misc/ioc4.c
===================================================================
--- linux-2.6.orig/drivers/misc/ioc4.c
+++ linux-2.6/drivers/misc/ioc4.c
@@ -156,7 +156,7 @@ ioc4_clock_calibrate(struct ioc4_driver_
/* Reset to power-on state */
writel(0, &idd->idd_misc_regs->int_out.raw);
- mmiowb();
+ io_wmb();
/* Set up square wave */
int_out.raw = 0;
@@ -164,7 +164,7 @@ ioc4_clock_calibrate(struct ioc4_driver_
int_out.fields.mode = IOC4_INT_OUT_MODE_TOGGLE;
int_out.fields.diag = 0;
writel(int_out.raw, &idd->idd_misc_regs->int_out.raw);
- mmiowb();
+ io_wmb();
/* Check square wave period averaged over some number of cycles */
do {
Index: linux-2.6/drivers/misc/tifm_7xx1.c
===================================================================
--- linux-2.6.orig/drivers/misc/tifm_7xx1.c
+++ linux-2.6/drivers/misc/tifm_7xx1.c
@@ -397,7 +397,7 @@ static void tifm_7xx1_remove(struct pci_
fm->eject = tifm_7xx1_dummy_eject;
fm->has_ms_pif = tifm_7xx1_dummy_has_ms_pif;
writel(TIFM_IRQ_SETALL, fm->addr + FM_CLEAR_INTERRUPT_ENABLE);
- mmiowb();
+ wmb();
free_irq(dev->irq, fm);
tifm_remove_adapter(fm);
Index: linux-2.6/drivers/mmc/host/sdhci.c
===================================================================
--- linux-2.6.orig/drivers/mmc/host/sdhci.c
+++ linux-2.6/drivers/mmc/host/sdhci.c
@@ -809,7 +809,7 @@ static void sdhci_request(struct mmc_hos
} else
sdhci_send_command(host, mrq->cmd);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&host->lock, flags);
}
@@ -861,7 +861,7 @@ static void sdhci_set_ios(struct mmc_hos
if(host->chip->quirks & SDHCI_QUIRK_RESET_CMD_DATA_ON_IOS)
sdhci_reset(host, SDHCI_RESET_CMD | SDHCI_RESET_DATA);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&host->lock, flags);
}
@@ -901,7 +901,7 @@ static void sdhci_enable_sdio_irq(struct
writel(ier, host->ioaddr + SDHCI_INT_ENABLE);
writel(ier, host->ioaddr + SDHCI_SIGNAL_ENABLE);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&host->lock, flags);
}
@@ -995,7 +995,7 @@ static void sdhci_tasklet_finish(unsigne
sdhci_deactivate_led(host);
#endif
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&host->lock, flags);
mmc_request_done(host->mmc, mrq);
@@ -1028,7 +1028,7 @@ static void sdhci_timeout_timer(unsigned
}
}
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&host->lock, flags);
}
@@ -1183,7 +1183,7 @@ static irqreturn_t sdhci_irq(int irq, vo
result = IRQ_HANDLED;
- mmiowb();
+ wmb();
out:
spin_unlock(&host->lock);
@@ -1269,7 +1269,7 @@ static int sdhci_resume (struct pci_dev
if (ret)
return ret;
sdhci_init(chip->hosts[i]);
- mmiowb();
+ wmb();
ret = mmc_resume_host(chip->hosts[i]->mmc);
if (ret)
return ret;
@@ -1528,7 +1528,7 @@ static int __devinit sdhci_probe_slot(st
goto reset;
#endif
- mmiowb();
+ wmb();
mmc_add_host(mmc);
Index: linux-2.6/drivers/mmc/host/tifm_sd.c
===================================================================
--- linux-2.6.orig/drivers/mmc/host/tifm_sd.c
+++ linux-2.6/drivers/mmc/host/tifm_sd.c
@@ -888,7 +888,7 @@ static int tifm_sd_initialize_host(struc
struct tifm_dev *sock = host->dev;
writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
- mmiowb();
+ io_wmb();
host->clk_div = 61;
host->clk_freq = 20000000;
writel(TIFM_MMCSD_RESET, sock->addr + SOCK_MMCSD_SYSTEM_CONTROL);
@@ -940,7 +940,7 @@ static int tifm_sd_initialize_host(struc
writel(TIFM_MMCSD_CERR | TIFM_MMCSD_BRS | TIFM_MMCSD_EOC
| TIFM_MMCSD_ERRMASK,
sock->addr + SOCK_MMCSD_INT_ENABLE);
- mmiowb();
+ wmb();
return 0;
}
@@ -1006,7 +1006,7 @@ static void tifm_sd_remove(struct tifm_d
spin_lock_irqsave(&sock->lock, flags);
host->eject = 1;
writel(0, sock->addr + SOCK_MMCSD_INT_ENABLE);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&sock->lock, flags);
tasklet_kill(&host->finish_tasklet);
Index: linux-2.6/drivers/net/bnx2.c
===================================================================
--- linux-2.6.orig/drivers/net/bnx2.c
+++ linux-2.6/drivers/net/bnx2.c
@@ -2880,7 +2880,7 @@ next_rx:
REG_WR(bp, MB_RX_CID_ADDR + BNX2_L2CTX_HOST_BSEQ, bnapi->rx_prod_bseq);
- mmiowb();
+ wmb();
return rx_pkt;
@@ -5886,7 +5886,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
REG_WR16(bp, bp->tx_bidx_addr, prod);
REG_WR(bp, bp->tx_bseq_addr, bp->tx_prod_bseq);
- mmiowb();
+ wmb();
bp->tx_prod = prod;
dev->trans_start = jiffies;
Index: linux-2.6/drivers/net/bnx2x.c
===================================================================
--- linux-2.6.orig/drivers/net/bnx2x.c
+++ linux-2.6/drivers/net/bnx2x.c
@@ -1046,7 +1046,7 @@ next_cqe:
REG_WR(bp, BAR_TSTRORM_INTMEM +
TSTORM_RCQ_PROD_OFFSET(bp->port, fp->index), sw_comp_prod);
- mmiowb(); /* keep prod updates ordered */
+ wmb(); /* keep prod updates ordered */
fp->rx_pkt += rx_pkt;
fp->rx_calls++;
@@ -9413,7 +9413,7 @@ static int bnx2x_start_xmit(struct sk_bu
cpu_to_le32(le32_to_cpu(fp->hw_tx_prods->packets_prod) + 1);
DOORBELL(bp, fp_index, 0);
- mmiowb();
+ wmb();
fp->tx_bd_prod = bd_prod;
dev->trans_start = jiffies;
Index: linux-2.6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -3222,8 +3222,8 @@ e1000_tx_queue(struct e1000_adapter *ada
tx_ring->next_to_use = i;
writel(i, adapter->hw.hw_addr + tx_ring->tdt);
/* we need this if more than one processor can write to our tail
- * at a time, it syncronizes IO on IA64/Altix systems */
- mmiowb();
+ * at a time */
+ wmb();
}
/**
Index: linux-2.6/drivers/net/e1000e/netdev.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000e/netdev.c
+++ linux-2.6/drivers/net/e1000e/netdev.c
@@ -3467,9 +3467,9 @@ static void e1000_tx_queue(struct e1000_
writel(i, adapter->hw.hw_addr + tx_ring->tail);
/*
* we need this if more than one processor can write to our tail
- * at a time, it synchronizes IO on IA64/Altix systems
+ * at a time.
*/
- mmiowb();
+ wmb();
}
#define MINIMUM_DHCP_PACKET_SIZE 282
Index: linux-2.6/drivers/net/igb/igb_main.c
===================================================================
--- linux-2.6.orig/drivers/net/igb/igb_main.c
+++ linux-2.6/drivers/net/igb/igb_main.c
@@ -2637,8 +2637,8 @@ static inline void igb_tx_queue_adv(stru
tx_ring->next_to_use = i;
writel(i, adapter->hw.hw_addr + tx_ring->tail);
/* we need this if more than one processor can write to our tail
- * at a time, it syncronizes IO on IA64/Altix systems */
- mmiowb();
+ * at a time */
+ wmb();
}
static int __igb_maybe_stop_tx(struct net_device *netdev,
Index: linux-2.6/drivers/net/ipg.c
===================================================================
--- linux-2.6.orig/drivers/net/ipg.c
+++ linux-2.6/drivers/net/ipg.c
@@ -1965,7 +1965,7 @@ static int ipg_nic_hard_start_xmit(struc
sp->tx_current++;
- mmiowb();
+ wmb();
ipg_w32(IPG_DC_TX_DMA_POLL_NOW, DMA_CTRL);
Index: linux-2.6/drivers/net/mlx4/cmd.c
===================================================================
--- linux-2.6.orig/drivers/net/mlx4/cmd.c
+++ linux-2.6/drivers/net/mlx4/cmd.c
@@ -190,7 +190,7 @@ static int mlx4_cmd_post(struct mlx4_dev
* Make sure that our HCR writes don't get mixed in with
* writes from another CPU starting a FW command.
*/
- mmiowb();
+ wmb();
cmd->toggle = cmd->toggle ^ 1;
Index: linux-2.6/drivers/net/pcnet32.c
===================================================================
--- linux-2.6.orig/drivers/net/pcnet32.c
+++ linux-2.6/drivers/net/pcnet32.c
@@ -1407,7 +1407,7 @@ static int pcnet32_poll(struct napi_stru
/* Set interrupt enable. */
lp->a.write_csr(ioaddr, CSR0, CSR0_INTEN);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&lp->lock, flags);
}
return work_done;
@@ -2594,7 +2594,7 @@ pcnet32_interrupt(int irq, void *dev_id)
val = lp->a.read_csr(ioaddr, CSR3);
val |= 0x5f00;
lp->a.write_csr(ioaddr, CSR3, val);
- mmiowb();
+ wmb();
__netif_rx_schedule(dev, &lp->napi);
break;
}
Index: linux-2.6/drivers/net/s2io.c
===================================================================
--- linux-2.6.orig/drivers/net/s2io.c
+++ linux-2.6/drivers/net/s2io.c
@@ -4285,7 +4285,7 @@ static int s2io_xmit(struct sk_buff *skb
writeq(val64, &tx_fifo->List_Control);
- mmiowb();
+ wmb();
put_off++;
if (put_off == fifo->tx_curr_put_info.fifo_len + 1)
Index: linux-2.6/drivers/net/sc92031.c
===================================================================
--- linux-2.6.orig/drivers/net/sc92031.c
+++ linux-2.6/drivers/net/sc92031.c
@@ -261,7 +261,7 @@ enum PMConfigBits {
* use of mdelay() at _sc92031_reset.
* Functions prefixed with _sc92031_ must be called with the lock held;
* functions prefixed with sc92031_ must be called without the lock held.
- * Use mmiowb() before unlocking if the hardware was written to.
+ * Use wmb() before unlocking if the hardware was written to.
*/
/* Locking rules for the interrupt:
@@ -371,7 +371,7 @@ static void sc92031_disable_interrupts(s
/* stop interrupts */
iowrite32(0, port_base + IntrMask);
_sc92031_dummy_read(port_base);
- mmiowb();
+ wmb();
/* wait for any concurrent interrupt/tasklet to finish */
synchronize_irq(dev->irq);
@@ -389,7 +389,7 @@ static void sc92031_enable_interrupts(st
wmb();
iowrite32(IntrBits, port_base + IntrMask);
- mmiowb();
+ wmb();
}
static void _sc92031_disable_tx_rx(struct net_device *dev)
@@ -880,7 +880,7 @@ out:
rmb();
iowrite32(intr_mask, port_base + IntrMask);
- mmiowb();
+ wmb();
spin_unlock(&priv->lock);
}
@@ -914,7 +914,7 @@ out_none:
rmb();
iowrite32(intr_mask, port_base + IntrMask);
- mmiowb();
+ wmb();
return IRQ_NONE;
}
@@ -988,7 +988,7 @@ static int sc92031_start_xmit(struct sk_
iowrite32(priv->tx_bufs_dma_addr + entry * TX_BUF_SIZE,
port_base + TxAddr0 + entry * 4);
iowrite32(tx_status, port_base + TxStatus0 + entry * 4);
- mmiowb();
+ wmb();
dev->trans_start = jiffies;
@@ -1036,7 +1036,7 @@ static int sc92031_open(struct net_devic
spin_lock_bh(&priv->lock);
_sc92031_reset(dev);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
sc92031_enable_interrupts(dev);
@@ -1072,7 +1072,7 @@ static int sc92031_stop(struct net_devic
_sc92031_disable_tx_rx(dev);
_sc92031_tx_clear(dev);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
@@ -1093,7 +1093,7 @@ static void sc92031_set_multicast_list(s
_sc92031_set_mar(dev);
_sc92031_set_rx_config(dev);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
}
@@ -1110,7 +1110,7 @@ static void sc92031_tx_timeout(struct ne
priv->tx_timeouts++;
_sc92031_reset(dev);
- mmiowb();
+ wmb();
spin_unlock(&priv->lock);
@@ -1147,7 +1147,7 @@ static int sc92031_ethtool_get_settings(
output_status = _sc92031_mii_read(port_base, MII_OutputStatus);
_sc92031_mii_scan(port_base);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
@@ -1318,7 +1318,7 @@ static int sc92031_ethtool_set_wol(struc
priv->pm_config = pm_config;
iowrite32(pm_config, port_base + PMConfig);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
@@ -1344,7 +1344,7 @@ static int sc92031_ethtool_nway_reset(st
out:
_sc92031_mii_scan(port_base);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
@@ -1535,7 +1535,7 @@ static int sc92031_suspend(struct pci_de
_sc92031_disable_tx_rx(dev);
_sc92031_tx_clear(dev);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
@@ -1560,7 +1560,7 @@ static int sc92031_resume(struct pci_dev
spin_lock_bh(&priv->lock);
_sc92031_reset(dev);
- mmiowb();
+ wmb();
spin_unlock_bh(&priv->lock);
sc92031_enable_interrupts(dev);
Index: linux-2.6/drivers/net/sfc/falcon_io.h
===================================================================
--- linux-2.6.orig/drivers/net/sfc/falcon_io.h
+++ linux-2.6/drivers/net/sfc/falcon_io.h
@@ -77,16 +77,16 @@ static inline void falcon_write(struct e
spin_lock_irqsave(&efx->biu_lock, flags);
#ifdef FALCON_USE_QWORD_IO
_falcon_writeq(efx, value->u64[0], reg + 0);
- wmb();
+ io_wmb();
_falcon_writeq(efx, value->u64[1], reg + 8);
#else
_falcon_writel(efx, value->u32[0], reg + 0);
_falcon_writel(efx, value->u32[1], reg + 4);
_falcon_writel(efx, value->u32[2], reg + 8);
- wmb();
+ io_wmb();
_falcon_writel(efx, value->u32[3], reg + 12);
#endif
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&efx->biu_lock, flags);
}
@@ -105,10 +105,10 @@ static inline void falcon_write_sram(str
_falcon_writeq(efx, value->u64[0], reg + 0);
#else
_falcon_writel(efx, value->u32[0], reg + 0);
- wmb();
+ io_wmb();
_falcon_writel(efx, value->u32[1], reg + 4);
#endif
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&efx->biu_lock, flags);
}
Index: linux-2.6/drivers/net/sky2.c
===================================================================
--- linux-2.6.orig/drivers/net/sky2.c
+++ linux-2.6/drivers/net/sky2.c
@@ -976,7 +976,7 @@ static inline void sky2_put_idx(struct s
sky2_write16(hw, Y2_QADDR(q, PREF_UNIT_PUT_IDX), idx);
/* Synchronize I/O on since next processor may write to tail */
- mmiowb();
+ wmb();
}
@@ -1098,7 +1098,7 @@ stopped:
/* reset the Rx prefetch unit */
sky2_write32(hw, Y2_QADDR(rxq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
- mmiowb();
+ wmb();
}
/* Clean out receive buffer area, assumes receiver hardware stopped */
Index: linux-2.6/drivers/net/tg3.c
===================================================================
--- linux-2.6.orig/drivers/net/tg3.c
+++ linux-2.6/drivers/net/tg3.c
@@ -641,7 +641,7 @@ static void tg3_restart_ints(struct tg3
{
tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
tp->last_tag << 24);
- mmiowb();
+ wmb();
/* When doing tagged status, this work check is unnecessary.
* The last_tag we write above tells the chip which piece of
@@ -3811,7 +3811,7 @@ next_pkt_nopost:
tw32_rx_mbox(MAILBOX_RCV_JUMBO_PROD_IDX + TG3_64BIT_REG_LOW,
sw_idx);
}
- mmiowb();
+ wmb();
return received;
}
@@ -4402,7 +4402,7 @@ static int tg3_start_xmit(struct sk_buff
}
out_unlock:
- mmiowb();
+ wmb();
dev->trans_start = jiffies;
@@ -4613,7 +4613,7 @@ static int tg3_start_xmit_dma_bug(struct
}
out_unlock:
- mmiowb();
+ wmb();
dev->trans_start = jiffies;
Index: linux-2.6/drivers/net/wireless/b43legacy/ilt.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/ilt.c
+++ linux-2.6/drivers/net/wireless/b43legacy/ilt.c
@@ -315,14 +315,14 @@ const u16 b43legacy_ilt_sigmasqr2[B43leg
void b43legacy_ilt_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
{
b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
- mmiowb();
+ io_wmb();
b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1, val);
}
void b43legacy_ilt_write32(struct b43legacy_wldev *dev, u16 offset, u32 val)
{
b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_CTRL, offset);
- mmiowb();
+ io_wmb();
b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA2,
(val & 0xFFFF0000) >> 16);
b43legacy_phy_write(dev, B43legacy_PHY_ILT_G_DATA1,
Index: linux-2.6/drivers/net/wireless/b43legacy/main.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/main.c
+++ linux-2.6/drivers/net/wireless/b43legacy/main.c
@@ -240,7 +240,7 @@ static void b43legacy_ram_write(struct b
val = swab32(val);
b43legacy_write32(dev, B43legacy_MMIO_RAM_CONTROL, offset);
- mmiowb();
+ io_wmb();
b43legacy_write32(dev, B43legacy_MMIO_RAM_DATA, val);
}
@@ -317,14 +317,14 @@ void b43legacy_shm_write32(struct b43leg
if (offset & 0x0003) {
/* Unaligned access */
b43legacy_shm_control_word(dev, routing, offset >> 2);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev,
B43legacy_MMIO_SHM_DATA_UNALIGNED,
(value >> 16) & 0xffff);
- mmiowb();
+ io_wmb();
b43legacy_shm_control_word(dev, routing,
(offset >> 2) + 1);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA,
value & 0xffff);
return;
@@ -332,7 +332,7 @@ void b43legacy_shm_write32(struct b43leg
offset >>= 2;
}
b43legacy_shm_control_word(dev, routing, offset);
- mmiowb();
+ io_wmb();
b43legacy_write32(dev, B43legacy_MMIO_SHM_DATA, value);
}
@@ -344,7 +344,7 @@ void b43legacy_shm_write16(struct b43leg
if (offset & 0x0003) {
/* Unaligned access */
b43legacy_shm_control_word(dev, routing, offset >> 2);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev,
B43legacy_MMIO_SHM_DATA_UNALIGNED,
value);
@@ -353,7 +353,7 @@ void b43legacy_shm_write16(struct b43leg
offset >>= 2;
}
b43legacy_shm_control_word(dev, routing, offset);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_SHM_DATA, value);
}
@@ -447,7 +447,7 @@ static void b43legacy_time_lock(struct b
status = b43legacy_read32(dev, B43legacy_MMIO_MACCTL);
status |= B43legacy_MACCTL_TBTTHOLD;
b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
- mmiowb();
+ wmb();
}
static void b43legacy_time_unlock(struct b43legacy_wldev *dev)
@@ -470,10 +470,10 @@ static void b43legacy_tsf_write_locked(s
u32 hi = (tsf & 0xFFFFFFFF00000000ULL) >> 32;
b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW, 0);
- mmiowb();
+ io_wmb();
b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_HIGH,
hi);
- mmiowb();
+ io_wmb();
b43legacy_write32(dev, B43legacy_MMIO_REV3PLUS_TSF_LOW,
lo);
} else {
@@ -483,13 +483,13 @@ static void b43legacy_tsf_write_locked(s
u16 v3 = (tsf & 0xFFFF000000000000ULL) >> 48;
b43legacy_write16(dev, B43legacy_MMIO_TSF_0, 0);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_TSF_3, v3);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_TSF_2, v2);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_TSF_1, v1);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_TSF_0, v0);
}
}
@@ -1271,7 +1271,7 @@ static void b43legacy_interrupt_tasklet(
dma_reason[2], dma_reason[3],
dma_reason[4], dma_reason[5]);
b43legacy_controller_restart(dev, "DMA error");
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
return;
}
@@ -1321,7 +1321,7 @@ static void b43legacy_interrupt_tasklet(
handle_irq_transmit_status(dev);
b43legacy_interrupt_enable(dev, dev->irq_savedstate);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&dev->wl->irq_lock, flags);
}
@@ -1414,7 +1414,7 @@ static irqreturn_t b43legacy_interrupt_h
dev->irq_reason = reason;
tasklet_schedule(&dev->isr_tasklet);
out:
- mmiowb();
+ wmb();
spin_unlock(&dev->wl->irq_lock);
return ret;
@@ -2661,7 +2661,7 @@ static int b43legacy_op_dev_config(struc
spin_lock_irqsave(&wl->irq_lock, flags);
b43legacy_interrupt_enable(dev, savedirqs);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&wl->irq_lock, flags);
out_unlock_mutex:
mutex_unlock(&wl->mutex);
Index: linux-2.6/drivers/net/wireless/b43legacy/phy.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/phy.c
+++ linux-2.6/drivers/net/wireless/b43legacy/phy.c
@@ -132,7 +132,7 @@ u16 b43legacy_phy_read(struct b43legacy_
void b43legacy_phy_write(struct b43legacy_wldev *dev, u16 offset, u16 val)
{
b43legacy_write16(dev, B43legacy_MMIO_PHY_CONTROL, offset);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_PHY_DATA, val);
}
Index: linux-2.6/drivers/net/wireless/b43legacy/pio.h
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/pio.h
+++ linux-2.6/drivers/net/wireless/b43legacy/pio.h
@@ -96,7 +96,7 @@ void b43legacy_pio_write(struct b43legac
u16 offset, u16 value)
{
b43legacy_write16(queue->dev, queue->mmio_base + offset, value);
- mmiowb();
+ wmb();
}
Index: linux-2.6/drivers/net/wireless/b43legacy/radio.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/radio.c
+++ linux-2.6/drivers/net/wireless/b43legacy/radio.c
@@ -95,7 +95,7 @@ void b43legacy_radio_lock(struct b43lega
B43legacy_WARN_ON(status & B43legacy_MACCTL_RADIOLOCK);
status |= B43legacy_MACCTL_RADIOLOCK;
b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
- mmiowb();
+ wmb();
udelay(10);
}
@@ -108,7 +108,7 @@ void b43legacy_radio_unlock(struct b43le
B43legacy_WARN_ON(!(status & B43legacy_MACCTL_RADIOLOCK));
status &= ~B43legacy_MACCTL_RADIOLOCK;
b43legacy_write32(dev, B43legacy_MMIO_MACCTL, status);
- mmiowb();
+ wmb();
}
u16 b43legacy_radio_read16(struct b43legacy_wldev *dev, u16 offset)
@@ -141,7 +141,7 @@ u16 b43legacy_radio_read16(struct b43leg
void b43legacy_radio_write16(struct b43legacy_wldev *dev, u16 offset, u16 val)
{
b43legacy_write16(dev, B43legacy_MMIO_RADIO_CONTROL, offset);
- mmiowb();
+ io_wmb();
b43legacy_write16(dev, B43legacy_MMIO_RADIO_DATA_LOW, val);
}
@@ -333,7 +333,7 @@ u8 b43legacy_radio_aci_scan(struct b43le
void b43legacy_nrssi_hw_write(struct b43legacy_wldev *dev, u16 offset, s16 val)
{
b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_CTRL, offset);
- mmiowb();
+ io_wmb();
b43legacy_phy_write(dev, B43legacy_PHY_NRSSILT_DATA, (u16)val);
}
Index: linux-2.6/drivers/net/wireless/b43legacy/sysfs.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43legacy/sysfs.c
+++ linux-2.6/drivers/net/wireless/b43legacy/sysfs.c
@@ -143,7 +143,7 @@ static ssize_t b43legacy_attr_interfmode
if (err)
b43legacyerr(wldev->wl, "Interference Mitigation not "
"supported by device\n");
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&wldev->wl->irq_lock, flags);
mutex_unlock(&wldev->wl->mutex);
Index: linux-2.6/drivers/scsi/qla1280.c
===================================================================
--- linux-2.6.orig/drivers/scsi/qla1280.c
+++ linux-2.6/drivers/scsi/qla1280.c
@@ -2994,7 +2994,7 @@ qla1280_64bit_start_scsi(struct scsi_qla
ha->actthreads++;
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
- mmiowb();
+ wmb();
out:
if (status)
@@ -3244,7 +3244,7 @@ qla1280_32bit_start_scsi(struct scsi_qla
ha->actthreads++;
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
- mmiowb();
+ wmb();
out:
if (status)
@@ -3356,19 +3356,19 @@ qla1280_isp_cmd(struct scsi_qla_host *ha
/*
* Update request index to mailbox4 (Request Queue In).
- * The mmiowb() ensures that this write is ordered with writes by other
- * CPUs. Without the mmiowb(), it is possible for the following:
+ * The wmb() ensures that this write is ordered with writes by other
+ * CPUs. Without the wmb(), it is possible for the following:
* CPUA posts write of index 5 to mailbox4
* CPUA releases host lock
* CPUB acquires host lock
* CPUB posts write of index 6 to mailbox4
* On PCI bus, order reverses and write of 6 posts, then index 5,
* causing chip to issue full queue of stale commands
- * The mmiowb() prevents future writes from crossing the barrier.
+ * The wmb() prevents future writes from crossing the barrier.
* See Documentation/DocBook/deviceiobook.tmpl for more information.
*/
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- mmiowb();
+ wmb();
LEAVE("qla1280_isp_cmd");
}
Index: linux-2.6/drivers/serial/serial_txx9.c
===================================================================
--- linux-2.6.orig/drivers/serial/serial_txx9.c
+++ linux-2.6/drivers/serial/serial_txx9.c
@@ -252,7 +252,7 @@ static void serial_txx9_initialize(struc
sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
/* TX4925 BUG WORKAROUND. Accessing SIOC register
* immediately after soft reset causes bus error. */
- mmiowb();
+ io_wmb();
udelay(1);
while ((sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST) && --tmout)
udelay(1);
Index: linux-2.6/drivers/ssb/pci.c
===================================================================
--- linux-2.6.orig/drivers/ssb/pci.c
+++ linux-2.6/drivers/ssb/pci.c
@@ -279,7 +279,7 @@ static int sprom_do_write(struct ssb_bus
else if (i % 2)
ssb_printk(".");
writew(sprom[i], bus->mmio + SSB_SPROM_BASE + (i * 2));
- mmiowb();
+ io_wmb();
msleep(20);
}
err = pci_read_config_dword(pdev, SSB_SPROMCTL, &spromctl);
Index: linux-2.6/drivers/ssb/pcmcia.c
===================================================================
--- linux-2.6.orig/drivers/ssb/pcmcia.c
+++ linux-2.6/drivers/ssb/pcmcia.c
@@ -353,7 +353,7 @@ static void ssb_pcmcia_write8(struct ssb
err = select_core_and_segment(dev, &offset);
if (likely(!err))
writeb(value, bus->mmio + offset);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&bus->bar_lock, flags);
}
@@ -367,7 +367,7 @@ static void ssb_pcmcia_write16(struct ss
err = select_core_and_segment(dev, &offset);
if (likely(!err))
writew(value, bus->mmio + offset);
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&bus->bar_lock, flags);
}
@@ -383,7 +383,7 @@ static void ssb_pcmcia_write32(struct ss
writew((value & 0x0000FFFF), bus->mmio + offset);
writew(((value & 0xFFFF0000) >> 16), bus->mmio + offset + 2);
}
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&bus->bar_lock, flags);
}
@@ -439,7 +439,7 @@ static void ssb_pcmcia_block_write(struc
SSB_WARN_ON(1);
}
unlock:
- mmiowb();
+ wmb();
spin_unlock_irqrestore(&bus->bar_lock, flags);
}
#endif /* CONFIG_SSB_BLOCKIO */
* Re: [rfc] add io barriers, remove mmiowb
2008-05-21 15:28 [rfc] add io barriers, remove mmiowb Nick Piggin
@ 2008-05-22 8:34 ` Jes Sorensen
2008-05-22 9:51 ` Nick Piggin
0 siblings, 1 reply; 7+ messages in thread
From: Jes Sorensen @ 2008-05-22 8:34 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-arch, jbarnes, Linus Torvalds
>>>>> "Nick" == Nick Piggin <npiggin@suse.de> writes:
Nick> mb and wmb are now no longer guaranteed to order system memory
Nick> operations with device memory stores. mmiowb has been introduced
Nick> to provide this ordering (when combined with a mb, wmb, or
Nick> spin_unlock). Unfortunately, it appears to be rather less well
Nick> understood among both users and implementors than even the old
Nick> memory barrier scheme. It also subtly breaks existing code that
Nick> uses mb or wmb (if only on sn2). I really think it is not a good
Nick> solution.
Nick> The alternative I propose is to restore mb and wmb to their full
Nick> strength. This does mean that sn2 has to do the equivalent of
Nick> mb+mmiowb, wmb+mmiowb respectively, but that's the price you pay
Nick> for weak memory ordering.
Nick,
Introducing this constraint would make me less than pleased, I have to
admit. It's a very expensive operation since it requires going out and
talking to the PCI bridge; doing that on every wmb() is going to
really hurt :-(
Cheers,
Jes
* Re: [rfc] add io barriers, remove mmiowb
2008-05-22 8:34 ` Jes Sorensen
@ 2008-05-22 9:51 ` Nick Piggin
2008-05-22 12:28 ` Jes Sorensen
0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2008-05-22 9:51 UTC (permalink / raw)
To: Jes Sorensen; +Cc: linux-arch, jbarnes, Linus Torvalds
On Thu, May 22, 2008 at 04:34:51AM -0400, Jes Sorensen wrote:
> >>>>> "Nick" == Nick Piggin <npiggin@suse.de> writes:
>
> Nick> mb and wmb are now no longer guaranteed to order system memory
> Nick> operations with device memory stores. mmiowb has been introduced
> Nick> to provide this ordering (when combined with a mb, wmb, or
> Nick> spin_unlock). Unfortunately, it appears to be rather less well
> Nick> understood among both users and implementors than even the old
> Nick> memory barrier scheme. It also subtly breaks existing code that
> Nick> uses mb or wmb (if only on sn2). I really think it is not a good
> Nick> solution.
>
> Nick> The alternative I propose is to restore mb and wmb to their full
> Nick> strength. This does mean that sn2 has to do the equivalent of
> Nick> mb+mmiowb, wmb+mmiowb respectively, but that's the price you pay
> Nick> for weak memory ordering.
>
> Nick,
>
> Introducing this constraint would make me less than pleased I have to
> admit. It's a very expensive operation to do since it requires going
> out talking to the PCI bridge, doing that on every wmb() is going to
> really hurt :-(
Right, but probably the large majority of wmb() callers actually
just want io_wmb(). This would relieve much of the performance
problem, I'd say.
Of those that really want a wmb() and cannot be converted to
io_wmb(), I don't think it is a good option to actually just weaken
wmb() because we deem that doing what the caller asked for is too
expensive.
I guess with the ia64_mf(), Altix probably does not reorder PCI
stores past earlier cacheable stores, so _some_ wmb()s actually
do not require the full mmiowb case (if we only need to order
an earlier RAM store with a later PCI store). However, again,
weakening wmb() is not a good option because it really requires
an audit of the entire tree to do that.
We _could_ introduce partial barriers like store/iostore iostore/store,
but really, I think the io_wmb is a pretty good first step, and I
haven't actually seen any numbers indicating it would be a performance
problem.
* Re: [rfc] add io barriers, remove mmiowb
2008-05-22 9:51 ` Nick Piggin
@ 2008-05-22 12:28 ` Jes Sorensen
2008-05-22 16:34 ` Jesse Barnes
2008-05-22 23:59 ` Paul Mackerras
0 siblings, 2 replies; 7+ messages in thread
From: Jes Sorensen @ 2008-05-22 12:28 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-arch, jbarnes, Linus Torvalds
Nick Piggin wrote:
> Right, but probably the large majority of wmb() callers actually
> just want io_wmb(). This would relieve much of the performance
> problem, I'd say.
>
> Of those that really want a wmb() and cannot be converted to
> io_wmb(), I don't think it is a good option to actually just weaken
> wmb() because we deem that doing what the caller asked for is too
> expensive.
Hi Nick,
I believe there's a fair number of places where wmb() is used for
memory ordering not related to IO.
> I guess with the ia64_mf(), Altix probably does not reorder PCI
> stores past earlier cacheable stores, so _some_ wmb()s actually
> do not require the full mmiowb case (if we only need to order
> an earlier RAM store with a later PCI store). However, again,
> weakening wmb() is not a good option because it really requires
> an audit of the entire tree to do that.
Nope, unfortunately not; ia64_mf() isn't strong enough to prevent the
reordering, which is done in the PCI controller, so to guarantee the
ordering you have to go all the way out to the PCI controller, which
is very slow.
> We _could_ introduce partial barriers like store/iostore iostore/store,
> but really, I think the io_wmb is a pretty good first step, and I
> haven't actually seen any numbers indicating it would be a performance
> problem.
I must admit I am not 100% up to speed on the entire discussion, but I
think the io_wmb() and friends did go around in the past and got shot
down.
Cheers,
Jes
* Re: [rfc] add io barriers, remove mmiowb
2008-05-22 12:28 ` Jes Sorensen
@ 2008-05-22 16:34 ` Jesse Barnes
2008-05-23 1:44 ` Nick Piggin
2008-05-22 23:59 ` Paul Mackerras
1 sibling, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2008-05-22 16:34 UTC (permalink / raw)
To: Jes Sorensen; +Cc: Nick Piggin, linux-arch, Linus Torvalds
On Thursday, May 22, 2008 5:28 am Jes Sorensen wrote:
> Nick Piggin wrote:
> > Right, but probably the large majority of wmb() callers actually
> > just want io_wmb(). This would relieve much of the performance
> > problem, I'd say.
> >
> > Of those that really want a wmb() and cannot be converted to
> > io_wmb(), I don't think it is a good option to actually just weaken
> > wmb() because we deem that doing what the caller asked for is too
> > expensive.
>
> Hi Nick,
>
> I believe there's a fair number of places where wmb() is used for
> memory ordering not related to IO.
>
> > I guess with the ia64_mf(), Altix probably does not reorder PCI
> > stores past earlier cacheable stores, so _some_ wmb()s actually
> > do not require the full mmiowb case (if we only need to order
> > an earlier RAM store with a later PCI store). However, again,
> > weakening wmb() is not a good option because it really requires
> > an audit of the entire tree to do that.
>
> Nope, unfortunately not; ia64_mf() isn't strong enough to prevent the
> reordering, which is done in the PCI controller, so to guarantee the
> ordering you have to go all the way out to the PCI controller, which
> is very slow.
And more than that, the local PCI controller has to wait for any outstanding
writes to arrive at the target host bridge. That's why the operation is so
expensive.
> > We _could_ introduce partial barriers like store/iostore iostore/store,
> > but really, I think the io_wmb is a pretty good first step, and I
> > haven't actually seen any numbers indicating it would be a performance
> > problem.
>
> I must admit I am not 100% up to speed on the entire discussion, but I
> think the io_wmb() and friends did go around in the past and got shot
> down.
To be fair to the ia64 guys who pushed this (me), I think the powerpc guys
were supposed to introduce the other set of barriers they needed at around
the same time, so we'd have the complete set. I guess they never got around
to it.
Given that core kernel code using wmb() usually doesn't care about I/O
ordering, making it into a heavyweight operation might be a bad idea,
especially if powerpc wants to weaken its wmb() operations eventually.
Is there really a conflict of definitions except for between ia64 and powerpc
here? IIRC they needed more types of barriers to speed things up, but never
introduced them, and so had to make some of the existing barriers much more
expensive than they would have liked...
Jesse
* Re: [rfc] add io barriers, remove mmiowb
2008-05-22 16:34 ` Jesse Barnes
@ 2008-05-23 1:44 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2008-05-23 1:44 UTC (permalink / raw)
To: Jesse Barnes; +Cc: Jes Sorensen, linux-arch, Linus Torvalds
On Thu, May 22, 2008 at 09:34:07AM -0700, Jesse Barnes wrote:
> On Thursday, May 22, 2008 5:28 am Jes Sorensen wrote:
> > Nick Piggin wrote:
> > > Right, but probably the large majority of wmb() callers actually
> > > just want io_wmb(). This would relieve much of the performance
> > > problem, I'd say.
> > >
> > > Of those that really want a wmb() and cannot be converted to
> > > io_wmb(), I don't think it is a good option to actually just weaken
> > > wmb() because we deem that doing what the caller asked for is too
> > > expensive.
> >
> > Hi Nick,
> >
> > I believe there's a fair number of places where wmb() is used for
> > memory ordering not related to IO.
> >
> > > I guess with the ia64_mf(), Altix probably does not reorder PCI
> > > stores past earlier cacheable stores, so _some_ wmb()s actually
> > > do not require the full mmiowb case (if we only need to order
> > > an earlier RAM store with a later PCI store). However, again,
> > > weakening wmb() is not a good option because it really requires
> > > an audit of the entire tree to do that.
> >
> > Nope, unfortunately not; ia64_mf() isn't strong enough to prevent the
> > reordering, which is done in the PCI controller, so to guarantee the
> > ordering you have to go all the way out to the PCI controller, which
> > is very slow.
>
> And more than that, the local PCI controller has to wait for any outstanding
> writes to arrive at the target host bridge. That's why the operation is so
> expensive.
Right, but if the code *needs* a wmb(), then skipping the full ordering
steps basically means that sn2 doesn't implement the Linux memory
barrier specification properly. It may be expensive, but it is
perfectly legitimate for code to say
writel(iomem)
wmb();
written = 1;
And expect iomem to be seen at the device before an io store from
another CPU which has seen written == 1 (provided it issues the
correct barriers too).
> > > We _could_ introduce partial barriers like store/iostore iostore/store,
> > > but really, I think the io_wmb is a pretty good first step, and I
> > > haven't actually seen any numbers indicating it would be a performance
> > > problem.
> >
> > I must admit I am not 100% up to speed on the entire discussion, but I
> > think the io_wmb() and friends did go around in the past and got shot
> > down.
>
> To be fair to the ia64 guys who pushed this (me), I think the powerpc guys
> were supposed to introduce the other set of barriers they needed at around
> the same time, so we'd have the complete set. I guess they never got around
> to it.
OK, I'm not trying to assign any blame ;) I just want to try improving
things.
> Given that core kernel code using wmb() usually doesn't care about I/O
> ordering, making it into a heavyweight operation might be a bad idea,
> especially if powerpc wants to weaken its wmb() operations eventually.
There are a lot of suspect barriers which should be smp_ variants, as
Paul says. Attached is a very quick pass of a few key directories...
needs review from respective maintainers, but it gives an idea.
powerpc does *not* want to weaken its wmb(), but what it can do is
take advantage of a cheaper io_wmb(), like sn2.
> Is there really a conflict of definitions except for between ia64 and powerpc
> here? IIRC they needed more types of barriers to speed things up, but never
> introduced them, and so had to make some of the existing barriers much more
> expensive than they would have liked...
There are a couple of other types of barriers I guess you could
introduce... partial barriers like load/store store/load iostore/store
etc.; or acquire/release for IO. I think the latter isn't such a bad
idea, but either way you're introducing new barrier concepts when we
already have trouble getting the existing ones right.
io_ barriers I guess should be easier to understand if one already
understands existing barriers, so I think they are the best first
step.
--
Index: linux-2.6/arch/x86/kernel/kvmclock.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/kvmclock.c
+++ linux-2.6/arch/x86/kernel/kvmclock.c
@@ -69,10 +69,10 @@ unsigned long kvm_get_wallclock(void)
native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
do {
version = wall_clock.wc_version;
- rmb();
+ smp_rmb();
wc_sec = wall_clock.wc_sec;
wc_nsec = wall_clock.wc_nsec;
- rmb();
+ smp_rmb();
} while ((wall_clock.wc_version != version) || (version & 1));
delta = kvm_clock_read() - delta;
Index: linux-2.6/arch/x86/lguest/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/lguest/boot.c
+++ linux-2.6/arch/x86/lguest/boot.c
@@ -113,7 +113,7 @@ static void async_hcall(unsigned long ca
lguest_data.hcalls[next_call].arg2 = arg2;
lguest_data.hcalls[next_call].arg3 = arg3;
/* Arguments must all be written before we mark it to go */
- wmb();
+ smp_wmb();
lguest_data.hcall_status[next_call] = 0;
if (++next_call == LHCALL_RING_SIZE)
next_call = 0;
@@ -627,11 +627,11 @@ static cycle_t lguest_clock_read(void)
/* This read memory barrier tells the compiler and the CPU that
* this can't be reordered: we have to complete the above
* before going on. */
- rmb();
+ smp_rmb();
/* Now we read the nanoseconds part. */
nsec = lguest_data.time.tv_nsec;
/* Make sure we've done that. */
- rmb();
+ smp_rmb();
/* Now if the seconds part has changed, try again. */
} while (unlikely(lguest_data.time.tv_sec != sec));
Index: linux-2.6/arch/x86/xen/smp.c
===================================================================
--- linux-2.6.orig/arch/x86/xen/smp.c
+++ linux-2.6/arch/x86/xen/smp.c
@@ -302,7 +302,7 @@ int __cpuinit xen_cpu_up(unsigned int cp
smp_store_cpu_info(cpu);
set_cpu_sibling_map(cpu);
/* This must be done before setting cpu_online_map */
- wmb();
+ smp_wmb();
cpu_set(cpu, cpu_online_map);
Index: linux-2.6/arch/x86/xen/time.c
===================================================================
--- linux-2.6.orig/arch/x86/xen/time.c
+++ linux-2.6/arch/x86/xen/time.c
@@ -247,12 +247,12 @@ static unsigned get_time_values_from_xen
do {
dst->version = src->version;
- rmb(); /* fetch version before data */
+ smp_rmb(); /* fetch version before data */
dst->tsc_timestamp = src->tsc_timestamp;
dst->system_timestamp = src->system_time;
dst->tsc_to_nsec_mul = src->tsc_to_system_mul;
dst->tsc_shift = src->tsc_shift;
- rmb(); /* test version after fetching data */
+ smp_rmb(); /* test version after fetching data */
} while ((src->version & 1) | (dst->version ^ src->version));
return dst->version;
@@ -332,10 +332,10 @@ static void xen_read_wallclock(struct ti
/* get wallclock at system boot */
do {
version = s->wc_version;
- rmb(); /* fetch version before time */
+ smp_rmb(); /* fetch version before time */
now.tv_sec = s->wc_sec;
now.tv_nsec = s->wc_nsec;
- rmb(); /* fetch time before checking version */
+ smp_rmb(); /* fetch time before checking version */
} while ((s->wc_version & 1) | (version ^ s->wc_version));
delta = xen_clocksource_read(); /* time since system boot */
Index: linux-2.6/crypto/xor.c
===================================================================
--- linux-2.6.orig/crypto/xor.c
+++ linux-2.6/crypto/xor.c
@@ -78,11 +78,11 @@ do_xor_speed(struct xor_block_template *
now = jiffies;
count = 0;
while (jiffies == now) {
- mb(); /* prevent loop optimzation */
+ barrier(); /* prevent loop optimization */
tmpl->do_2(BENCH_SIZE, b1, b2);
- mb();
+ barrier();
count++;
- mb();
+ barrier();
}
if (count > max)
max = count;
Index: linux-2.6/drivers/block/xen-blkfront.c
===================================================================
--- linux-2.6.orig/drivers/block/xen-blkfront.c
+++ linux-2.6/drivers/block/xen-blkfront.c
@@ -481,7 +481,7 @@ static irqreturn_t blkif_interrupt(int i
again:
rp = info->ring.sring->rsp_prod;
- rmb(); /* Ensure we see queued responses up to 'rp'. */
+ smp_rmb(); /* Ensure we see queued responses up to 'rp'. */
for (i = info->ring.rsp_cons; i != rp; i++) {
unsigned long id;
Index: linux-2.6/drivers/oprofile/buffer_sync.c
===================================================================
--- linux-2.6.orig/drivers/oprofile/buffer_sync.c
+++ linux-2.6/drivers/oprofile/buffer_sync.c
@@ -416,7 +416,7 @@ static void increment_tail(struct oprofi
{
unsigned long new_tail = b->tail_pos + 1;
- rmb();
+ smp_rmb();
if (new_tail < b->buffer_size)
b->tail_pos = new_tail;
Index: linux-2.6/drivers/oprofile/cpu_buffer.c
===================================================================
--- linux-2.6.orig/drivers/oprofile/cpu_buffer.c
+++ linux-2.6/drivers/oprofile/cpu_buffer.c
@@ -137,7 +137,7 @@ static void increment_head(struct oprofi
/* Ensure anything written to the slot before we
* increment is visible */
- wmb();
+ smp_wmb();
if (new_head < b->buffer_size)
b->head_pos = new_head;
Index: linux-2.6/drivers/xen/events.c
===================================================================
--- linux-2.6.orig/drivers/xen/events.c
+++ linux-2.6/drivers/xen/events.c
@@ -527,10 +527,8 @@ void xen_evtchn_do_upcall(struct pt_regs
if (__get_cpu_var(nesting_count)++)
goto out;
-#ifndef CONFIG_X86 /* No need for a barrier -- XCHG is a barrier on x86. */
/* Clear master flag /before/ clearing selector flag. */
- rmb();
-#endif
+ smp_rmb();
pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
while (pending_words != 0) {
unsigned long pending_bits;
Index: linux-2.6/drivers/xen/grant-table.c
===================================================================
--- linux-2.6.orig/drivers/xen/grant-table.c
+++ linux-2.6/drivers/xen/grant-table.c
@@ -150,7 +150,7 @@ static void update_grant_entry(grant_ref
*/
shared[ref].frame = frame;
shared[ref].domid = domid;
- wmb();
+ smp_wmb();
shared[ref].flags = flags;
}
@@ -264,7 +264,7 @@ unsigned long gnttab_end_foreign_transfe
cpu_relax();
}
- rmb(); /* Read the frame number /after/ reading completion status. */
+ smp_rmb(); /* Read the frame number /after/ reading completion status. */
frame = shared[ref].frame;
BUG_ON(frame == 0);
Index: linux-2.6/include/xen/interface/io/ring.h
===================================================================
--- linux-2.6.orig/include/xen/interface/io/ring.h
+++ linux-2.6/include/xen/interface/io/ring.h
@@ -182,12 +182,12 @@ struct __name##_back_ring { \
(((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r))
#define RING_PUSH_REQUESTS(_r) do { \
- wmb(); /* back sees requests /before/ updated producer index */ \
+ smp_wmb(); /* back sees requests /before/ updated producer index */ \
(_r)->sring->req_prod = (_r)->req_prod_pvt; \
} while (0)
#define RING_PUSH_RESPONSES(_r) do { \
- wmb(); /* front sees responses /before/ updated producer index */ \
+ smp_wmb(); /* front sees responses /before/ updated producer index */ \
(_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
} while (0)
@@ -224,9 +224,9 @@ struct __name##_back_ring { \
#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
RING_IDX __old = (_r)->sring->req_prod; \
RING_IDX __new = (_r)->req_prod_pvt; \
- wmb(); /* back sees requests /before/ updated producer index */ \
+ smp_wmb(); /* back sees requests /before/ updated producer index */ \
(_r)->sring->req_prod = __new; \
- mb(); /* back sees new requests /before/ we check req_event */ \
+ smp_mb(); /* back sees new requests /before/ we check req_event */ \
(_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
(RING_IDX)(__new - __old)); \
} while (0)
@@ -234,9 +234,9 @@ struct __name##_back_ring { \
#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
RING_IDX __old = (_r)->sring->rsp_prod; \
RING_IDX __new = (_r)->rsp_prod_pvt; \
- wmb(); /* front sees responses /before/ updated producer index */ \
+ smp_wmb(); /* front sees responses /before/ updated producer index */ \
(_r)->sring->rsp_prod = __new; \
- mb(); /* front sees new responses /before/ we check rsp_event */ \
+ smp_mb(); /* front sees new responses /before/ we check rsp_event */ \
(_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
(RING_IDX)(__new - __old)); \
} while (0)
@@ -245,7 +245,7 @@ struct __name##_back_ring { \
(_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
if (_work_to_do) break; \
(_r)->sring->req_event = (_r)->req_cons + 1; \
- mb(); \
+ smp_mb(); \
(_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
} while (0)
@@ -253,7 +253,7 @@ struct __name##_back_ring { \
(_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
if (_work_to_do) break; \
(_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
- mb(); \
+ smp_mb(); \
(_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
} while (0)
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -38,7 +38,7 @@ static inline void frozen_process(void)
{
if (!unlikely(current->flags & PF_NOFREEZE)) {
current->flags |= PF_FROZEN;
- wmb();
+ smp_wmb();
}
clear_freeze_flag(current);
}
@@ -122,7 +122,7 @@ static int freeze_task(struct task_struc
wake_up_state(p, TASK_INTERRUPTIBLE);
}
} else {
- rmb();
+ smp_rmb();
if (frozen(p)) {
ret = 0;
} else {
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -20,7 +20,7 @@ static inline void rt_set_overload(struc
* if we looked at the mask, but the mask was not
* updated yet.
*/
- wmb();
+ smp_wmb();
atomic_inc(&rq->rd->rto_count);
}
Index: linux-2.6/net/sched/cls_rsvp.h
===================================================================
--- linux-2.6.orig/net/sched/cls_rsvp.h
+++ linux-2.6/net/sched/cls_rsvp.h
@@ -516,7 +516,7 @@ insert:
if (((*fp)->spi.mask&f->spi.mask) != f->spi.mask)
break;
f->next = *fp;
- wmb();
+ smp_wmb();
*fp = f;
*arg = (unsigned long)f;
@@ -542,7 +542,7 @@ insert:
break;
}
s->next = *sp;
- wmb();
+ smp_wmb();
*sp = s;
goto insert;
Index: linux-2.6/net/sched/cls_u32.c
===================================================================
--- linux-2.6.orig/net/sched/cls_u32.c
+++ linux-2.6/net/sched/cls_u32.c
@@ -649,7 +649,7 @@ static int u32_change(struct tcf_proto *
break;
n->next = *ins;
- wmb();
+ smp_wmb();
*ins = n;
*arg = (unsigned long)n;
Index: linux-2.6/security/selinux/ss/sidtab.c
===================================================================
--- linux-2.6.orig/security/selinux/ss/sidtab.c
+++ linux-2.6/security/selinux/ss/sidtab.c
@@ -71,11 +71,11 @@ int sidtab_insert(struct sidtab *s, u32
if (prev) {
newnode->next = prev->next;
- wmb();
+ smp_wmb();
prev->next = newnode;
} else {
newnode->next = s->htable[hvalue];
- wmb();
+ smp_wmb();
s->htable[hvalue] = newnode;
}
Index: linux-2.6/arch/ia64/kernel/process.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/process.c
+++ linux-2.6/arch/ia64/kernel/process.c
@@ -314,7 +314,7 @@ cpu_idle (void)
#ifdef CONFIG_SMP
min_xtp();
#endif
- rmb();
+ smp_rmb();
if (mark_idle)
(*mark_idle)(1);
Index: linux-2.6/arch/ia64/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/smp.c
+++ linux-2.6/arch/ia64/kernel/smp.c
@@ -112,13 +112,13 @@ handle_call_data(void)
info = data->info;
wait = data->wait;
- mb();
+ smp_mb();
atomic_inc(&data->started);
/* At this point the structure may be gone unless wait is true. */
(*func)(info);
/* Notify the sending CPU that the task is done. */
- mb();
+ smp_mb();
if (wait)
atomic_inc(&data->finished);
}
@@ -153,9 +153,9 @@ handle_IPI (int irq, void *dev_id)
unsigned long *pending_ipis = &__ia64_per_cpu_var(ipi_operation);
unsigned long ops;
- mb(); /* Order interrupt and bit testing. */
+ smp_mb(); /* Order interrupt and bit testing. */
while ((ops = xchg(pending_ipis, 0)) != 0) {
- mb(); /* Order bit clearing and data access. */
+ smp_mb(); /* Order bit clearing and data access. */
do {
unsigned long which;
@@ -181,7 +181,7 @@ handle_IPI (int irq, void *dev_id)
break;
}
} while (ops);
- mb(); /* Order data access and bit testing. */
+ smp_mb(); /* Order data access and bit testing. */
}
put_cpu();
return IRQ_HANDLED;
@@ -313,7 +313,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask
for_each_cpu_mask(cpu, cpumask)
counts[cpu] = local_tlb_flush_counts[cpu].count;
- mb();
+ smp_mb();
for_each_cpu_mask(cpu, cpumask) {
if (cpu == mycpu)
flush_mycpu = 1;
@@ -398,7 +398,7 @@ smp_call_function_single (int cpuid, voi
spin_lock_bh(&call_lock);
call_data = &data;
- mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC */
+ smp_mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC */
send_IPI_single(cpuid, IPI_CALL_FUNC);
/* Wait for response */
@@ -462,7 +462,7 @@ int smp_call_function_mask(cpumask_t mas
atomic_set(&data.finished, 0);
call_data = &data;
- mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC*/
+ smp_mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC*/
/* Send a message to other CPUs */
if (cpus_equal(mask, allbutself))
@@ -528,7 +528,7 @@ smp_call_function (void (*func) (void *i
atomic_set(&data.finished, 0);
call_data = &data;
- mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC */
+ smp_mb(); /* ensure store to call_data precedes setting of IPI_CALL_FUNC */
send_IPI_allbutself(IPI_CALL_FUNC);
/* Wait for response */
Index: linux-2.6/arch/ia64/kernel/unaligned.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/unaligned.c
+++ linux-2.6/arch/ia64/kernel/unaligned.c
@@ -856,7 +856,7 @@ emulate_load_int (unsigned long ifa, loa
* use ordering fence.
*/
if (ld.x6_op == 0x5 || ld.x6_op == 0xa)
- mb();
+ smp_mb();
/*
* invalidate ALAT entry in case of advanced load
@@ -937,7 +937,7 @@ emulate_store_int (unsigned long ifa, lo
* stX.rel: use fence instead of release
*/
if (ld.x6_op == 0xd)
- mb();
+ smp_mb();
return 0;
}
Index: linux-2.6/arch/ia64/kvm/process.c
===================================================================
--- linux-2.6.orig/arch/ia64/kvm/process.c
+++ linux-2.6/arch/ia64/kvm/process.c
@@ -722,7 +722,7 @@ void leave_hypervisor_tail(void)
}
}
- rmb();
+ smp_rmb();
if (v->arch.irq_new_pending) {
v->arch.irq_new_pending = 0;
VMX(v, irq_check) = 0;
Index: linux-2.6/arch/ia64/kvm/vcpu.c
===================================================================
--- linux-2.6.orig/arch/ia64/kvm/vcpu.c
+++ linux-2.6/arch/ia64/kvm/vcpu.c
@@ -934,7 +934,7 @@ void vcpu_unpend_interrupt(struct kvm_vc
local_irq_restore(spsr);
if (ret) {
vcpu->arch.irq_new_pending = 1;
- wmb();
+ smp_wmb();
}
}
* Re: [rfc] add io barriers, remove mmiowb
2008-05-22 12:28 ` Jes Sorensen
2008-05-22 16:34 ` Jesse Barnes
@ 2008-05-22 23:59 ` Paul Mackerras
1 sibling, 0 replies; 7+ messages in thread
From: Paul Mackerras @ 2008-05-22 23:59 UTC (permalink / raw)
To: Jes Sorensen; +Cc: Nick Piggin, linux-arch, jbarnes, Linus Torvalds
Jes Sorensen writes:
> I believe there's a fair number of places where wmb() is used for
> memory ordering not related to IO.
Shouldn't those places be using smp_wmb() then?
Paul.
Thread overview: 7+ messages
2008-05-21 15:28 [rfc] add io barriers, remove mmiowb Nick Piggin
2008-05-22 8:34 ` Jes Sorensen
2008-05-22 9:51 ` Nick Piggin
2008-05-22 12:28 ` Jes Sorensen
2008-05-22 16:34 ` Jesse Barnes
2008-05-23 1:44 ` Nick Piggin
2008-05-22 23:59 ` Paul Mackerras