* [PATCHSET] block: fix PIO cache coherency bug, take 2
@ 2006-06-04 3:41 Tejun Heo
2006-06-04 3:41 ` [PATCH 1/5] arm: implement flush_kernel_dcache_page() Tejun Heo
` (5 more replies)
0 siblings, 6 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi, htejun
Hello, all.
Here's another round of the block PIO cache coherency fix patchset. The
previous try[1] was rejected because flush_dcache_page() was excessive
and couldn't be called from irq context. A new cachetlb interface has
been introduced - flush_kernel_dcache_page(), which is only
responsible for flushing the kernel mapping and safe to call from irq
context. The function is implemented only for parisc. This patchset
adds an implementation for arm.
blk kmap wrappers have been dropped and calls to
flush_kernel_dcache_page() have been directly added. Because
flush_kernel_dcache_page() hasn't been implemented on many
architectures, converting to such wrappers breaks cache coherency for
such architectures. kmap should be updated after all architectures
with aliasing caches implement flush_kernel_dcache_page().
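As a rough user-space illustration of the direct-call pattern described above (map the page, let the CPU fill it, flush the kernel mapping, unmap), the sketch below models one PIO sector transfer. The `model_*` names and `struct model_page` are hypothetical stand-ins, not kernel APIs, and the flush stub only counts calls so that the ordering can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel page; not the real struct page. */
struct model_page {
	unsigned char data[64];	/* contents seen through the kernel mapping */
	int kernel_flushes;	/* calls to the flush stub */
	int mapped;		/* outstanding mappings, to check pairing */
};

/* Stand-in for kmap(): hands out the kernel-side address. */
static unsigned char *model_kmap(struct model_page *page)
{
	page->mapped++;
	return page->data;
}

/* Stand-in for kunmap(). */
static void model_kunmap(struct model_page *page)
{
	assert(page->mapped > 0);
	page->mapped--;
}

/* Stand-in for flush_kernel_dcache_page(); it only counts calls. */
static void model_flush_kernel_dcache_page(struct model_page *page)
{
	page->kernel_flushes++;
}

/*
 * Model of one PIO transfer. On a read (device to memory) the CPU writes
 * into the page through the kernel mapping, so the kernel mapping is
 * flushed before unmapping. On a write the CPU only reads the page and
 * no flush is issued.
 */
static void model_pio_sector(struct model_page *page, unsigned char *dev_buf,
			     size_t len, bool do_write)
{
	unsigned char *buf;

	assert(len <= sizeof(page->data));
	buf = model_kmap(page);

	if (do_write)
		memcpy(dev_buf, buf, len);	/* memory -> device */
	else
		memcpy(buf, dev_buf, len);	/* device -> memory */

	if (!do_write)
		model_flush_kernel_dcache_page(page);
	model_kunmap(page);
}
```

Calling `model_pio_sector()` with `do_write == false` bumps the flush counter once, while a write leaves it unchanged. This mirrors the `if (!do_write)` guard that the libata and ide patches in this series add.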
Russell, can you please verify arm's flush_kernel_dcache_page()? I
tried to implement flush_anon_page() too but didn't know what to do
with anon_vma object. It seems that a call to
__cpuc_flush_user_range() should do the job but it requires
vma->vm_flags to see whether it's an executable page. To access vma
from anon mapped page, page->mapping:anon_vma->lock should be grabbed
and probably the first vma on the list can be used, which is kind of
complex. I think the options here are...
* adding vma argument to flush_anon_page()
* always flush for the worst vm_flags
I have only compile tested. Please verify this fixes the coherency
problem on arm.
Jens, if everyone is happy with this, can you push this patchset
through blk tree? As this change only adds calls to
flush_kernel_dcache_page() which is currently implemented only on parisc
and arm, I think including this fix into 2.6.17 shouldn't cause too
much trouble.
Thanks.
--
tejun
[1] http://article.gmane.org/gmane.linux.kernel/367509
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/5] arm: implement flush_kernel_dcache_page()
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
@ 2006-06-04 3:41 ` Tejun Heo
2006-06-04 3:49 ` [PATCH 1/5] (REPOST) " Tejun Heo
2006-06-04 6:45 ` [PATCH 1/5] " David Miller
2006-06-04 3:41 ` [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page Tejun Heo
` (4 subsequent siblings)
5 siblings, 2 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Cc: Tejun Heo
Implement flush_kernel_dcache_page() for arm.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
Documentation/serial/driver | 9 +++---
arch/arm/mach-ixp23xx/core.c | 18 ++---------
arch/sparc64/kernel/head.S | 30 -------------------
arch/sparc64/kernel/setup.c | 23 ++++++++-------
arch/sparc64/kernel/smp.c | 16 ++++++++--
block/cfq-iosched.c | 52 +++++++--------------------------
drivers/message/fusion/mptbase.c | 27 ++++-------------
drivers/scsi/ppa.c | 7 ----
drivers/scsi/scsi_devinfo.c | 1 -
drivers/scsi/scsi_lib.c | 2 +
drivers/scsi/scsi_transport_sas.c | 4 +--
include/asm-arm/arch-ixp23xx/memory.h | 2 +
include/asm-arm/cacheflush.h | 6 ++++
include/asm-generic/pgtable.h | 11 ++++++-
include/asm-mips/pgtable.h | 10 +-----
include/asm-sparc64/pgtable.h | 17 -----------
mm/slab.c | 27 +++++++++--------
net/ipv4/tcp_highspeed.c | 3 +-
18 files changed, 85 insertions(+), 180 deletions(-)
b088ab1aa777a3168b10dff2aead67fcea7118a5
diff --git a/Documentation/serial/driver b/Documentation/serial/driver
index 88ad615..df82116 100644
--- a/Documentation/serial/driver
+++ b/Documentation/serial/driver
@@ -214,13 +214,12 @@ hardware.
The interaction of the iflag bits is as follows (parity error given as an example): Parity error INPCK IGNPAR - n/a 0 n/a character received, marked as + None n/a n/a character received + Yes n/a 0 character discarded + Yes 0 1 character received, marked as TTY_NORMAL - None 1 n/a character received, marked as - TTY_NORMAL - Yes 1 0 character received, marked as + Yes 1 1 character received, marked as TTY_PARITY - Yes 1 1 character discarded Other flags may be used (eg, xon/xoff characters) if your hardware supports hardware "soft" flow control. diff --git a/arch/arm/mach-ixp23xx/core.c b/arch/arm/mach-ixp23xx/core.c index affd1d5..092ee12 100644 --- a/arch/arm/mach-ixp23xx/core.c +++ b/arch/arm/mach-ixp23xx/core.c @@ -178,12 +178,8 @@ static int ixp23xx_irq_set_type(unsigned static void ixp23xx_irq_mask(unsigned int irq) { - volatile unsigned long *intr_reg; + volatile unsigned long *intr_reg = IXP23XX_INTR_EN1 + (irq / 32); - if (irq >= 56) - irq += 8; - - intr_reg = IXP23XX_INTR_EN1 + (irq / 32); *intr_reg &= ~(1 << (irq % 32)); } @@ -203,25 +199,17 @@ static void ixp23xx_irq_ack(unsigned int */ static void ixp23xx_irq_level_unmask(unsigned int irq) { - volatile unsigned long *intr_reg; + volatile unsigned long *intr_reg = IXP23XX_INTR_EN1 + (irq / 32); ixp23xx_irq_ack(irq); - if (irq >= 56) - irq += 8; - - intr_reg = IXP23XX_INTR_EN1 + (irq / 32); *intr_reg |= (1 << (irq % 32)); } static void ixp23xx_irq_edge_unmask(unsigned int irq) { - volatile unsigned long *intr_reg; - - if (irq >= 56) - irq += 8; + volatile unsigned long *intr_reg = IXP23XX_INTR_EN1 + (irq / 32); - intr_reg = IXP23XX_INTR_EN1 + (irq / 32); *intr_reg |= (1 << (irq % 32)); } diff --git a/arch/sparc64/kernel/head.S b/arch/sparc64/kernel/head.S index 31c5892..3eadac5 100644 --- a/arch/sparc64/kernel/head.S +++ b/arch/sparc64/kernel/head.S @@ -10,7 +10,6 @@ #include <linux/config.h> #include <linux/version.h> #include <linux/errno.h> -#include <linux/threads.h> #include <asm/thread_info.h> 
#include <asm/asi.h> #include <asm/pstate.h> @@ -494,35 +493,6 @@ tlb_fixup_done: call prom_init mov %l7, %o0 ! OpenPROM cif handler - /* Initialize current_thread_info()->cpu as early as possible. - * In order to do that accurately we have to patch up the get_cpuid() - * assembler sequences. And that, in turn, requires that we know - * if we are on a Starfire box or not. While we're here, patch up - * the sun4v sequences as well. - */ - call check_if_starfire - nop - call per_cpu_patch - nop - call sun4v_patch - nop - -#ifdef CONFIG_SMP - call hard_smp_processor_id - nop - cmp %o0, NR_CPUS - blu,pt %xcc, 1f - nop - call boot_cpu_id_too_large - nop - /* Not reached... */ - -1: -#else - mov 0, %o0 -#endif - stb %o0, [%g6 + TI_CPU] - /* Off we go.... */ call start_kernel nop diff --git a/arch/sparc64/kernel/setup.c b/arch/sparc64/kernel/setup.c index 9cf1c88..005167f 100644 --- a/arch/sparc64/kernel/setup.c +++ b/arch/sparc64/kernel/setup.c @@ -220,7 +220,7 @@ char reboot_command[COMMAND_LINE_SIZE]; static struct pt_regs fake_swapper_regs = { { 0, }, 0, 0, 0, 0 }; -void __init per_cpu_patch(void) +static void __init per_cpu_patch(void) { struct cpuid_patch_entry *p; unsigned long ver; @@ -280,7 +280,7 @@ void __init per_cpu_patch(void) } } -void __init sun4v_patch(void) +static void __init sun4v_patch(void) { struct sun4v_1insn_patch_entry *p1; struct sun4v_2insn_patch_entry *p2; @@ -315,15 +315,6 @@ void __init sun4v_patch(void) } } -#ifdef CONFIG_SMP -void __init boot_cpu_id_too_large(int cpu) -{ - prom_printf("Serious problem, boot cpu id (%d) >= NR_CPUS (%d)\n", - cpu, NR_CPUS); - prom_halt(); -} -#endif - void __init setup_arch(char **cmdline_p) { /* Initialize PROM console and command line. */ @@ -341,6 +332,16 @@ #elif defined(CONFIG_PROM_CONSOLE) conswitchp = &prom_con; #endif + /* Work out if we are starfire early on */ + check_if_starfire(); + + /* Now we know enough to patch the get_cpuid sequences + * used by trap code. 
+ */ + per_cpu_patch(); + + sun4v_patch(); + boot_flags_init(*cmdline_p); idprom_init(); diff --git a/arch/sparc64/kernel/smp.c b/arch/sparc64/kernel/smp.c index 4e8cd79..90eaca3 100644 --- a/arch/sparc64/kernel/smp.c +++ b/arch/sparc64/kernel/smp.c @@ -1264,6 +1264,7 @@ void __init smp_tick_init(void) boot_cpu_id = hard_smp_processor_id(); current_tick_offset = timer_tick_offset; + cpu_set(boot_cpu_id, cpu_online_map); prof_counter(boot_cpu_id) = prof_multiplier(boot_cpu_id) = 1; } @@ -1344,6 +1345,18 @@ void __init smp_setup_cpu_possible_map(v void __devinit smp_prepare_boot_cpu(void) { + int cpu = hard_smp_processor_id(); + + if (cpu >= NR_CPUS) { + prom_printf("Serious problem, boot cpu id >= NR_CPUS\n"); + prom_halt(); + } + + current_thread_info()->cpu = cpu; + __local_per_cpu_offset = __per_cpu_offset(cpu); + + cpu_set(smp_processor_id(), cpu_online_map); + cpu_set(smp_processor_id(), phys_cpu_present_map); } int __devinit __cpu_up(unsigned int cpu) @@ -1420,7 +1433,4 @@ #endif for (i = 0; i < NR_CPUS; i++, ptr += size) memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start); - - /* Setup %g5 for the boot cpu. */ - __local_per_cpu_offset = __per_cpu_offset(smp_processor_id()); } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 8e9d848..11ce6aa 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -133,7 +133,6 @@ struct cfq_data { mempool_t *crq_pool; int rq_in_driver; - int hw_tag; /* * schedule slice state info @@ -501,13 +500,10 @@ static void cfq_resort_rr_list(struct cf /* * if queue was preempted, just add to front to be fair. busy_rr - * isn't sorted, but insert at the back for fairness. + * isn't sorted. 
*/ if (preempted || list == &cfqd->busy_rr) { - if (preempted) - list = list->prev; - - list_add_tail(&cfqq->cfq_list, list); + list_add(&cfqq->cfq_list, list); return; } @@ -668,15 +664,6 @@ static void cfq_activate_request(request struct cfq_data *cfqd = q->elevator->elevator_data; cfqd->rq_in_driver++; - - /* - * If the depth is larger 1, it really could be queueing. But lets - * make the mark a little higher - idling could still be good for - * low queueing, and a low queueing number could also just indicate - * a SCSI mid layer like behaviour where limit+1 is often seen. - */ - if (!cfqd->hw_tag && cfqd->rq_in_driver > 4) - cfqd->hw_tag = 1; } static void cfq_deactivate_request(request_queue_t *q, struct request *rq) @@ -892,13 +879,6 @@ static struct cfq_queue *cfq_set_active_ cfqq = list_entry_cfqq(cfqd->cur_rr.next); /* - * If no new queues are available, check if the busy list has some - * before falling back to idle io. - */ - if (!cfqq && !list_empty(&cfqd->busy_rr)) - cfqq = list_entry_cfqq(cfqd->busy_rr.next); - - /* * if we have idle queues and no rt or be queues had pending * requests, either allow immediate service if the grace period * has passed or arm the idle grace timer @@ -1478,8 +1458,7 @@ retry: * set ->slice_left to allow preemption for a new process */ cfqq->slice_left = 2 * cfqd->cfq_slice_idle; - if (!cfqd->hw_tag) - cfq_mark_cfqq_idle_window(cfqq); + cfq_mark_cfqq_idle_window(cfqq); cfq_mark_cfqq_prio_changed(cfqq); cfq_init_prio_data(cfqq); } @@ -1670,7 +1649,7 @@ cfq_update_idle_window(struct cfq_data * { int enable_idle = cfq_cfqq_idle_window(cfqq); - if (!cic->ioc->task || !cfqd->cfq_slice_idle || cfqd->hw_tag) + if (!cic->ioc->task || !cfqd->cfq_slice_idle) enable_idle = 0; else if (sample_valid(cic->ttime_samples)) { if (cic->ttime_mean > cfqd->cfq_slice_idle) @@ -1761,24 +1740,14 @@ cfq_crq_enqueued(struct cfq_data *cfqd, cfqq->next_crq = cfq_choose_req(cfqd, cfqq->next_crq, crq); - cic = crq->io_context; - /* * we never wait for 
an async request and we don't allow preemption * of an async request. so just return early */ - if (!cfq_crq_is_sync(crq)) { - /* - * sync process issued an async request, if it's waiting - * then expire it and kick rq handling. - */ - if (cic == cfqd->active_cic && - del_timer(&cfqd->idle_slice_timer)) { - cfq_slice_expired(cfqd, 0); - cfq_start_queueing(cfqd, cfqq); - } + if (!cfq_crq_is_sync(crq)) return; - } + + cic = crq->io_context; cfq_update_io_thinktime(cfqd, cic); cfq_update_io_seektime(cfqd, cic, crq); @@ -2196,9 +2165,10 @@ static void cfq_idle_class_timer(unsigne * race with a non-idle queue, reset timer */ end = cfqd->last_end_request + CFQ_IDLE_GRACE; - if (!time_after_eq(jiffies, end)) - mod_timer(&cfqd->idle_class_timer, end); - else + if (!time_after_eq(jiffies, end)) { + cfqd->idle_class_timer.expires = end; + add_timer(&cfqd->idle_class_timer); + } else cfq_schedule_dispatch(cfqd); spin_unlock_irqrestore(cfqd->queue->queue_lock, flags); diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index a300840..9080853 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -1605,21 +1605,6 @@ mpt_resume(struct pci_dev *pdev) } #endif -static int -mpt_signal_reset(int index, MPT_ADAPTER *ioc, int reset_phase) -{ - if ((MptDriverClass[index] == MPTSPI_DRIVER && - ioc->bus_type != SPI) || - (MptDriverClass[index] == MPTFC_DRIVER && - ioc->bus_type != FC) || - (MptDriverClass[index] == MPTSAS_DRIVER && - ioc->bus_type != SAS)) - /* make sure we only call the relevant reset handler - * for the bus */ - return 0; - return (MptResetHandlers[index])(ioc, reset_phase); -} - /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/ /* * mpt_do_ioc_recovery - Initialize or recover MPT adapter. 
@@ -1900,14 +1885,14 @@ #endif if ((ret == 0) && MptResetHandlers[ii]) { dprintk((MYIOC_s_INFO_FMT "Calling IOC post_reset handler #%d\n", ioc->name, ii)); - rc += mpt_signal_reset(ii, ioc, MPT_IOC_POST_RESET); + rc += (*(MptResetHandlers[ii]))(ioc, MPT_IOC_POST_RESET); handlers++; } if (alt_ioc_ready && MptResetHandlers[ii]) { drsprintk((MYIOC_s_INFO_FMT "Calling alt-%s post_reset handler #%d\n", ioc->name, ioc->alt_ioc->name, ii)); - rc += mpt_signal_reset(ii, ioc->alt_ioc, MPT_IOC_POST_RESET); + rc += (*(MptResetHandlers[ii]))(ioc->alt_ioc, MPT_IOC_POST_RESET); handlers++; } } @@ -3282,11 +3267,11 @@ #endif if (MptResetHandlers[ii]) { dprintk((MYIOC_s_INFO_FMT "Calling IOC pre_reset handler #%d\n", ioc->name, ii)); - r += mpt_signal_reset(ii, ioc, MPT_IOC_PRE_RESET); + r += (*(MptResetHandlers[ii]))(ioc, MPT_IOC_PRE_RESET); if (ioc->alt_ioc) { dprintk((MYIOC_s_INFO_FMT "Calling alt-%s pre_reset handler #%d\n", ioc->name, ioc->alt_ioc->name, ii)); - r += mpt_signal_reset(ii, ioc->alt_ioc, MPT_IOC_PRE_RESET); + r += (*(MptResetHandlers[ii]))(ioc->alt_ioc, MPT_IOC_PRE_RESET); } } } @@ -5721,11 +5706,11 @@ #endif if (MptResetHandlers[ii]) { dtmprintk((MYIOC_s_INFO_FMT "Calling IOC reset_setup handler #%d\n", ioc->name, ii)); - r += mpt_signal_reset(ii, ioc, MPT_IOC_SETUP_RESET); + r += (*(MptResetHandlers[ii]))(ioc, MPT_IOC_SETUP_RESET); if (ioc->alt_ioc) { dtmprintk((MYIOC_s_INFO_FMT "Calling alt-%s setup reset handler #%d\n", ioc->name, ioc->alt_ioc->name, ii)); - r += mpt_signal_reset(ii, ioc->alt_ioc, MPT_IOC_SETUP_RESET); + r += (*(MptResetHandlers[ii]))(ioc->alt_ioc, MPT_IOC_SETUP_RESET); } } } diff --git a/drivers/scsi/ppa.c b/drivers/scsi/ppa.c index 108910f..fee843f 100644 --- a/drivers/scsi/ppa.c +++ b/drivers/scsi/ppa.c @@ -982,12 +982,6 @@ static int device_check(ppa_struct *dev) return -ENODEV; } -static int ppa_adjust_queue(struct scsi_device *device) -{ - blk_queue_bounce_limit(device->request_queue, BLK_BOUNCE_HIGH); - return 0; -} - static struct 
scsi_host_template ppa_template = { .module = THIS_MODULE, .proc_name = "ppa", @@ -1003,7 +997,6 @@ static struct scsi_host_template ppa_tem .cmd_per_lun = 1, .use_clustering = ENABLE_CLUSTERING, .can_queue = 1, - .slave_alloc = ppa_adjust_queue, }; /*************************************************************************** diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index 62f8cb7..941c1e1 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -165,7 +165,6 @@ static struct { {"HP", "HSV100", NULL, BLIST_REPORTLUN2 | BLIST_NOSTARTONADD}, {"HP", "C1557A", NULL, BLIST_FORCELUN}, {"HP", "C3323-300", "4269", BLIST_NOTQ}, - {"HP", "C5713A", NULL, BLIST_NOREPORTLUN}, {"IBM", "AuSaV1S2", NULL, BLIST_FORCELUN}, {"IBM", "ProFibre 4000R", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, {"IBM", "2105", NULL, BLIST_RETRY_HWERROR}, diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index faee475..764a8b3 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -367,7 +367,7 @@ static int scsi_req_map_sg(struct reques int nsegs, unsigned bufflen, gfp_t gfp) { struct request_queue *q = rq->q; - int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT; + int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT; unsigned int data_len = 0, len, bytes, off; struct page *page; struct bio *bio = NULL; diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index f3b1606..8b6d65e 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -955,8 +955,7 @@ static int sas_user_scan(struct Scsi_Hos list_for_each_entry(rphy, &sas_host->rphy_list, list) { struct sas_phy *parent = dev_to_phy(rphy->dev.parent); - if (rphy->identify.device_type != SAS_END_DEVICE || - rphy->scsi_target_id == -1) + if (rphy->scsi_target_id == -1) continue; if ((channel == SCAN_WILD_CARD || channel == parent->port_identifier) && @@ -978,6 +977,7 @@ static int sas_user_scan(struct 
Scsi_Hos #define SETUP_TEMPLATE(attrb, field, perm, test) \ i->private_##attrb[count] = class_device_attr_##field; \ i->private_##attrb[count].attr.mode = perm; \ + i->private_##attrb[count].store = NULL; \ i->attrb[count] = &i->private_##attrb[count]; \ if (test) \ count++ diff --git a/include/asm-arm/arch-ixp23xx/memory.h b/include/asm-arm/arch-ixp23xx/memory.h index c85fc06..6e19f46 100644 --- a/include/asm-arm/arch-ixp23xx/memory.h +++ b/include/asm-arm/arch-ixp23xx/memory.h @@ -49,7 +49,7 @@ static inline int __ixp23xx_arch_is_cohe { extern unsigned int processor_id; - if (((processor_id & 15) >= 4) || machine_is_roadrunner()) + if (((processor_id & 15) >= 2) || machine_is_roadrunner()) return 1; return 0; diff --git a/include/asm-arm/cacheflush.h b/include/asm-arm/cacheflush.h index 746be56..7ab6ec3 100644 --- a/include/asm-arm/cacheflush.h +++ b/include/asm-arm/cacheflush.h @@ -331,6 +331,12 @@ #define flush_dcache_mmap_lock(mapping) #define flush_dcache_mmap_unlock(mapping) \ write_unlock_irq(&(mapping)->tree_lock) +static inline void flush_kernel_dcache_page(struct page *page) +{ + __cpuc_flush_dcache_page(page_address(page)); +} +#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE + #define flush_icache_user_range(vma,page,addr,len) \ flush_dcache_page(page) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index c2059a3..358e4d3 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -159,8 +159,17 @@ #ifndef __HAVE_ARCH_LAZY_MMU_PROT_UPDATE #define lazy_mmu_prot_update(pte) do { } while (0) #endif -#ifndef __HAVE_ARCH_MOVE_PTE +#ifndef __HAVE_ARCH_MULTIPLE_ZERO_PAGE #define move_pte(pte, prot, old_addr, new_addr) (pte) +#else +#define move_pte(pte, prot, old_addr, new_addr) \ +({ \ + pte_t newpte = (pte); \ + if (pte_present(pte) && pfn_valid(pte_pfn(pte)) && \ + pte_page(pte) == ZERO_PAGE(old_addr)) \ + newpte = mk_pte(ZERO_PAGE(new_addr), (prot)); \ + newpte; \ +}) #endif /* diff --git 
a/include/asm-mips/pgtable.h b/include/asm-mips/pgtable.h index f80fe75..174a3cd 100644 --- a/include/asm-mips/pgtable.h +++ b/include/asm-mips/pgtable.h @@ -70,15 +70,7 @@ extern unsigned long zero_page_mask; #define ZERO_PAGE(vaddr) \ (virt_to_page(empty_zero_page + (((unsigned long)(vaddr)) & zero_page_mask))) -#define __HAVE_ARCH_MOVE_PTE -#define move_pte(pte, prot, old_addr, new_addr) \ -({ \ - pte_t newpte = (pte); \ - if (pte_present(pte) && pfn_valid(pte_pfn(pte)) && \ - pte_page(pte) == ZERO_PAGE(old_addr)) \ - newpte = mk_pte(ZERO_PAGE(new_addr), (prot)); \ - newpte; \ -}) +#define __HAVE_ARCH_MULTIPLE_ZERO_PAGE extern void paging_init(void); diff --git a/include/asm-sparc64/pgtable.h b/include/asm-sparc64/pgtable.h index cd464f4..c44e746 100644 --- a/include/asm-sparc64/pgtable.h +++ b/include/asm-sparc64/pgtable.h @@ -689,23 +689,6 @@ static inline void set_pte_at(struct mm_ #define pte_clear(mm,addr,ptep) \ set_pte_at((mm), (addr), (ptep), __pte(0UL)) -#ifdef DCACHE_ALIASING_POSSIBLE -#define __HAVE_ARCH_MOVE_PTE -#define move_pte(pte, prot, old_addr, new_addr) \ -({ \ - pte_t newpte = (pte); \ - if (tlb_type != hypervisor && pte_present(pte)) { \ - unsigned long this_pfn = pte_pfn(pte); \ - \ - if (pfn_valid(this_pfn) && \ - (((old_addr) ^ (new_addr)) & (1 << 13))) \ - flush_dcache_page_all(current->mm, \ - pfn_to_page(this_pfn)); \ - } \ - newpte; \ -}) -#endif - extern pgd_t swapper_pg_dir[2048]; extern pmd_t swapper_low_pmd_dir[2048]; diff --git a/mm/slab.c b/mm/slab.c index f1b644e..d31a06b 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -207,6 +207,11 @@ #define BUFCTL_FREE (((kmem_bufctl_t)(~0 #define BUFCTL_ACTIVE (((kmem_bufctl_t)(~0U))-2) #define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-3) +/* Max number of objs-per-slab for caches which use off-slab slabs. + * Needed to avoid a possible looping condition in cache_grow(). 
+ */ +static unsigned long offslab_limit; + /* * struct slab * @@ -1351,6 +1356,12 @@ void __init kmem_cache_init(void) NULL, NULL); } + /* Inc off-slab bufctl limit until the ceiling is hit. */ + if (!(OFF_SLAB(sizes->cs_cachep))) { + offslab_limit = sizes->cs_size - sizeof(struct slab); + offslab_limit /= sizeof(kmem_bufctl_t); + } + sizes->cs_dmacachep = kmem_cache_create(names->name_dma, sizes->cs_size, ARCH_KMALLOC_MINALIGN, @@ -1769,7 +1780,6 @@ static void set_up_list3s(struct kmem_ca static size_t calculate_slab_order(struct kmem_cache *cachep, size_t size, size_t align, unsigned long flags) { - unsigned long offslab_limit; size_t left_over = 0; int gfporder; @@ -1781,18 +1791,9 @@ static size_t calculate_slab_order(struc if (!num) continue; - if (flags & CFLGS_OFF_SLAB) { - /* - * Max number of objs-per-slab for caches which - * use off-slab slabs. Needed to avoid a possible - * looping condition in cache_grow(). - */ - offslab_limit = size - sizeof(struct slab); - offslab_limit /= sizeof(kmem_bufctl_t); - - if (num > offslab_limit) - break; - } + /* More than offslab_limit objects will cause problems */ + if ((flags & CFLGS_OFF_SLAB) && num > offslab_limit) + break; /* Found something acceptable - save it away */ cachep->num = num; diff --git a/net/ipv4/tcp_highspeed.c b/net/ipv4/tcp_highspeed.c index ba7c63c..b72fa55 100644 --- a/net/ipv4/tcp_highspeed.c +++ b/net/ipv4/tcp_highspeed.c @@ -135,8 +135,7 @@ static void hstcp_cong_avoid(struct sock /* Do additive increase */ if (tp->snd_cwnd < tp->snd_cwnd_clamp) { - /* cwnd = cwnd + a(w) / cwnd */ - tp->snd_cwnd_cnt += ca->ai + 1; + tp->snd_cwnd_cnt += ca->ai; if (tp->snd_cwnd_cnt >= tp->snd_cwnd) { tp->snd_cwnd_cnt -= tp->snd_cwnd; tp->snd_cwnd++; -- 1.3.2 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] (REPOST) arm: implement flush_kernel_dcache_page()
2006-06-04 3:41 ` [PATCH 1/5] arm: implement flush_kernel_dcache_page() Tejun Heo
@ 2006-06-04 3:49 ` Tejun Heo
2006-06-04 6:45 ` [PATCH 1/5] " David Miller
1 sibling, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:49 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Implement flush_kernel_dcache_page() for arm.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
Sorry, the patch contained in the previous post was generated against
the wrong base. Please ignore it.
include/asm-arm/cacheflush.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
fad719d62838161fb0d6f306c6e060f8ef2ddfd0
diff --git a/include/asm-arm/cacheflush.h b/include/asm-arm/cacheflush.h
index 746be56..7ab6ec3 100644
--- a/include/asm-arm/cacheflush.h
+++ b/include/asm-arm/cacheflush.h
@@ -331,6 +331,12 @@ #define flush_dcache_mmap_lock(mapping)
 #define flush_dcache_mmap_unlock(mapping) \
 	write_unlock_irq(&(mapping)->tree_lock)
+static inline void flush_kernel_dcache_page(struct page *page)
+{
+	__cpuc_flush_dcache_page(page_address(page));
+}
+#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
+
 #define flush_icache_user_range(vma,page,addr,len) \
 	flush_dcache_page(page)
--
1.3.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] arm: implement flush_kernel_dcache_page()
2006-06-04 3:41 ` [PATCH 1/5] arm: implement flush_kernel_dcache_page() Tejun Heo
2006-06-04 3:49 ` [PATCH 1/5] (REPOST) " Tejun Heo
@ 2006-06-04 6:45 ` David Miller
2006-06-04 6:53 ` Tejun Heo
1 sibling, 1 reply; 27+ messages in thread
From: David Miller @ 2006-06-04 6:45 UTC (permalink / raw)
To: htejun
Cc: axboe, James.Bottomley, davem, bzolnier, james.steward, jgarzik,
mattjreimer, g.liakhovetski, rmk, linux-kernel, linux-ide, linux-scsi
From: Tejun Heo <htejun@gmail.com>
Date: Sun, 4 Jun 2006 12:41:19 +0900
> arch/sparc64/kernel/head.S | 30 -------------------
> arch/sparc64/kernel/setup.c | 23 ++++++++-------
> arch/sparc64/kernel/smp.c | 16 ++++++++--
You're reverting a totally unrelated sparc64 bug fix
in Linus's tree.
Be careful in how you generate your patches.
Thanks :)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] arm: implement flush_kernel_dcache_page()
2006-06-04 6:45 ` [PATCH 1/5] " David Miller
@ 2006-06-04 6:53 ` Tejun Heo
2006-06-04 7:04 ` David Miller
0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 6:53 UTC (permalink / raw)
To: David Miller
Cc: axboe, James.Bottomley, davem, bzolnier, james.steward, jgarzik,
mattjreimer, g.liakhovetski, rmk, linux-kernel, linux-ide, linux-scsi
David Miller wrote:
> From: Tejun Heo <htejun@gmail.com>
> Date: Sun, 4 Jun 2006 12:41:19 +0900
>
>> arch/sparc64/kernel/head.S | 30 -------------------
>> arch/sparc64/kernel/setup.c | 23 ++++++++-------
>> arch/sparc64/kernel/smp.c | 16 ++++++++--
>
> You're reverting a totally unrelated sparc64 bug fix
> in Linus's tree.
>
> Be careful in how you generate your patches.
Yeap, HEAD and index got out of sync while fetching & checking out.
I've sent a regenerated patch as a reply to the original message.
Haven't you received it?
Sorry about the noise.
--
tejun
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] arm: implement flush_kernel_dcache_page()
2006-06-04 6:53 ` Tejun Heo
@ 2006-06-04 7:04 ` David Miller
0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2006-06-04 7:04 UTC (permalink / raw)
To: htejun
Cc: axboe, James.Bottomley, davem, bzolnier, james.steward, jgarzik,
mattjreimer, g.liakhovetski, rmk, linux-kernel, linux-ide, linux-scsi
From: Tejun Heo <htejun@gmail.com>
Date: Sun, 04 Jun 2006 15:53:56 +0900
> Yeap, HEAD and index got out of sync while fetching & checking out.
> I've sent a regenerated patch as a reply to the original message.
> Haven't you received it?
Yeah I saw it right after I wrote that email :)
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
2006-06-04 3:41 ` [PATCH 1/5] arm: implement flush_kernel_dcache_page() Tejun Heo
@ 2006-06-04 3:41 ` Tejun Heo
2006-06-04 8:17 ` Christoph Hellwig
2006-06-04 3:41 ` [PATCH 3/5] libata: " Tejun Heo
` (3 subsequent siblings)
5 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Cc: Tejun Heo
Add calls to flush_kernel_dcache_page() after CPU has kmapped and
modified a page. This fixes PIO cache coherency bugs on architectures
with aliased caches.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
drivers/ide/ide-floppy.c | 1 +
drivers/ide/ide-taskfile.c | 2 ++
2 files changed, 3 insertions(+), 0 deletions(-)
861367f65bbbbc5c9f5d3a27aab91c587a3a9049
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index a53e3ce..5be22c2 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -618,6 +618,7 @@ static void idefloppy_input_buffers (ide
 	data = bvec_kmap_irq(bvec, &flags);
 	drive->hwif->atapi_input_bytes(drive, data, count);
+	flush_kernel_dcache_page(kmap_atomic_to_page(data));
 	bvec_kunmap_irq(data, &flags);
 	bcount -= count;
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index 9233b81..c183c07 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -294,6 +294,8 @@ #endif
 	else
 		taskfile_input_data(drive, buf, SECTOR_WORDS);
+	if (!write)
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf));
 	kunmap_atomic(buf, KM_BIO_SRC_IRQ);
 #ifdef CONFIG_HIGHMEM
 	local_irq_restore(flags);
--
1.3.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 ` [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page Tejun Heo
@ 2006-06-04 8:17 ` Christoph Hellwig
2006-06-04 9:09 ` Tejun Heo
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2006-06-04 8:17 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
On Sun, Jun 04, 2006 at 12:41:20PM +0900, Tejun Heo wrote:
> data = bvec_kmap_irq(bvec, &flags);
> drive->hwif->atapi_input_bytes(drive, data, count);
> + flush_kernel_dcache_page(kmap_atomic_to_page(data));
> bvec_kunmap_irq(data, &flags);
shouldn't bvec_kunmap_irq do the flush_kernel_dcache_page call?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page
2006-06-04 8:17 ` Christoph Hellwig
@ 2006-06-04 9:09 ` Tejun Heo
0 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 9:09 UTC (permalink / raw)
To: Christoph Hellwig, Tejun Heo, Jens Axboe, James Bottomley,
Dave Miller, bzolnier, james.steward, jgarzik, mattjreimer,
Guennadi Liakhovetski, rmk, lkml, linux-ide, linux-scsi
Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 12:41:20PM +0900, Tejun Heo wrote:
>> data = bvec_kmap_irq(bvec, &flags);
>> drive->hwif->atapi_input_bytes(drive, data, count);
>> + flush_kernel_dcache_page(kmap_atomic_to_page(data));
>> bvec_kunmap_irq(data, &flags);
>
> shouldn't bvec_kunmap_irq do the flush_kernel_dcache_page call?
>
Eventually, yes. At the moment, not all archs implement
flush_kernel_dcache_page(), so converting
kmap(); modify buffer; flush_dcache_page(); kunmap();
to
kmap_wrapper(); modify buffer;
kunmap_wrapper_which_calls_flush_kernel_dcache_page()
breaks cache coherency on those archs. The current patches simply add
calls to flush_kernel_dcache_page() where missing such that it doesn't
break anything while fixing cache coherency for arm and parisc.
In the long term...
1. implement flush_kernel_dcache_page() for all needed archs
2. update kmap interface such that the caller is mandated to specify
whether the buffer has been modified or not when unmapping (maybe
addition of simple boolean argument?)
3. update bvec_kmap_*() similarly
4. update all calls to kunmap & friends.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 27+ messages in thread
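Step 2 of the long-term plan above could look roughly like the following user-space sketch, in which the unmap call takes a boolean saying whether the buffer was modified. Everything here (`sketch_kunmap()`, its `modified` argument, `struct sketch_page`) is a hypothetical illustration. The thread itself only floats the idea of a simple boolean argument and does not define such an interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel page. */
struct sketch_page {
	unsigned char data[32];
	int flushes;	/* calls to the flush stub */
	int mapped;	/* outstanding mappings */
};

/* Stand-in for kmap(). */
static unsigned char *sketch_kmap(struct sketch_page *page)
{
	page->mapped++;
	return page->data;
}

/* Stand-in for flush_kernel_dcache_page(); it only counts calls. */
static void sketch_flush_kernel_dcache_page(struct sketch_page *page)
{
	page->flushes++;
}

/*
 * Hypothetical unmap: the caller must state whether it wrote to the
 * buffer, and the flush happens here instead of at every call site.
 */
static void sketch_kunmap(struct sketch_page *page, bool modified)
{
	assert(page->mapped > 0);
	if (modified)
		sketch_flush_kernel_dcache_page(page);
	page->mapped--;
}

/* Writer: fills the page, so it unmaps with modified = true. */
static void sketch_fill(struct sketch_page *page, unsigned char value)
{
	unsigned char *buf = sketch_kmap(page);

	memset(buf, value, sizeof(page->data));
	sketch_kunmap(page, true);
}

/* Reader: only inspects the page, so no flush is requested. */
static unsigned char sketch_peek(struct sketch_page *page, size_t idx)
{
	unsigned char *buf;
	unsigned char value;

	assert(idx < sizeof(page->data));
	buf = sketch_kmap(page);
	value = buf[idx];
	sketch_kunmap(page, false);
	return value;
}
```

With this shape a caller cannot forget the flush after modifying a page, but it is only safe once step 1 is done, since an architecture without a real flush_kernel_dcache_page() would silently lose the flush_dcache_page() it relies on today.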
* [PATCH 3/5] libata: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
2006-06-04 3:41 ` [PATCH 1/5] arm: implement flush_kernel_dcache_page() Tejun Heo
2006-06-04 3:41 ` [PATCH 2/5] ide: add cpu cache flushes after kmapping and modifying a page Tejun Heo
@ 2006-06-04 3:41 ` Tejun Heo
2006-06-04 3:41 ` [PATCH 5/5] md: " Tejun Heo
` (2 subsequent siblings)
5 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Cc: Tejun Heo
Add calls to flush_kernel_dcache_page() after CPU has kmapped and
modified a page. This fixes PIO cache coherency bugs on architectures
with aliased caches.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
drivers/scsi/libata-core.c | 5 +++++
drivers/scsi/libata-scsi.c | 1 +
2 files changed, 6 insertions(+), 0 deletions(-)
cc874e5080d87eff23a1576df11ddaaeae9575ec
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index b046ffa..47eb263 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2821,6 +2821,7 @@ static void ata_sg_clean(struct ata_queu
 		struct scatterlist *psg = &qc->pad_sgent;
 		void *addr = kmap_atomic(psg->page, KM_IRQ0);
 		memcpy(addr + psg->offset, pad_buf, qc->pad_len);
+		flush_kernel_dcache_page(kmap_atomic_to_page(addr));
 		kunmap_atomic(addr, KM_IRQ0);
 	}
 } else {
@@ -3451,6 +3452,8 @@ static void ata_pio_sector(struct ata_qu
 	do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
 	ata_data_xfer(ap, buf, ATA_SECT_SIZE, do_write);
+	if (!do_write)
+		flush_kernel_dcache_page(page);
 	kunmap(page);
 }
@@ -3533,6 +3536,8 @@ next_sg:
 	/* do the actual data transfer */
 	ata_data_xfer(ap, buf, count, do_write);
+	if (!do_write)
+		flush_kernel_dcache_page(page);
 	kunmap(page);
 	if (bytes)
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index a0289ec..b65d7f5 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1500,6 +1500,7 @@ static inline void ata_scsi_rbuf_put(str
 		struct scatterlist *sg;
 		sg = (struct scatterlist *) cmd->request_buffer;
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
 		kunmap_atomic(buf - sg->offset, KM_USER0);
 	}
 }
--
1.3.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 5/5] md: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
` (2 preceding siblings ...)
2006-06-04 3:41 ` [PATCH 3/5] libata: " Tejun Heo
@ 2006-06-04 3:41 ` Tejun Heo
2006-06-04 3:41 ` [PATCH 4/5] SCSI: " Tejun Heo
2006-06-04 20:44 ` [PATCHSET] block: fix PIO cache coherency bug, take 2 Russell King
5 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Cc: Tejun Heo

Add calls to flush_kernel_dcache_page() after CPU has kmapped and
modified a page. This fixes PIO cache coherency bugs on architectures
with aliased caches.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/md/raid1.c     |    1 +
 drivers/md/raid5.c     |    6 ++++--
 drivers/md/raid6main.c |    6 ++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

716500bdf7de6acb87e36c8146d83dd3c429bc82
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4070eff..30ca7cf 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -720,6 +720,7 @@ static struct page **alloc_behind_pages(
 			goto do_sync_io;
 		memcpy(kmap(pages[i]) + bvec->bv_offset,
 			kmap(bvec->bv_page) + bvec->bv_offset, bvec->bv_len);
+		flush_kernel_dcache_page(pages[i]);
 		kunmap(pages[i]);
 		kunmap(bvec->bv_page);
 	}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3184360..3adb64f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -813,10 +813,12 @@ static void copy_data(int frombio, struc
 		if (clen > 0) {
 			char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
-			if (frombio)
+			if (frombio) {
 				memcpy(pa+page_offset, ba+b_offset, clen);
-			else
+			} else {
 				memcpy(ba+b_offset, pa+page_offset, clen);
+				flush_kernel_dcache_page(kmap_atomic_to_page(ba));
+			}
 			__bio_kunmap_atomic(ba, KM_USER0);
 		}
 		if (clen < len) /* hit end of page */
diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
index bc69355..b9700bd 100644
--- a/drivers/md/raid6main.c
+++ b/drivers/md/raid6main.c
@@ -727,10 +727,12 @@ static void copy_data(int frombio, struc
 		if (clen > 0) {
 			char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
-			if (frombio)
+			if (frombio) {
 				memcpy(pa+page_offset, ba+b_offset, clen);
-			else
+			} else {
 				memcpy(ba+b_offset, pa+page_offset, clen);
+				flush_kernel_dcache_page(kmap_atomic_to_page(ba));
+			}
 			__bio_kunmap_atomic(ba, KM_USER0);
 		}
 		if (clen < len) /* hit end of page */
-- 
1.3.2
* [PATCH 4/5] SCSI: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
` (3 preceding siblings ...)
2006-06-04 3:41 ` [PATCH 5/5] md: " Tejun Heo
@ 2006-06-04 3:41 ` Tejun Heo
2006-06-04 8:20 ` Christoph Hellwig
2006-06-04 20:44 ` [PATCHSET] block: fix PIO cache coherency bug, take 2 Russell King
5 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 3:41 UTC (permalink / raw)
To: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi
Cc: Tejun Heo

Add calls to flush_kernel_dcache_page() after CPU has kmapped and
modified a page. This fixes PIO cache coherency bugs on architectures
with aliased caches.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/3w-9xxx.c        |    1 +
 drivers/scsi/3w-xxxx.c        |    1 +
 drivers/scsi/aacraid/aachba.c |    4 +++-
 drivers/scsi/ide-scsi.c       |    1 +
 drivers/scsi/ips.c            |    2 ++
 drivers/scsi/iscsi_tcp.c      |    1 +
 drivers/scsi/megaraid.c       |    2 ++
 drivers/scsi/qlogicpti.c      |    1 +
 drivers/scsi/scsi_debug.c     |    1 +
 drivers/scsi/scsi_lib.c       |    1 +
 10 files changed, 14 insertions(+), 1 deletions(-)

9b4bdd1409efb726d4a6561a4f7e2aff878ab4f4
diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
index caeb6d2..172f16b 100644
--- a/drivers/scsi/3w-9xxx.c
+++ b/drivers/scsi/3w-9xxx.c
@@ -1948,6 +1948,7 @@ static void twa_scsiop_execute_scsi_comp
 		local_irq_save(flags);
 		buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
 		memcpy(buf, tw_dev->generic_buffer_virt[request_id], sg->length);
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
 		kunmap_atomic(buf - sg->offset, KM_IRQ0);
 		local_irq_restore(flags);
 	}
diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c
index e8e41e6..8449551 100644
--- a/drivers/scsi/3w-xxxx.c
+++ b/drivers/scsi/3w-xxxx.c
@@ -1527,6 +1527,7 @@ static void tw_transfer_internal(TW_Devi
 		struct scatterlist *sg;
 		sg = (struct scatterlist *)cmd->request_buffer;
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
 		kunmap_atomic(buf - sg->offset, KM_IRQ0);
 		local_irq_restore(flags);
 	}
diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 642a3b4..b7c00b8 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -376,8 +376,10 @@ static void aac_internal_transfer(struct
 	memcpy(buf + offset, data, transfer_len - offset);
-	if (scsicmd->use_sg)
+	if (scsicmd->use_sg) {
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
 		kunmap_atomic(buf - sg->offset, KM_IRQ0);
+	}
 }
diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 39b760a..9c28b95 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -189,6 +189,7 @@ static void idescsi_input_buffers (ide_d
 				pc->sg->offset;
 			drive->hwif->atapi_input_bytes(drive,
 				buf + pc->b_count, count);
+			flush_kernel_dcache_page(kmap_atomic_to_page(buf - pc->sg->offset));
 			kunmap_atomic(buf - pc->sg->offset, KM_IRQ0);
 			local_irq_restore(flags);
 		} else {
diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index a4c0b04..29eb3f0 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -3682,6 +3682,8 @@ ips_scmd_buf_write(Scsi_Cmnd * scmd, voi
 		local_irq_save(flags);
 		buffer = kmap_atomic(sg[i].page, KM_IRQ0) + sg[i].offset;
 		memcpy(buffer, &cdata[xfer_cnt], min_cnt);
+		flush_kernel_dcache_page(
+			kmap_atomic_to_page(buffer - sg[i].offset));
 		kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
 		local_irq_restore(flags);
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 2068b66..ae9784c 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -945,6 +945,7 @@ static int iscsi_scsi_data_in(struct isc
 		dest = kmap_atomic(sg[i].page, KM_SOFTIRQ0);
 		rc = iscsi_ctask_copy(conn, ctask, dest + sg[i].offset,
 				      sg[i].length, offset);
+		flush_kernel_dcache_page(kmap_atomic_to_page(dest));
 		kunmap_atomic(dest, KM_SOFTIRQ0);
 		if (rc == -EAGAIN)
 			/* continue with the next SKB/PDU */
diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index de35ffe..7cb7590 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -671,6 +671,8 @@ #endif
 				struct scatterlist *sg;
 				sg = (struct scatterlist *)cmd->request_buffer;
+				flush_kernel_dcache_page(
+					kmap_atomic_to_page(buf - sg->offset));
 				kunmap_atomic(buf - sg->offset, KM_IRQ0);
 			}
 			cmd->result = (DID_OK << 16);
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index c7e78dc..f8201f2 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -1146,6 +1146,7 @@ static void scsi_rbuf_put(struct scsi_cm
 		struct scatterlist *sg;
 		sg = (struct scatterlist *) cmd->request_buffer;
+		flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
 		kunmap_atomic(buf - sg->offset, KM_IRQ0);
 	}
 }
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 5a5d2af..88543db 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -511,6 +511,7 @@ static int fill_from_dev_buffer(struct s
 				len = arr_len - req_len;
 			}
 			memcpy(kaddr_off, arr + req_len, len);
+			flush_kernel_dcache_page(kmap_atomic_to_page(kaddr));
 			kunmap_atomic(kaddr, KM_USER0);
 			act_len += len;
 		}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 764a8b3..8bb2f6c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -945,6 +945,7 @@ void scsi_io_completion(struct scsi_cmnd
 			unsigned long flags;
 			char *to = bio_kmap_irq(req->bio, &flags);
 			memcpy(to, cmd->buffer, cmd->bufflen);
+			flush_kernel_dcache_page(kmap_atomic_to_page(to));
 			bio_kunmap_irq(to, &flags);
 		}
 		kfree(cmd->buffer);
-- 
1.3.2
* Re: [PATCH 4/5] SCSI: add cpu cache flushes after kmapping and modifying a page
2006-06-04 3:41 ` [PATCH 4/5] SCSI: " Tejun Heo
@ 2006-06-04 8:20 ` Christoph Hellwig
2006-06-04 9:13 ` Tejun Heo
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2006-06-04 8:20 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, rmk, lkml, linux-ide,
linux-scsi

On Sun, Jun 04, 2006 at 12:41:20PM +0900, Tejun Heo wrote:
>  	local_irq_save(flags);
>  	buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
>  	memcpy(buf, tw_dev->generic_buffer_virt[request_id], sg->length);
> +	flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
>  	kunmap_atomic(buf - sg->offset, KM_IRQ0);
>  	local_irq_restore(flags);

all these should switch to scsi_kmap_atomic_sg which should do the
flush_kernel_dcache_page call for you.
* Re: [PATCH 4/5] SCSI: add cpu cache flushes after kmapping and modifying a page
2006-06-04 8:20 ` Christoph Hellwig
@ 2006-06-04 9:13 ` Tejun Heo
2006-06-04 20:24 ` Guennadi Liakhovetski
0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2006-06-04 9:13 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, James Bottomley, Dave Miller,
bzolnier, james.steward, jgarzik, mattjreimer, Guennadi Liakhovetski,
rmk, lkml, linux-ide, linux-scsi

Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 12:41:20PM +0900, Tejun Heo wrote:
>>  	local_irq_save(flags);
>>  	buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
>>  	memcpy(buf, tw_dev->generic_buffer_virt[request_id], sg->length);
>> +	flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
>>  	kunmap_atomic(buf - sg->offset, KM_IRQ0);
>>  	local_irq_restore(flags);
>
> all these should switch to scsi_kmap_atomic_sg which should do the
> flush_kernel_dcache_page call for you.
>

This is not specific to scsi or block. This is a common problem for
all kmap users. As I wrote in the other mail, I think this should be
mandated at the kmap/kunmap() interface.

-- 
tejun
* Re: [PATCH 4/5] SCSI: add cpu cache flushes after kmapping and modifying a page
2006-06-04 9:13 ` Tejun Heo
@ 2006-06-04 20:24 ` Guennadi Liakhovetski
0 siblings, 0 replies; 27+ messages in thread
From: Guennadi Liakhovetski @ 2006-06-04 20:24 UTC (permalink / raw)
To: Tejun Heo
Cc: Christoph Hellwig, Jens Axboe, James Bottomley, Dave Miller,
bzolnier, james.steward, jgarzik, mattjreimer, rmk, lkml, linux-ide,
linux-scsi

On Sun, 4 Jun 2006, Tejun Heo wrote:

> Christoph Hellwig wrote:
> > On Sun, Jun 04, 2006 at 12:41:20PM +0900, Tejun Heo wrote:
> > >  	local_irq_save(flags);
> > >  	buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
> > >  	memcpy(buf, tw_dev->generic_buffer_virt[request_id], sg->length);
> > > +	flush_kernel_dcache_page(kmap_atomic_to_page(buf - sg->offset));
> > >  	kunmap_atomic(buf - sg->offset, KM_IRQ0);
> > >  	local_irq_restore(flags);
> >
> > all these should switch to scsi_kmap_atomic_sg which should do the
> > flush_kernel_dcache_page call for you.
> >
>
> This is not specific to scsi or block. This is a common problem for all kmap
> users. As I wrote in the other mail, I think this should be mandated at the
> kmap/kunmap() interface.

Right. As I wrote scsi_k(un)map_atomic_sg I did mention that they,
probably, should go to a higher layer as they were not scsi-specific,
but as I didn't have a good idea of where exactly to put them, I called
them scsi_* and put in scsi_lib.c. Suggestions for a better place and
namespace for them very welcome. Or just feel free to move / rename
them as you see appropriate. See, e.g.,

http://marc.theaimsgroup.com/?l=linux-scsi&m=112345886816099&w=2

and the related thread from August last year for possible other
potential users of this API.

Thanks
Guennadi
---
Guennadi Liakhovetski
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-04 3:41 [PATCHSET] block: fix PIO cache coherency bug, take 2 Tejun Heo
` (4 preceding siblings ...)
2006-06-04 3:41 ` [PATCH 4/5] SCSI: " Tejun Heo
@ 2006-06-04 20:44 ` Russell King
2006-06-04 22:23 ` Russell King
2006-06-05 13:43 ` James Bottomley
5 siblings, 2 replies; 27+ messages in thread
From: Russell King @ 2006-06-04 20:44 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, James Bottomley, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Sun, Jun 04, 2006 at 12:41:19PM +0900, Tejun Heo wrote:
> Russell, can you please verify arm's flush_kernel_dcache_page()?

That should be fine from a theoretical standpoint, but I can't say much
more than that - I have _great_ difficulty in reproducing the problem
with IDE and as such I consider myself out of the game of testing for
this problem:

| Date: Fri, 13 Jan 2006 22:02:15 +0000
| From: Russell King <rmk+lkml@arm.linux.org.uk>
| To: Tejun Heo <htejun@gmail.com>
| Subject: Re: [PATCHSET] block: fix PIO cache coherency bug
|
| On Sat, Jan 14, 2006 at 12:24:16AM +0900, Tejun Heo wrote:
| > Russell, can you please test whether this fixes the bug on arm? If
| > this fixes the bug and people agree with the approach, I'll follow up
| > with patches for yet unconverted drivers and documentation update.
|
| Unfortunately, as I previously explained, I'm not able to test this.
| The reason is that in order to reproduce the bug, you need a system
| with a VIVT write-back write-allocate cache.
|
| Unfortunately, the few systems I have which have such a cache do not
| have IDE, SCSI nor SATA (not even PCMCIA.) I suggest contacting the
| folk who reported the bug in the first instance.

You need to approach other members of the ARM community to test these
patches. Unfortunately I don't have a list of who has found the problem
and who is in a state to be able to reproduce it - since most members
are embedded engineers, they tend to move on to other projects quite
rapidly.

What I suggest is that we just throw _something_ which looks right into
the kernel and see what happens. I can't see any other possible way to
proceed, _especially_ as we've had 6 months of very little progress on
this issue.

> I tried to implement flush_anon_page() too but didn't know what to do
> with anon_vma object.

I'm not sure what this is about...

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-04 20:44 ` [PATCHSET] block: fix PIO cache coherency bug, take 2 Russell King
@ 2006-06-04 22:23 ` Russell King
2006-06-05 14:27 ` James Bottomley
2006-06-05 13:43 ` James Bottomley
1 sibling, 1 reply; 27+ messages in thread
From: Russell King @ 2006-06-04 22:23 UTC (permalink / raw)
To: Tejun Heo, Jens Axboe, James Bottomley, Dave Miller, bzolnier,
james.steward, jgarzik, mattjreimer, Guennadi Liakhovetski, lkml,
linux-ide, linux-scsi

On Sun, Jun 04, 2006 at 09:44:44PM +0100, Russell King wrote:
> On Sun, Jun 04, 2006 at 12:41:19PM +0900, Tejun Heo wrote:
> > Russell, can you please verify arm's flush_kernel_dcache_page()?
>
> That should be fine from a theoretical standpoint

I'll add to this statement that the cache flushing on ARM is only
ever required when the page ends up in userspace - if we're reading
a page into the page cache to throw it out via NFS or sendfile then
the cache flush is a complete waste of time.

Why is this? Well, if the data has been written to the kernel mapping
by the CPU, it may be contained in the cache lines corresponding to
that mapping. When that memory is passed to a network interface to
send, the network interface either reads data from the kernel mapping
via PIO (in which case it reads the data from the cache), or it
performs DMA. In the case of DMA, the DMA API handles the cache
coherency issues with respect to dirty data in the kernel mapping.

Moreover, there's the problem of read-ahead. When data is read from a
block device, more data than that which is required is usually read in
case it's required a short time later. If the data is not required, it
is thrown away during an eviction cycle.

However, any cache flushing that you've performed for uses "other than
kernel space" is then a waste of resources - the only time that such
handling is needed is when the data is actually used for these other
uses - which in the case of ARM means userspace.

Given the above, I believe that the method being proposed will be _far_
too expensive, maybe to the point where (eg) disabling read-ahead
_entirely_ on ARM makes the system overall more efficient.

In this respect, I continue to believe that the way ARM (in principle)
does flush_dcache_page() is what is required here - if the page has
not been mapped into userspace, it merely marks the page as containing
dirty cache lines, and the resulting cache maintenance will only
happen when (and if) the page really does get mapped into userspace.

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
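Editorial sketch, not part of the thread: the deferred scheme described in the message above (mark the page as holding dirty cache lines, and do the real cache maintenance only when the page is mapped into userspace) can be modelled in user space as below. All names (`mock_page`, `mock_flush_dcache_page`, `mock_map_to_user`, `mock_do_flush`) are hypothetical; this illustrates the idea and is not ARM's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; NOT the kernel's struct page. */
struct mock_page {
	bool dcache_dirty;	/* kernel alias may hold dirty cache lines */
	bool user_mapped;	/* page is currently mapped into userspace */
	unsigned int flushes;	/* expensive cache flushes actually done */
};

/* Stand-in for the real (expensive) cache maintenance. */
static void mock_do_flush(struct mock_page *page)
{
	page->flushes++;
	page->dcache_dirty = false;
}

/*
 * Called after the kernel has written to the page through its own
 * mapping. If no user mapping exists yet, only record that the page
 * is dirty and defer the flush.
 */
static void mock_flush_dcache_page(struct mock_page *page)
{
	assert(page != NULL);
	if (page->user_mapped)
		mock_do_flush(page);		/* user can see it: flush now */
	else
		page->dcache_dirty = true;	/* defer until it is mapped */
}

/* Called when the page is about to become visible to userspace. */
static void mock_map_to_user(struct mock_page *page)
{
	assert(page != NULL);
	if (page->dcache_dirty)
		mock_do_flush(page);		/* pay the deferred cost here */
	page->user_mapped = true;
}
```

In this model a read-ahead page that is filled and later evicted without ever being mapped into userspace never reaches `mock_do_flush()`, which is the saving the message argues for.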
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-04 22:23 ` Russell King
@ 2006-06-05 14:27 ` James Bottomley
2006-06-05 14:44 ` Russell King
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2006-06-05 14:27 UTC (permalink / raw)
To: Russell King
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Sun, 2006-06-04 at 23:23 +0100, Russell King wrote:
> I'll add to this statement that the cache flushing on ARM is only
> ever required when the page ends up in userspace - if we're reading
> a page into the page cache to throw it out via NFS or sendfile then
> the cache flush is a complete waste of time.

Right .. and this is the scenario. There are two cases where devices
kmap a user page into kernel space and then proceed to read from or
write to it (flush_dcache_page() is specifically for the latter because
the user won't see the data the kernel just wrote unless this happens
because kernel and user addresses aren't congruent on parisc).

The first case is manufactured data (such as command emulation) and the
second is pio data rather than DMA (such as command re-completion or
IDE).

> In this respect, I continue to believe that the way ARM (in principle)
> does flush_dcache_page() is what is required here - if the page has
> not been mapped into userspace, it merely marks the page as containing
> dirty cache lines, and the resulting cache maintenance will only
> happen when (and if) the page really does get mapped into userspace.

For this particular scenario, the page is almost always mapped initially
in user space because the user is requesting the I/O to a given
userspace address ... get_user_pages() ensures that it is allocated and
flushed before being passed to IDE or SCSI.

The problem on parisc, however, is not that userspace doesn't see the
page as dirty, it's that we've dirtied the page through the kernel
mappings, so userspace itself cannot possibly see the change until the
cache over the kernel address is flushed (the userspace and kernel space
addresses are not congruent in our cache scheme, so get separate cache
lines).

James
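Editorial sketch, not part of the thread: the "not congruent" point in the message above can be illustrated with a deliberately simplified model of a virtually indexed cache, in which two virtual mappings of the same physical page share cache lines only if the addresses agree in the bits used to index the cache. The function names and the `way_size` parameter are assumptions made for illustration; they do not describe parisc's or ARM's real cache geometry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: the cache is indexed by the low bits of the
 * virtual address, covering way_size bytes (assumed to be a power of
 * two). Two addresses are "congruent" when those index bits match.
 */
static bool vaddrs_congruent(uintptr_t va1, uintptr_t va2, uintptr_t way_size)
{
	assert(way_size != 0 && (way_size & (way_size - 1)) == 0);
	return ((va1 ^ va2) & (way_size - 1)) == 0;
}

/*
 * If the kernel and user mappings of one physical page are not
 * congruent, data written through the kernel address can sit in cache
 * lines that the user address never looks at, so the kernel alias
 * has to be flushed before userspace can see the data.
 */
static bool needs_kernel_alias_flush(uintptr_t kernel_va, uintptr_t user_va,
				     uintptr_t way_size)
{
	return !vaddrs_congruent(kernel_va, user_va, way_size);
}
```

For example, with an illustrative 16 KiB way size, a kernel address of 0xC0001000 and a user address of 0x40005000 index the same lines in this model, while a user address of 0x40002000 does not and would need the flush.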
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 14:27 ` James Bottomley
@ 2006-06-05 14:44 ` Russell King
2006-06-05 15:24 ` James Bottomley
0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2006-06-05 14:44 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Mon, Jun 05, 2006 at 09:27:36AM -0500, James Bottomley wrote:
> On Sun, 2006-06-04 at 23:23 +0100, Russell King wrote:
> > I'll add to this statement that the cache flushing on ARM is only
> > ever required when the page ends up in userspace - if we're reading
> > a page into the page cache to throw it out via NFS or sendfile then
> > the cache flush is a complete waste of time.
>
> Right .. and this is the scenario. There are two cases where devices
> kmap a user page into kernel space and then proceed to read from or
> write to it (flush_dcache_page() is specifically for the latter because
> the user won't see the data the kernel just wrote unless this happens
> because kernel and user addresses aren't congruent on parisc).
>
> The first case is manufactured data (such as command emulation) and the
> second is pio data rather than DMA (such as command re-completion or
> IDE).

When a user program wants to obtain data from a block device, there are
two ways it goes about it:

1. via read(). Read copies the data out of the kernel mapping of the
   page cache, so there are no coherency issues as far as PIO is
   concerned.

2. via a page which has been mmap()'d. In this case, we are performing
   a "PIO read from device write" operation to page to fill the page
   cache with data, which must complete _before_ we hand the page to
   userspace.

In neither case will the page be available to the user before the PIO
operation has been completed - if it was, there would be one very big
security hole since the previous data would be visible.

Hence I find your comment "There are two cases where devices kmap a
user page into kernel space and then proceed to read from or write to
it" to be misleading - at the exact point in time that the device
driver is manipulating the data in that page, it is not a user page.

> > In this respect, I continue to believe that the way ARM (in principle)
> > does flush_dcache_page() is what is required here - if the page has
> > not been mapped into userspace, it merely marks the page as containing
> > dirty cache lines, and the resulting cache maintenance will only
> > happen when (and if) the page really does get mapped into userspace.
>
> For this particular scenario, the page is almost always mapped initially
> in user space because the user is requesting the I/O to a given
> userspace address ...

Here we fundamentally disagree - and I'm afraid that it seems that you
didn't actually read what I wrote since there's an obvious disparity
between me saying "if the page has not been mapped into userspace" and
you saying "the page is almost always mapped initially in user space" -
we're _definitely_ talking about different things here.

How can we proceed with this if this kind of misinterpretation is
rampant?

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 14:44 ` Russell King
@ 2006-06-05 15:24 ` James Bottomley
2006-06-05 15:34 ` Russell King
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2006-06-05 15:24 UTC (permalink / raw)
To: Russell King
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Mon, 2006-06-05 at 15:44 +0100, Russell King wrote:
> Hence I find your comment "There are two cases where devices kmap a
> user page into kernel space and then proceed to read from or write to
> it" to be misleading - at the exact point in time that the device
> driver is manipulating the data in that page, it is not a user page.

zero copy doesn't quite follow that ownership model. We don't really do
anything to block user access to the page while I/O is underway (the
only time we actually do this is the nopage stuff). If the user wants
to do something stupid like write to a page they've asked us to read
data into, the resulting coherency cockup is their lookout, and which
data actually ends up in the page is undefined.

So, both the user and the kernel mappings exist on the page while it's
undergoing kmap and modification.

However, regardless of whether it's mapped into user space or not, even
if it were later going to be mapped at a non-congruent user address,
the kernel mappings have to be flushed to make the data written via
them visible to the user (and, for a VIPT cache, they have to be
flushed before the mapping is torn down otherwise we might not have the
PTE to flush them via ...)

James
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 15:24 ` James Bottomley
@ 2006-06-05 15:34 ` Russell King
2006-06-05 15:47 ` James Bottomley
0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2006-06-05 15:34 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Mon, Jun 05, 2006 at 10:24:44AM -0500, James Bottomley wrote:
> On Mon, 2006-06-05 at 15:44 +0100, Russell King wrote:
> > Hence I find your comment "There are two cases where devices kmap a
> > user page into kernel space and then proceed to read from or write to
> > it" to be misleading - at the exact point in time that the device
> > driver is manipulating the data in that page, it is not a user page.
>
> zero copy doesn't quite follow that ownership model.

What has zero copy (your reply) got to do with faulting pages into
userspace (my message)? I'm sorry, I don't understand why you've
brought this up.

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 15:34 ` Russell King
@ 2006-06-05 15:47 ` James Bottomley
2006-06-05 15:48 ` Russell King
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2006-06-05 15:47 UTC (permalink / raw)
To: Russell King
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, jgarzik, mattjreimer,
Guennadi Liakhovetski, lkml, linux-ide, linux-scsi

On Mon, 2006-06-05 at 16:34 +0100, Russell King wrote:
> What has zero copy (your reply) got to do with faulting pages into
> userspace (my message). I'm sorry, I don't understand why you've
> brought this up.

The zero copy case is the case where we end up with user and kernel
mappings simultaneously on the page. The nopage (or fault) case is
where we end up with them sequentially. Both cases actually require the
same cache treatment, but it's easiest to understand in the zero copy
case.

James
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 15:47 ` James Bottomley
@ 2006-06-05 15:48 ` Russell King
2006-06-05 16:16 ` James Bottomley
0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2006-06-05 15:48 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, jgarzik, mattjreimer,
Guennadi Liakhovetski, lkml, linux-ide, linux-scsi

On Mon, Jun 05, 2006 at 10:47:40AM -0500, James Bottomley wrote:
> On Mon, 2006-06-05 at 16:34 +0100, Russell King wrote:
> > What has zero copy (your reply) got to do with faulting pages into
> > userspace (my message). I'm sorry, I don't understand why you've
> > brought this up.
>
> The zero copy case is the case where we end up with user and kernel
> mappings simultaneously on the page. The nopage (or fault) case is
> where we end up with them sequentially. Both cases actually require the
> same cache treatment, but it's easiest to understand in the zero copy
> case.

When does the zero copy case occur?

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 15:48 ` Russell King
@ 2006-06-05 16:16 ` James Bottomley
2006-06-05 16:37 ` Russell King
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2006-06-05 16:16 UTC (permalink / raw)
To: Russell King
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, jgarzik, mattjreimer,
Guennadi Liakhovetski, lkml, linux-ide, linux-scsi

On Mon, 2006-06-05 at 16:48 +0100, Russell King wrote:
> When does the zero copy case occur?

Almost all user initiated I/O. Glibc tends to implement this as mmap
internally.

James
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 16:16 ` James Bottomley
@ 2006-06-05 16:37 ` Russell King
0 siblings, 0 replies; 27+ messages in thread
From: Russell King @ 2006-06-05 16:37 UTC (permalink / raw)
To: James Bottomley
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, jgarzik, mattjreimer,
Guennadi Liakhovetski, lkml, linux-ide, linux-scsi

On Mon, Jun 05, 2006 at 11:16:42AM -0500, James Bottomley wrote:
> On Mon, 2006-06-05 at 16:48 +0100, Russell King wrote:
> > When does the zero copy case occur?
>
> Almost all user initiated I/O. Glibc tends to implement this as mmap
> internally.

Ah, so you _are_ talking about something different from what I'm
talking about. I'm concentrating _solely_ on the _read_ side, and all
my responses so far have been concerned about that only.

Anyway, I think this discussion has become meaningless since we've been
talking at cross purposes for the entire thread.

-- 
Russell King
 Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of: 2.6 Serial core
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-04 20:44 ` [PATCHSET] block: fix PIO cache coherency bug, take 2 Russell King
2006-06-04 22:23 ` Russell King
@ 2006-06-05 13:43 ` James Bottomley
2006-06-06 11:00 ` Miklos Szeredi
1 sibling, 1 reply; 27+ messages in thread
From: James Bottomley @ 2006-06-05 13:43 UTC (permalink / raw)
To: Russell King
Cc: Tejun Heo, Jens Axboe, Dave Miller, bzolnier, james.steward,
jgarzik, mattjreimer, Guennadi Liakhovetski, lkml, linux-ide,
linux-scsi

On Sun, 2006-06-04 at 21:44 +0100, Russell King wrote:
> > I tried to implement flush_anon_page() too but didn't know what to do
> > with anon_vma object.
>
> I'm not sure what this is about...

This was for fuse on parisc. It should have no bearing on the current
IDE problem.

What it's trying to solve is the fact that flush_dcache_page() doesn't
necessarily flush anonymous pages (because of the way the mappings list
works). However, in order to make an anonymous page in user space
visible via the kernel address, we have to have it flushed (this is
what fuse does to transfer data into pages). So this API was introduced
into the right places to permit that to happen.

Most VIPT architectures are CAM based, so flush_dcache_page() actually
sweeps up all the anon pages as well. However, if the implementation
(like parisc's) has to loop over page_mapping(page) then it will likely
need to implement flush_anon_page() for fuse to work.

James
* Re: [PATCHSET] block: fix PIO cache coherency bug, take 2
2006-06-05 13:43 ` James Bottomley
@ 2006-06-06 11:00 ` Miklos Szeredi
0 siblings, 0 replies; 27+ messages in thread
From: Miklos Szeredi @ 2006-06-06 11:00 UTC (permalink / raw)
To: James.Bottomley
Cc: rmk+lkml, htejun, axboe, davem, bzolnier, james.steward, jgarzik,
mattjreimer, g.liakhovetski, linux-kernel, linux-ide, linux-scsi

> > > I tried to implement flush_anon_page() too but didn't know what to do
> > > with anon_vma object.
> >
> > I'm not sure what this is about...
>
> This was for fuse on parisc.

I have reports that this affects some ARM architectures as well.

And direct I/O over NFS and a couple other things which also use the
get_user_pages() mechanism will also not work properly without
flush_anon_page().

Miklos