* [PATCH] flush_cache_page while pte valid
@ 2002-11-11 18:25 Hugh Dickins
2002-11-11 18:35 ` Andrew Morton
2002-11-11 23:19 ` David S. Miller
0 siblings, 2 replies; 29+ messages in thread
From: Hugh Dickins @ 2002-11-11 18:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave McCracken, Rik van Riel, David S. Miller, linux-kernel
On some architectures (cachetlb.txt gives HyperSparc as an example)
it is essential to flush_cache_page while pte is still valid: the
rmap VM diverged from the base 2.4 VM before that fix was made,
so this error has crept back into 2.5.
Patch below applies to 2.5.47 or 2.5.47-mm1 - needs more work over
shared pagetables, but they've silently fallen out of 2.5.47-mm1:
oversight? I'll send Alan the equivalent for 2.4-ac shortly.
(I wonder, what happens if userspace now modifies the page
after the flush_cache_page, before the pte is invalidated?)
--- 2.5.47/mm/rmap.c Mon Oct 7 20:37:50 2002
+++ linux/mm/rmap.c Mon Nov 11 17:01:27 2002
@@ -393,9 +393,9 @@
}
/* Nuke the page table entry. */
+ flush_cache_page(vma, address);
pte = ptep_get_and_clear(ptep);
flush_tlb_page(vma, address);
- flush_cache_page(vma, address);
/* Store the swap location in the pte. See handle_pte_fault() ... */
if (PageSwapCache(page)) {
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH] flush_cache_page while pte valid 2002-11-11 18:25 [PATCH] flush_cache_page while pte valid Hugh Dickins @ 2002-11-11 18:35 ` Andrew Morton 2002-11-11 23:19 ` David S. Miller 1 sibling, 0 replies; 29+ messages in thread From: Andrew Morton @ 2002-11-11 18:35 UTC (permalink / raw) To: Hugh Dickins; +Cc: Dave McCracken, Rik van Riel, David S. Miller, linux-kernel Hugh Dickins wrote: > > On some architectures (cachetlb.txt gives HyperSparc as an example) > it is essential to flush_cache_page while pte is still valid: the > rmap VM diverged from the base 2.4 VM before that fix was made, > so this error has crept back into 2.5. OK, thanks. > Patch below applies to 2.5.47 or 2.5.47-mm1 - needs more work over > shared pagetables, but they've silently fallen out of 2.5.47-mm1: > oversight? Yes, oversight. dcache-rcu was similarly lost. Pathetic, isn't it? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-11 18:25 [PATCH] flush_cache_page while pte valid Hugh Dickins 2002-11-11 18:35 ` Andrew Morton @ 2002-11-11 23:19 ` David S. Miller 2002-11-11 23:34 ` [PATCH] [RFC] increase MAX_ADDR_LEN Roland Dreier 2002-11-12 6:53 ` [PATCH] flush_cache_page while pte valid Hugh Dickins 1 sibling, 2 replies; 29+ messages in thread From: David S. Miller @ 2002-11-11 23:19 UTC (permalink / raw) To: hugh; +Cc: akpm, dmccr, riel, linux-kernel From: Hugh Dickins <hugh@veritas.com> Date: Mon, 11 Nov 2002 18:25:25 +0000 (GMT) On some architectures (cachetlb.txt gives HyperSparc as an example) it is essential to flush_cache_page while pte is still valid: the rmap VM diverged from the base 2.4 VM before that fix was made, so this error has crept back into 2.5. ... (I wonder, what happens if userspace now modifies the page after the flush_cache_page, before the pte is invalidated?) Thanks for catching this. On architectures that are affected (such as the mentioned HyperSPARC chips), the cpu will take a trap and OOPS the kernel if the PTE is invalidated before the cache flush is made. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-11 23:19 ` David S. Miller @ 2002-11-11 23:34 ` Roland Dreier 2002-11-11 23:38 ` David S. Miller 2002-11-12 6:53 ` [PATCH] flush_cache_page while pte valid Hugh Dickins 1 sibling, 1 reply; 29+ messages in thread From: Roland Dreier @ 2002-11-11 23:34 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel Hi Dave, I posted this to lkml last week and didn't get any response, positive or negative, so I'm sending this directly to you. Would you include these patches (or something that solves the same problem) in 2.5? This will make it possible to add IP-over-InfiniBand when the driver is ready without changes to the core kernel. Since the driver will not be ready until 2.6 time, I would very much like to get all the core changes required into the kernel now, before it's too late. I'm very open to a suggestion of a better way to handle the fact that IPoIB has a 20 byte hardware address, and I'd be happy to implement a patch for an alternative approach if that's what it will take to get support into the kernel. Thanks, Roland <roland@topspin.com> Here's my previous email: Included below are some changes that I would like to see included (in some form) in the standard kernel. They will make future support for InfiniBand much simpler; while the code at <http://infiniband.sf.net> is nowhere near ready for widespread use (let alone inclusion in the kernel), if we can get these small changes in the kernel then it will be much simpler to distribute and use InfiniBand drivers in the future. Note that I am definitely not asking that these patches be included as-is in the kernel. I would just like to open discussion by raising an issue and proposing one possible solution. These patches (against Linus's BK tree of a few days ago): 1. Add ARPHRD_INFINIBAND to if_arp.h. This is extremely minor, but does make an IP-over-InfiniBand driver cleaner. My feeling is we might as well do this. 2. Increase MAX_ADDR_LEN to 32 (from 8) and make some small changes to prevent this from causing problems for existing code. 3. Get rid of suspicious-looking uses of MAX_ADDR_LEN in the sungem and s390/net/lcs drivers. I have no idea whether these changes are necessary but I wanted to call this code to the maintainer's attention. I expect #2 to be somewhat controversial, so let me give my motivation for proposing this. The IETF draft for IP-over-InfiniBand (which seems very unlikely to change at this point) defines the IPoIB hardware address to be 20 bytes long. Of course, not everyone feels this was the best way to define the encapsulation, but the issue was extensively debated in the IETF ipoib working group, and the 20 byte hardware address seems to be the least bad option. (By the way, I have no particular attachment to the exact number 32. As long as we raise MAX_ADDR_LEN to at least 20, I'll be happy. It does seem desirable for MAX_ADDR_LEN to be a multiple of 8. With this in mind, 24 would be perfectly acceptable for IPoIB; I just chose 32 becase I had to choose something!) There are two ways to implement an IPoIB network driver: 1. The driver can tell the kernel that the hardware address length is less than it really is, and rewrite packets as they pass into and out of the driver. I've actually implemented this, and while it works, it is ugly, complex, and implies that special tools are required in userspace (for example, creating an ARP entry requires a special mechanism for passing the hardware address directly to the driver, and then creating a corresponding entry in the kernel's ARP table). 2. The driver can give the real 20-byte hardware address length to the kernel. This seems much cleaner, but of course it requires the value of MAX_ADDR_LEN to be increased. Since the second option seems so much better, I would like to get the needed changes into the kernel before the 2.6 release. If this change is not in the standard kernel, then vendor kernels will likely not pick it up. This will substantially complicate the task of developing an IPoIB driver, since anyone who wants to test or use the driver will have to patch their kernel. If these changes go into the standard kernel, then an IPoIB driver can simply be distributed as a module that is compiled and loaded into any 2.6 kernel. Please try these patches and let me know if they cause any problems for you. I am currently running these patches and a small dummy net driver that registers a device with type ARPHRD_INFINIBAND and hardware address length 20 on my workstation without trouble. ifconfig does produce bogus output for the dummy network device but that is another battle. (I would be interested in ideas for how to resolve the fact that the sa_data member of struct sockaddr is only 14 bytes long) Of course I am also eager to find out other developers' thoughts on how to implement IPoIB on Linux. Thanks, Roland <roland@topspin.com> =================================================================== You can import this changeset into BK by piping this whole message to: '| bk receive [path to repository]' or apply the patch as usual. =================================================================== ChangeSet@1.874, 2002-11-07 11:37:37-08:00, roland@topspin.com Increase MAX_ADDR_LEN from 8 to 32 (to support IP-over-InfiniBand hardware addresses) ChangeSet@1.873, 2002-11-07 11:35:31-08:00, roland@topspin.com Add IANA-assigned ARPHRD_INFINIBAND value for IP-over-InfiniBand to if_arp.h drivers/net/sungem.c | 2 +- drivers/s390/net/lcs.c | 2 +- include/linux/if_arp.h | 1 + include/linux/netdevice.h | 2 +- net/core/dev.c | 4 ++-- 5 files changed, 6 insertions(+), 5 deletions(-) diff -Nru a/drivers/net/sungem.c b/drivers/net/sungem.c --- a/drivers/net/sungem.c Thu Nov 7 12:47:32 2002 +++ b/drivers/net/sungem.c Thu Nov 7 12:47:32 2002 @@ -2858,7 +2858,7 @@ printk(KERN_ERR "%s: can't get mac-address\n", dev->name); return -1; } - memcpy(dev->dev_addr, addr, MAX_ADDR_LEN); + memcpy(dev->dev_addr, addr, 6); #else get_gem_mac_nonobp(gp->pdev, gp->dev->dev_addr); #endif diff -Nru a/drivers/s390/net/lcs.c b/drivers/s390/net/lcs.c --- a/drivers/s390/net/lcs.c Thu Nov 7 12:47:32 2002 +++ b/drivers/s390/net/lcs.c Thu Nov 7 12:47:32 2002 @@ -3569,7 +3569,7 @@ struct ifmcaddr6 *im6; struct inet6_dev *in6_dev; #endif - char buf[MAX_ADDR_LEN]; + char buf[LCS_ADDR_LEN]; lcs_ipm_list *curr_lmem, *new_lmem; lcs_state oldstate; diff -Nru a/include/linux/if_arp.h b/include/linux/if_arp.h --- a/include/linux/if_arp.h Thu Nov 7 12:47:32 2002 +++ b/include/linux/if_arp.h Thu Nov 7 12:47:32 2002 @@ -40,6 +40,7 @@ #define ARPHRD_METRICOM 23 /* Metricom STRIP (new IANA id) */ #define ARPHRD_IEEE1394 24 /* IEEE 1394 IPv4 - RFC 2734 */ #define ARPHRD_EUI64 27 /* EUI-64 */ +#define ARPHRD_INFINIBAND 32 /* InfiniBand */ /* Dummy types for non ARP hardware */ #define ARPHRD_SLIP 256 diff -Nru a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h Thu Nov 7 12:47:32 2002 +++ b/include/linux/netdevice.h Thu Nov 7 12:47:32 2002 @@ -65,7 +65,7 @@ #endif -#define MAX_ADDR_LEN 8 /* Largest hardware address length */ +#define MAX_ADDR_LEN 32 /* Largest hardware address length */ /* * Compute the worst case header length according to the protocols diff -Nru a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c Thu Nov 7 12:47:32 2002 +++ b/net/core/dev.c Thu Nov 7 12:47:32 2002 @@ -2062,7 +2062,7 @@ case SIOCGIFHWADDR: memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, dev->addr_len)); ifr->ifr_hwaddr.sa_family = dev->type; return 0; @@ -2083,7 +2083,7 @@ if (ifr->ifr_hwaddr.sa_family != dev->type) return -EINVAL; memcpy(dev->broadcast, ifr->ifr_hwaddr.sa_data, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, dev->addr_len)); notifier_call_chain(&netdev_chain, NETDEV_CHANGEADDR, dev); return 0; ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-11 23:34 ` [PATCH] [RFC] increase MAX_ADDR_LEN Roland Dreier @ 2002-11-11 23:38 ` David S. Miller 2002-11-11 23:58 ` Roland Dreier 2002-11-12 0:18 ` Alan Cox 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2002-11-11 23:38 UTC (permalink / raw) To: roland; +Cc: linux-kernel So how are apps able to specify such larger hw addresses to configure a driver if IFHWADDRLEN is still 6? I'm not going to increase MAX_ADDR_LEN if there is no user ABI capable of configuring such larger addresses properly. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-11 23:38 ` David S. Miller @ 2002-11-11 23:58 ` Roland Dreier 2002-11-12 0:18 ` Alan Cox 1 sibling, 0 replies; 29+ messages in thread From: Roland Dreier @ 2002-11-11 23:58 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel >>>>> "David" == David S Miller <davem@redhat.com> writes: David> So how are apps able to specify such larger hw addresses to David> configure a driver if IFHWADDRLEN is still 6? In the InfiniBand case, the device's hardware address comes from a combination of the port GID (which is set by the InfiniBand subnet manager through an IB-specific mechanism) and the queue pair number that the driver gets when it initializes. There definitely still are problems to solve, such as specifying static ARP entries. David> I'm not going to increase MAX_ADDR_LEN if there is no user David> ABI capable of configuring such larger addresses properly. What would you consider a palatable ABI? (I'm happy to implement it) Enlarging sa_data in struct sockaddr doesn't seem feasible. I guess we could add a new socket ioctl() or extend SIOCGIFHWADDR/SIOCSIFHWADDR somehow.... Thanks, Roland <roland@topspin.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-11 23:38 ` David S. Miller 2002-11-11 23:58 ` Roland Dreier @ 2002-11-12 0:18 ` Alan Cox 2002-11-12 0:01 ` Roland Dreier 1 sibling, 1 reply; 29+ messages in thread From: Alan Cox @ 2002-11-12 0:18 UTC (permalink / raw) To: David S. Miller; +Cc: roland, Linux Kernel Mailing List On Mon, 2002-11-11 at 23:38, David S. Miller wrote: > > So how are apps able to specify such larger hw addresses to configure > a driver if IFHWADDRLEN is still 6? > > I'm not going to increase MAX_ADDR_LEN if there is no user ABI capable > of configuring such larger addresses properly. The kernel just ignores it. We support multiple devices with larger address lengths. Its mostly a legacy constant ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 0:18 ` Alan Cox @ 2002-11-12 0:01 ` Roland Dreier 2002-11-12 14:23 ` Alan Cox 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2002-11-12 0:01 UTC (permalink / raw) To: Alan Cox; +Cc: David S. Miller, Linux Kernel Mailing List >>>>> "Alan" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes: >>>>> On Mon, 2002-11-11 at 23:38, David S. Miller wrote: Dave> So how are apps able to specify such larger hw addresses to Dave> configure a driver if IFHWADDRLEN is still 6? Dave> I'm not going to increase MAX_ADDR_LEN if there is no user Dave> ABI capable of configuring such larger addresses properly. Alan> The kernel just ignores it. We support multiple devices with Alan> larger address lengths. Its mostly a legacy constant What drivers in the kernel are there with address length more than MAX_ADDR_LEN? What do they put dev->addr_len and dev->dev_addr? Thanks, Roland <roland@topspin.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 0:01 ` Roland Dreier @ 2002-11-12 14:23 ` Alan Cox 2002-11-12 14:20 ` Folkert van Heusden 2002-11-12 15:14 ` Roland Dreier 0 siblings, 2 replies; 29+ messages in thread From: Alan Cox @ 2002-11-12 14:23 UTC (permalink / raw) To: Roland Dreier; +Cc: David S. Miller, Linux Kernel Mailing List On Tue, 2002-11-12 at 00:01, Roland Dreier wrote: > What drivers in the kernel are there with address length more than > MAX_ADDR_LEN? What do they put dev->addr_len and dev->dev_addr? AX.25 addresses are 7 bytes long at the physical layer, but because they including routing info up to about 60 bytes long elsewhere. Fortunately lengths are passed to all but the hw level SIOCSIF/SIOCGIF calls, and even those can be increased a little without harm as they use the struct sockaddr - in fact I think 14 bytes would be the limit. Seems trivial to do in 2.5 and quite doable in 2.4 if you actually have to worry about this at the SIOCSIFADDR/GIFHWADDR level. Alan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 14:23 ` Alan Cox @ 2002-11-12 14:20 ` Folkert van Heusden 2002-11-12 15:14 ` Roland Dreier 1 sibling, 0 replies; 29+ messages in thread From: Folkert van Heusden @ 2002-11-12 14:20 UTC (permalink / raw) To: alan; +Cc: roland, davem, linux-kernel Hi, Am I right that in net/ipv4/tcp_ipv4.c in function "tcp_v4_get_port" the portnumber for a new connection is generated? Because fiddling with that code seems to have no effect on the portnumbers generated for new connections. Folkert ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 14:23 ` Alan Cox 2002-11-12 14:20 ` Folkert van Heusden @ 2002-11-12 15:14 ` Roland Dreier 2002-11-12 16:00 ` Alan Cox 1 sibling, 1 reply; 29+ messages in thread From: Roland Dreier @ 2002-11-12 15:14 UTC (permalink / raw) To: Alan Cox; +Cc: David S. Miller, Linux Kernel Mailing List >>>>> "Alan" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes: Alan> AX.25 addresses are 7 bytes long at the physical layer, but Alan> because they including routing info up to about 60 bytes Alan> long elsewhere. Fortunately lengths are passed to all but Alan> the hw level SIOCSIF/SIOCGIF calls, and even those can be Alan> increased a little without harm as they use the struct Alan> sockaddr - in fact I think 14 bytes would be the limit. Alan> Seems trivial to do in 2.5 and quite doable in 2.4 if you Alan> actually have to worry about this at the Alan> SIOCSIFADDR/GIFHWADDR level. Hmm... the problem I would like to solve is that IP-over-InfiniBand has 20 byte hardware addresses. One can implement a driver that lies about its HW address len (you have an internal ARP cache and only expose a hash value/cookie to the rest of the world). In fact I've done just that for IPoIB. It works but it's ugly and overly complex. I would like to find a clean solution for 2.5/2.6, but it looks like it will require changes to the net core. I'd like to do those now so the IPoIB driver can just drop in cleanly later. (I want to be able to just add drivers/infiniband during 2.6 without and invasive changes required) To do that seems to require increasing MAX_ADDR_LEN -- otherwise I don't see what the driver can put in addr_len/dev_addr. Thanks, Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 15:14 ` Roland Dreier @ 2002-11-12 16:00 ` Alan Cox 2002-11-12 15:50 ` YOSHIFUJI Hideaki / 吉藤英明 2002-11-12 20:36 ` Roland Dreier 0 siblings, 2 replies; 29+ messages in thread From: Alan Cox @ 2002-11-12 16:00 UTC (permalink / raw) To: Roland Dreier; +Cc: David S. Miller, Linux Kernel Mailing List On Tue, 2002-11-12 at 15:14, Roland Dreier wrote: > Hmm... the problem I would like to solve is that IP-over-InfiniBand > has 20 byte hardware addresses. One can implement a driver that lies > about its HW address len (you have an internal ARP cache and only > expose a hash value/cookie to the rest of the world). In fact I've > done just that for IPoIB. Our ARP code handles AX.25 fine, and that can include long addresses, so either we have a live bug or it works 8). The multicast layer does have problems with addresses over 8 bytes (MAX_ADDR_LEN). > It works but it's ugly and overly complex. I would like to find a > clean solution for 2.5/2.6, but it looks like it will require changes > to the net core. I'd like to do those now so the IPoIB driver can > just drop in cleanly later. (I want to be able to just add > drivers/infiniband during 2.6 without and invasive changes required) I think you need to do two things. 1. Increase MAX_ADDR_LEN 2. Add some new address setting ioctls, and ensure the old ones keep the old address length limit. That is needed because the old caller wont have allocated enough address space for a 20 byte address return. You have to solve both though, and the first patch should probably be the one to add more sensible address set/get functions. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 16:00 ` Alan Cox @ 2002-11-12 15:50 ` YOSHIFUJI Hideaki / 吉藤英明 2002-11-12 20:36 ` Roland Dreier 1 sibling, 0 replies; 29+ messages in thread From: YOSHIFUJI Hideaki / 吉藤英明 @ 2002-11-12 15:50 UTC (permalink / raw) To: alan; +Cc: roland, davem, linux-kernel In article <1037116836.8500.55.camel@irongate.swansea.linux.org.uk> (at 12 Nov 2002 16:00:36 +0000), Alan Cox <alan@lxorguk.ukuu.org.uk> says: > 2. Add some new address setting ioctls, and ensure the old ones keep the > old address length limit. That is needed because the old caller wont > have allocated enough address space for a 20 byte address return. > > You have to solve both though, and the first patch should probably be > the one to add more sensible address set/get functions. *BSDs have SIOCGLIFPHYADDR etc., but, they're obsolete; we should use rtnetlink (or routing socket in BSDs) to manage addresses. So, not having such ioctls for long addresses would be ok. -- Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org> GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 16:00 ` Alan Cox 2002-11-12 15:50 ` YOSHIFUJI Hideaki / 吉藤英明 @ 2002-11-12 20:36 ` Roland Dreier 2002-11-12 22:31 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: Roland Dreier @ 2002-11-12 20:36 UTC (permalink / raw) To: Alan Cox; +Cc: David S. Miller, Linux Kernel Mailing List >>>>> "Alan" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes: Alan> 1. Increase MAX_ADDR_LEN 2. Add some new address setting Alan> ioctls, and ensure the old ones keep the old address length Alan> limit. That is needed because the old caller wont have Alan> allocated enough address space for a 20 byte address return. Thanks to YOSHIFUJI Hideaki / 吉藤英明, I had a look at rtnetlink. It seems like we would get the necessary address setting functionality if I implemented the following: 1. Add an RTM_SETLINK message type that handles at least the IFLA_ADDRESS attribute. This would replace SIOCSIFHWADDR for interfaces with long hardware addresses. 2. Add code to handle receiving RTM_NEWNEIGH and RTM_DELNEIGH messages from user space. This would replace SIOCSARP and SIOCDARP for interfaces with long hardware addresses. Dave, Alan, if I wrote a patch to do this would you accept it? (And following that increase MAX_ADDR_LEN?) (By the way the original patch I posted added code to the SIOCSIFHWADDR/SIOCGIFHWADDR handler to prevent a long hardware address from overrunning the ifr_data member that user space passed in) Thanks, Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 20:36 ` Roland Dreier @ 2002-11-12 22:31 ` David S. Miller 2002-11-12 22:39 ` Roland Dreier 2002-12-04 18:31 ` rtnetlink replacement for SIOCSIFHWADDR Roland Dreier 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2002-11-12 22:31 UTC (permalink / raw) To: roland; +Cc: alan, linux-kernel From: Roland Dreier <roland@topspin.com> Date: 12 Nov 2002 12:36:10 -0800 Dave, Alan, if I wrote a patch to do this would you accept it? (And following that increase MAX_ADDR_LEN?) I would have to see it first, but likely yes. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] [RFC] increase MAX_ADDR_LEN 2002-11-12 22:31 ` David S. Miller @ 2002-11-12 22:39 ` Roland Dreier 2002-12-04 18:31 ` rtnetlink replacement for SIOCSIFHWADDR Roland Dreier 1 sibling, 0 replies; 29+ messages in thread From: Roland Dreier @ 2002-11-12 22:39 UTC (permalink / raw) To: David S. Miller; +Cc: alan, linux-kernel >>>>> "David" == David S Miller <davem@redhat.com> writes: Roland> if I wrote a patch to do this would you accept it? (And Roland> following that increase MAX_ADDR_LEN?) David> I would have to see it first, but likely yes. Of course, that's what I would expect. (And I expect to go through a few revisions before it's good enough) I just don't want to start on something that you think is categorically a stupid idea. I will code up the rtnetlink extensions I proposed and post them when they are ready. Thanks, Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* rtnetlink replacement for SIOCSIFHWADDR @ 2002-12-04 18:31 ` Roland Dreier 2002-12-04 20:11 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2002-12-04 18:31 UTC (permalink / raw) To: ak, kuznet; +Cc: linux-kernel, davem Hi, since you are the architects of the rtnetlink facility, I am writing to you to ask your opinion on how to add some functionality. I'm exploring what changes are needed in the kernel to support devices with large addr_len. My motivation is to get the infrastructure for IP-over-InfiniBand, which has addr_len 20, into the kernel, and I am trying to produce a patch that can be accepted into mainline 2.5. With a few a minor fixes to avoid overrunning array bounds, almost everything just works. The RTM_NEWNEIGH and RTM_DELNEIGH messages seem to provide what is needed to manage ARP entries with large L2 addresses (replacing the SIOCSARP and SIOCDARP ioctls). The one major piece that is missing is a replacement for SIOCSIFHWADDR (which can only set L2 addresses up to 14 bytes long, because of the size of sa_data in struct sockaddr). I can see two ways one might set an interface's L2 address through rtnetlink: We could extend the RTM_NEWLINK message (possibly using the change mask) to include changing just the L2 address, and add support in the kernel for receiving these messages from userspace. Or, we could add a new RTM_SETLINK message that userspace can send to the kernel to modify an interface's properties. Of course I am open to other suggestions for how to replace SIOCSIFHWADDR. I would very much appreciate your thoughts on how to proceed. Thanks, Roland <roland@topspin.com> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: rtnetlink replacement for SIOCSIFHWADDR 2002-12-04 18:31 ` rtnetlink replacement for SIOCSIFHWADDR Roland Dreier @ 2002-12-04 20:11 ` Andi Kleen 2002-12-31 23:11 ` [PATCH] increase MAX_ADDR_LEN Roland Dreier 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2002-12-04 20:11 UTC (permalink / raw) To: Roland Dreier; +Cc: ak, kuznet, linux-kernel, davem > Or, we could add a new RTM_SETLINK message that userspace can send > to the kernel to modify an interface's properties. Defining a RTM_SETLINK would seem to be cleanest to me. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] increase MAX_ADDR_LEN 2002-12-04 20:11 ` Andi Kleen @ 2002-12-31 23:11 ` Roland Dreier 2003-01-06 15:52 ` [PATCH] increase MAX_ADDR_LEN (resend) Roland Dreier 2003-01-07 8:58 ` [PATCH] increase MAX_ADDR_LEN David S. Miller 0 siblings, 2 replies; 29+ messages in thread From: Roland Dreier @ 2002-12-31 23:11 UTC (permalink / raw) To: davem; +Cc: ak, linux-kernel The patch included below is intended to increase MAX_ADDR_LEN to 32, which will make adding IP-over-InfiniBand support much simpler. I included the rtnetlink-based support for big hardware adresses that we discussed earlier. Specifically, the patch: 1. Adds ARPHRD_INFINIBAND to if_arp.h. This is extremely minor, but will make an IP-over-InfiniBand driver cleaner. 2. Increases MAX_ADDR_LEN to 32 (from 8). 3. Makes some small changes to net/core/dev.c to prevent the increased MAX_ADDR_LEN from overflowing when the SIOCGIFHWADDR and SIOCSIFHWADDR ioctls are used on interfaces with addr_len > 14 (the size of sa_data in struct sockaddr). 4. Adds RTM_SETLINK to <linux/rtnetlink.h> and adds a function do_setlink() to net/core/rtnetlink.c to handle this message from userspace. This lets userspace set the hardware address and broadcast address for interfaces with addr_len > 14. 5. Cleans up the initializer for link_rtnetlink_table[] so that it is much easier to read and change. There is some suspicious-looking use of MAX_ADDR_LEN in drivers/net/sungem.c and drivers/s390/net/lcs.c but I did not touch those files in this patch. This patch was created and tested with UML on kernel 2.5.47, but applies cleanly to Linus's current 2.5.53-BK (with a few offsets). Please apply this patch or let me know what needs to change for it to be applied. Thanks, Roland <roland@topspin.com> diff -Naur linux-2.5.47/include/linux/if_arp.h linux-2.5.47-max-addr/include/linux/if_arp.h --- linux-2.5.47/include/linux/if_arp.h Sun Nov 10 19:28:29 2002 +++ linux-2.5.47-max-addr/include/linux/if_arp.h Mon Dec 30 13:50:07 2002 @@ -40,6 +40,7 @@ #define ARPHRD_METRICOM 23 /* Metricom STRIP (new IANA id) */ #define ARPHRD_IEEE1394 24 /* IEEE 1394 IPv4 - RFC 2734 */ #define ARPHRD_EUI64 27 /* EUI-64 */ +#define ARPHRD_INFINIBAND 32 /* InfiniBand */ /* Dummy types for non ARP hardware */ #define ARPHRD_SLIP 256 diff -Naur linux-2.5.47/include/linux/netdevice.h linux-2.5.47-max-addr/include/linux/netdevice.h --- linux-2.5.47/include/linux/netdevice.h Sun Nov 10 19:28:31 2002 +++ linux-2.5.47-max-addr/include/linux/netdevice.h Mon Dec 30 12:05:42 2002 @@ -65,7 +65,7 @@ #endif -#define MAX_ADDR_LEN 8 /* Largest hardware address length */ +#define MAX_ADDR_LEN 32 /* Largest hardware address length */ /* * Compute the worst case header length according to the protocols diff -Naur linux-2.5.47/include/linux/rtnetlink.h linux-2.5.47-max-addr/include/linux/rtnetlink.h --- linux-2.5.47/include/linux/rtnetlink.h Sun Nov 10 19:28:29 2002 +++ linux-2.5.47-max-addr/include/linux/rtnetlink.h Tue Dec 31 14:38:13 2002 @@ -17,6 +17,7 @@ #define RTM_NEWLINK (RTM_BASE+0) #define RTM_DELLINK (RTM_BASE+1) #define RTM_GETLINK (RTM_BASE+2) +#define RTM_SETLINK (RTM_BASE+3) #define RTM_NEWADDR (RTM_BASE+4) #define RTM_DELADDR (RTM_BASE+5) diff -Naur linux-2.5.47/net/core/dev.c linux-2.5.47-max-addr/net/core/dev.c --- linux-2.5.47/net/core/dev.c Sun Nov 10 19:28:16 2002 +++ linux-2.5.47-max-addr/net/core/dev.c Mon Dec 30 12:06:06 2002 @@ -2062,7 +2062,7 @@ case SIOCGIFHWADDR: memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, (size_t) dev->addr_len)); ifr->ifr_hwaddr.sa_family = dev->type; return 0; @@ -2083,7 +2083,7 @@ if (ifr->ifr_hwaddr.sa_family != dev->type) return -EINVAL; memcpy(dev->broadcast, ifr->ifr_hwaddr.sa_data, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, (size_t) dev->addr_len)); notifier_call_chain(&netdev_chain, NETDEV_CHANGEADDR, dev); return 0; diff -Naur linux-2.5.47/net/core/rtnetlink.c linux-2.5.47-max-addr/net/core/rtnetlink.c --- linux-2.5.47/net/core/rtnetlink.c Sun Nov 10 19:28:33 2002 +++ linux-2.5.47-max-addr/net/core/rtnetlink.c Tue Dec 31 14:38:50 2002 @@ -220,6 +220,40 @@ return skb->len; } +static int do_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) { + struct ifinfomsg *ifm = NLMSG_DATA(nlh); + struct rtattr **ida = arg; + struct net_device *dev; + int err; + + dev = dev_get_by_index(ifm->ifi_index); + if (!dev) { + return -ENODEV; + } + + err = -EINVAL; + + if (ida[IFLA_ADDRESS - 1]) { + if (ida[IFLA_ADDRESS - 1]->rta_len != RTA_LENGTH(dev->addr_len)) { + goto out; + } + memcpy(dev->dev_addr, RTA_DATA(ida[IFLA_ADDRESS - 1]), dev->addr_len); + } + + if (ida[IFLA_BROADCAST - 1]) { + if (ida[IFLA_BROADCAST - 1]->rta_len != RTA_LENGTH(dev->addr_len)) { + goto out; + } + memcpy(dev->broadcast, RTA_DATA(ida[IFLA_BROADCAST - 1]), dev->addr_len); + } + + err = 0; + +out: + dev_put(dev); + return err; +} + int rtnetlink_dump_all(struct sk_buff *skb, struct netlink_callback *cb) { int idx; @@ -457,32 +491,14 @@ static struct rtnetlink_link link_rtnetlink_table[RTM_MAX-RTM_BASE+1] = { - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_ifinfo, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_all, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_all, }, - { NULL, NULL, }, - - { neigh_add, NULL, }, - { neigh_delete, NULL, }, - { NULL, neigh_dump_info, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, NULL, }, + [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, + [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, + [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, + [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, + [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info } }; - static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr) { ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] increase MAX_ADDR_LEN (resend) 2002-12-31 23:11 ` [PATCH] increase MAX_ADDR_LEN Roland Dreier @ 2003-01-06 15:52 ` Roland Dreier 2003-01-06 23:29 ` David S. Miller 2003-01-07 8:58 ` [PATCH] increase MAX_ADDR_LEN David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: Roland Dreier @ 2003-01-06 15:52 UTC (permalink / raw) To: davem; +Cc: linux-kernel Hi, I think this got lost in your inbox over New Year's (I send it December 31), so I'm resending. The patch included below is intended to increase MAX_ADDR_LEN to 32, which will make adding IP-over-InfiniBand support much simpler. I included the rtnetlink-based support for big hardware adresses that we discussed earlier. Specifically, the patch: 1. Adds ARPHRD_INFINIBAND to if_arp.h. This is extremely minor, but will make an IP-over-InfiniBand driver cleaner. 2. Increases MAX_ADDR_LEN to 32 (from 8). 3. Makes some small changes to net/core/dev.c to prevent the increased MAX_ADDR_LEN from overflowing when the SIOCGIFHWADDR and SIOCSIFHWADDR ioctls are used on interfaces with addr_len > 14 (the size of sa_data in struct sockaddr). 4. Adds RTM_SETLINK to <linux/rtnetlink.h> and adds a function do_setlink() to net/core/rtnetlink.c to handle this message from userspace. This lets userspace set the hardware address and broadcast address for interfaces with addr_len > 14. 5. Cleans up the initializer for link_rtnetlink_table[] so that it is much easier to read and change. There is some suspicious-looking use of MAX_ADDR_LEN in drivers/net/sungem.c and drivers/s390/net/lcs.c but I did not touch those files in this patch. This patch was created and tested with UML on kernel 2.5.47, but applies cleanly to Linus's current 2.5.53-BK (with a few offsets). Please apply this patch or let me know what needs to change for it to be applied. Thanks, Roland <roland@topspin.com> diff -Naur linux-2.5.47/include/linux/if_arp.h linux-2.5.47-max-addr/include/linux/if_arp.h --- linux-2.5.47/include/linux/if_arp.h Sun Nov 10 19:28:29 2002 +++ linux-2.5.47-max-addr/include/linux/if_arp.h Mon Dec 30 13:50:07 2002 @@ -40,6 +40,7 @@ #define ARPHRD_METRICOM 23 /* Metricom STRIP (new IANA id) */ #define ARPHRD_IEEE1394 24 /* IEEE 1394 IPv4 - RFC 2734 */ #define ARPHRD_EUI64 27 /* EUI-64 */ +#define ARPHRD_INFINIBAND 32 /* InfiniBand */ /* Dummy types for non ARP hardware */ #define ARPHRD_SLIP 256 diff -Naur linux-2.5.47/include/linux/netdevice.h linux-2.5.47-max-addr/include/linux/netdevice.h --- linux-2.5.47/include/linux/netdevice.h Sun Nov 10 19:28:31 2002 +++ linux-2.5.47-max-addr/include/linux/netdevice.h Mon Dec 30 12:05:42 2002 @@ -65,7 +65,7 @@ #endif -#define MAX_ADDR_LEN 8 /* Largest hardware address length */ +#define MAX_ADDR_LEN 32 /* Largest hardware address length */ /* * Compute the worst case header length according to the protocols diff -Naur linux-2.5.47/include/linux/rtnetlink.h linux-2.5.47-max-addr/include/linux/rtnetlink.h --- linux-2.5.47/include/linux/rtnetlink.h Sun Nov 10 19:28:29 2002 +++ linux-2.5.47-max-addr/include/linux/rtnetlink.h Tue Dec 31 14:38:13 2002 @@ -17,6 +17,7 @@ #define RTM_NEWLINK (RTM_BASE+0) #define RTM_DELLINK (RTM_BASE+1) #define RTM_GETLINK (RTM_BASE+2) +#define RTM_SETLINK (RTM_BASE+3) #define RTM_NEWADDR (RTM_BASE+4) #define RTM_DELADDR (RTM_BASE+5) diff -Naur linux-2.5.47/net/core/dev.c linux-2.5.47-max-addr/net/core/dev.c --- linux-2.5.47/net/core/dev.c Sun Nov 10 19:28:16 2002 +++ linux-2.5.47-max-addr/net/core/dev.c Mon Dec 30 12:06:06 2002 @@ -2062,7 +2062,7 @@ case SIOCGIFHWADDR: memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, (size_t) dev->addr_len)); ifr->ifr_hwaddr.sa_family = dev->type; return 0; @@ -2083,7 +2083,7 @@ if (ifr->ifr_hwaddr.sa_family != dev->type) return -EINVAL; memcpy(dev->broadcast, ifr->ifr_hwaddr.sa_data, - MAX_ADDR_LEN); + min(sizeof ifr->ifr_hwaddr.sa_data, (size_t) dev->addr_len)); notifier_call_chain(&netdev_chain, NETDEV_CHANGEADDR, dev); return 0; diff -Naur linux-2.5.47/net/core/rtnetlink.c linux-2.5.47-max-addr/net/core/rtnetlink.c --- linux-2.5.47/net/core/rtnetlink.c Sun Nov 10 19:28:33 2002 +++ linux-2.5.47-max-addr/net/core/rtnetlink.c Tue Dec 31 14:38:50 2002 @@ -220,6 +220,40 @@ return skb->len; } +static int do_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) { + struct ifinfomsg *ifm = NLMSG_DATA(nlh); + struct rtattr **ida = arg; + struct net_device *dev; + int err; + + dev = dev_get_by_index(ifm->ifi_index); + if (!dev) { + return -ENODEV; + } + + err = -EINVAL; + + if (ida[IFLA_ADDRESS - 1]) { + if (ida[IFLA_ADDRESS - 1]->rta_len != RTA_LENGTH(dev->addr_len)) { + goto out; + } + memcpy(dev->dev_addr, RTA_DATA(ida[IFLA_ADDRESS - 1]), dev->addr_len); + } + + if (ida[IFLA_BROADCAST - 1]) { + if (ida[IFLA_BROADCAST - 1]->rta_len != RTA_LENGTH(dev->addr_len)) { + goto out; + } + memcpy(dev->broadcast, RTA_DATA(ida[IFLA_BROADCAST - 1]), dev->addr_len); + } + + err = 0; + +out: + dev_put(dev); + return err; +} + int rtnetlink_dump_all(struct sk_buff *skb, struct netlink_callback *cb) { int idx; @@ -457,32 +491,14 @@ static struct rtnetlink_link link_rtnetlink_table[RTM_MAX-RTM_BASE+1] = { - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_ifinfo, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_all, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, rtnetlink_dump_all, }, - { NULL, NULL, }, - - { neigh_add, NULL, }, - { neigh_delete, NULL, }, - { NULL, neigh_dump_info, }, - { NULL, NULL, }, - - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, NULL, }, - { NULL, NULL, }, + [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, + [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, + [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, + [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, + [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info } }; - static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr) { ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] increase MAX_ADDR_LEN (resend) 2003-01-06 15:52 ` [PATCH] increase MAX_ADDR_LEN (resend) Roland Dreier @ 2003-01-06 23:29 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2003-01-06 23:29 UTC (permalink / raw) To: roland; +Cc: linux-kernel From: Roland Dreier <roland@topspin.com> Date: 06 Jan 2003 07:52:57 -0800 Hi, I think this got lost in your inbox over New Year's (I send it December 31), so I'm resending. Not lost, I'm just still working on my backlog. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] increase MAX_ADDR_LEN 2002-12-31 23:11 ` [PATCH] increase MAX_ADDR_LEN Roland Dreier 2003-01-06 15:52 ` [PATCH] increase MAX_ADDR_LEN (resend) Roland Dreier @ 2003-01-07 8:58 ` David S. Miller 1 sibling, 0 replies; 29+ messages in thread From: David S. Miller @ 2003-01-07 8:58 UTC (permalink / raw) To: roland; +Cc: ak, linux-kernel From: Roland Dreier <roland@topspin.com> Date: 31 Dec 2002 15:11:31 -0800 There is some suspicious-looking use of MAX_ADDR_LEN in drivers/net/sungem.c and drivers/s390/net/lcs.c but I did not touch those files in this patch. ... Please apply this patch or let me know what needs to change for it to be applied. I applied the patch and added a #warning to the powerpc bits of sungem.c that look suspicious. Thanks. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-11 23:19 ` David S. Miller 2002-11-11 23:34 ` [PATCH] [RFC] increase MAX_ADDR_LEN Roland Dreier @ 2002-11-12 6:53 ` Hugh Dickins 2002-11-12 6:53 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: Hugh Dickins @ 2002-11-12 6:53 UTC (permalink / raw) To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel On Mon, 11 Nov 2002, David S. Miller wrote: > From: Hugh Dickins <hugh@veritas.com> > Date: Mon, 11 Nov 2002 18:25:25 +0000 (GMT) > > On some architectures (cachetlb.txt gives HyperSparc as an example) > it is essential to flush_cache_page while pte is still valid: the > rmap VM diverged from the base 2.4 VM before that fix was made, > so this error has crept back into 2.5. > ... > (I wonder, what happens if userspace now modifies the page > after the flush_cache_page, before the pte is invalidated?) > > Thanks for catching this. > > On architectures that are affected (such as the mentioned HyperSPARC > chips), the cpu will take a trap and OOPS the kernel if the PTE is > invalidated before the cache flush is made. Thanks for shedding light on that; but I'm still wondering if there might be data loss if userspace modifies the page in the tiny window between correctly positioned flush_cache_page and pte invalidation? Hugh ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 6:53 ` [PATCH] flush_cache_page while pte valid Hugh Dickins @ 2002-11-12 6:53 ` David S. Miller 2002-11-12 14:49 ` Rik van Riel 2002-11-12 17:43 ` Hugh Dickins 0 siblings, 2 replies; 29+ messages in thread From: David S. Miller @ 2002-11-12 6:53 UTC (permalink / raw) To: hugh; +Cc: akpm, dmccr, riel, linux-kernel From: Hugh Dickins <hugh@veritas.com> Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT) Thanks for shedding light on that; but I'm still wondering if there might be data loss if userspace modifies the page in the tiny window between correctly positioned flush_cache_page and pte invalidation? The flush merely writes back the data, a copy-back operation, fully L2 cache coherent. All cpus will see correct data if an intermittant store occurs. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 6:53 ` David S. Miller @ 2002-11-12 14:49 ` Rik van Riel 2002-11-12 21:45 ` David S. Miller 2002-11-12 17:43 ` Hugh Dickins 1 sibling, 1 reply; 29+ messages in thread From: Rik van Riel @ 2002-11-12 14:49 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, akpm, dmccr, linux-kernel On Mon, 11 Nov 2002, David S. Miller wrote: > From: Hugh Dickins <hugh@veritas.com> > Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT) > > Thanks for shedding light on that; but I'm still wondering if there > might be data loss if userspace modifies the page in the tiny window > between correctly positioned flush_cache_page and pte invalidation? > > The flush merely writes back the data, a copy-back operation, fully L2 > cache coherent. All cpus will see correct data if an intermittant > store occurs. The CPUs will, but the harddisk might not. We really need to get this right in the swapout path. regards, Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ Current spamtrap: <a href=mailto:"october@surriel.com">october@surriel.com</a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 14:49 ` Rik van Riel @ 2002-11-12 21:45 ` David S. Miller 0 siblings, 0 replies; 29+ messages in thread From: David S. Miller @ 2002-11-12 21:45 UTC (permalink / raw) To: riel; +Cc: hugh, akpm, dmccr, linux-kernel From: Rik van Riel <riel@conectiva.com.br> Date: Tue, 12 Nov 2002 12:49:50 -0200 (BRST) On Mon, 11 Nov 2002, David S. Miller wrote: > The flush merely writes back the data, a copy-back operation, fully L2 > cache coherent. All cpus will see correct data if an intermittant > store occurs. The CPUs will, but the harddisk might not. We really need to get this right in the swapout path. We do get it right, watch: 1) remove last mapping instance of page -> MM level cache flush pushes it permanently to ram in full view of DMA activity 2) remove last mapping, but new one appears as we swap out -> no problem we'll see one of several instances of the page and we'll need to reswap it out later anyways if someone currently writes to it I know this because I tested this extensively ages ago on sparc32 where it really mattered. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 6:53 ` David S. Miller 2002-11-12 14:49 ` Rik van Riel @ 2002-11-12 17:43 ` Hugh Dickins 2002-11-12 21:51 ` David S. Miller 1 sibling, 1 reply; 29+ messages in thread From: Hugh Dickins @ 2002-11-12 17:43 UTC (permalink / raw) To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel On Mon, 11 Nov 2002, David S. Miller wrote: > From: Hugh Dickins <hugh@veritas.com> > Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT) > > Thanks for shedding light on that; but I'm still wondering if there > might be data loss if userspace modifies the page in the tiny window > between correctly positioned flush_cache_page and pte invalidation? > > The flush merely writes back the data, a copy-back operation, fully L2 > cache coherent. All cpus will see correct data if an intermittant > store occurs. Sorry, I still don't get it. If the flush_cache_page is doing something necessary, then won't a user access in between it and invalidating pte undo what was necessary? And if it's not necessary, why do we do it? (For better performance would be a very good reason.) But don't worry about me: I may well have some fundamental misconception which emailing back and forth will fail to resolve, would just waste your time and expose my ignorance. (I think Andrew has sometimes protested that "flush" can mean too many different things.) So I'm trying to understand it better from over here - but glad to see Rik seems to understand my concern. Hugh ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 17:43 ` Hugh Dickins @ 2002-11-12 21:51 ` David S. Miller 2002-11-12 22:59 ` Hugh Dickins 0 siblings, 1 reply; 29+ messages in thread From: David S. Miller @ 2002-11-12 21:51 UTC (permalink / raw) To: hugh; +Cc: akpm, dmccr, riel, linux-kernel From: Hugh Dickins <hugh@veritas.com> Date: Tue, 12 Nov 2002 17:43:40 +0000 (GMT) Sorry, I still don't get it. If the flush_cache_page is doing something necessary, then won't a user access in between it and invalidating pte undo what was necessary? And if it's not necessary, why do we do it? (For better performance would be a very good reason.) If there are other writable mappings of the page, we can't swap it out legally. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] flush_cache_page while pte valid 2002-11-12 21:51 ` David S. Miller @ 2002-11-12 22:59 ` Hugh Dickins 0 siblings, 0 replies; 29+ messages in thread From: Hugh Dickins @ 2002-11-12 22:59 UTC (permalink / raw) To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel On Tue, 12 Nov 2002, David S. Miller wrote: > From: Hugh Dickins <hugh@veritas.com> > Date: Tue, 12 Nov 2002 17:43:40 +0000 (GMT) > > Sorry, I still don't get it. If the flush_cache_page is doing something > necessary, then won't a user access in between it and invalidating pte > undo what was necessary? And if it's not necessary, why do we do it? > (For better performance would be a very good reason.) > > If there are other writable mappings of the page, we can't swap > it out legally. But I'm worried about the case where this is the last writable mapping: it seems userspace (on another CPU) can still write to it in between the flush_cache_page and invalidation of the pte (on this CPU hoping to swap out that page). Hugh ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2003-01-07 8:58 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-11-11 18:25 [PATCH] flush_cache_page while pte valid Hugh Dickins 2002-11-11 18:35 ` Andrew Morton 2002-11-11 23:19 ` David S. Miller 2002-11-11 23:34 ` [PATCH] [RFC] increase MAX_ADDR_LEN Roland Dreier 2002-11-11 23:38 ` David S. Miller 2002-11-11 23:58 ` Roland Dreier 2002-11-12 0:18 ` Alan Cox 2002-11-12 0:01 ` Roland Dreier 2002-11-12 14:23 ` Alan Cox 2002-11-12 14:20 ` Folkert van Heusden 2002-11-12 15:14 ` Roland Dreier 2002-11-12 16:00 ` Alan Cox 2002-11-12 15:50 ` YOSHIFUJI Hideaki / 吉藤英明 2002-11-12 20:36 ` Roland Dreier 2002-11-12 22:31 ` David S. Miller 2002-11-12 22:39 ` Roland Dreier 2002-12-04 18:31 ` rtnetlink replacement for SIOCSIFHWADDR Roland Dreier 2002-12-04 20:11 ` Andi Kleen 2002-12-31 23:11 ` [PATCH] increase MAX_ADDR_LEN Roland Dreier 2003-01-06 15:52 ` [PATCH] increase MAX_ADDR_LEN (resend) Roland Dreier 2003-01-06 23:29 ` David S. Miller 2003-01-07 8:58 ` [PATCH] increase MAX_ADDR_LEN David S. Miller 2002-11-12 6:53 ` [PATCH] flush_cache_page while pte valid Hugh Dickins 2002-11-12 6:53 ` David S. Miller 2002-11-12 14:49 ` Rik van Riel 2002-11-12 21:45 ` David S. Miller 2002-11-12 17:43 ` Hugh Dickins 2002-11-12 21:51 ` David S. Miller 2002-11-12 22:59 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox