From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Siddhartha Jain"
Subject: RE: Replicating directories - Intercepting write/modify system calls
Date: Fri, 13 Feb 2004 14:29:16 +0530
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID:
References: <000001c3f208$edd70de0$0301a8c0@family>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mymail.netmagicians.com ([202.87.39.126]:49832 "HELO mymail.netmagicians.com") by vger.kernel.org with SMTP id S266831AbUBMJE1 (ORCPT ); Fri, 13 Feb 2004 04:04:27 -0500
To:
In-Reply-To: <000001c3f208$edd70de0$0301a8c0@family>
List-Id: linux-fsdevel.vger.kernel.org

Thanks, your explanation was very succinct and to the point.

>
> >> Too complicated. Too invasive.
>
> > In what way? Will it slow down the FS too much? What other approach
> > can be adopted to keep the implementation FS-independent and still
> > track the changes in real-time.
>
> The changes you proposed in your original email would require changes to
> too many functions, files, and other segments of code and design. Such
> changes would never be adopted by the Linux "powers that be." As such,
> you'd have to maintain the code yourself and manually patch each new
> kernel with your customized replication code. The goal, therefore, is to
> make changes that get adopted into the official Linux kernel to save
> yourself future maintenance work and allow others to both debug and
> benefit from your code.
>
> A rule of thumb -- the change that accomplishes the task in the least
> amount of code wins.

I totally agree.

>
> As for another approach, see my next point.
>
> >> Replicating the entire partition rather than just a specific directory
> >> is more feasible.
>
> > How would I do that? Will the implementation still be FS-independent?
>
> To be brief, I don't know that it can be done -- the code may have to be
> written first -- and no implementation will be FS independent.
>
> The VFS (Virtual File System) is merely a set of function pointers. A
> function pointer when called simply says, "Use this other function." In
> other words, a function pointer is a variable that stores which function
> to use, but a function pointer is not in and of itself a function. It
> cannot do any grunt work of its own. Hence, implementing your proposal at
> the VFS level would be impossible because all the VFS does is call the
> appropriate FS-specific function.

OK, to give an example (with reference to
http://www.faqs.org/docs/kernel_2_4/lki-3.html): the function open() really
calls sys_open in fs/open.c. sys_open looks like:

    asmlinkage long sys_open(const char __user * filename, int flags, int mode)
    {
            char * tmp;
            int fd, error;

    #if BITS_PER_LONG != 32
            flags |= O_LARGEFILE;
    #endif
            tmp = getname(filename);
            fd = PTR_ERR(tmp);
            if (!IS_ERR(tmp)) {
                    fd = get_unused_fd();
                    if (fd >= 0) {
                            struct file *f = filp_open(tmp, flags, mode);
                            error = PTR_ERR(f);
                            if (IS_ERR(f))
                                    goto out_error;
                            fd_install(fd, f);
                    }
    out:
                    putname(tmp);
            }
            return fd;

    out_error:
            put_unused_fd(fd);
            fd = error;
            goto out;
    }

Change it to include:

1. Check if the filename path matches any master definition in a config
   file.
2. If it does, strip the master path from the filename path and append the
   remainder to the mirror path.
3. Replicate every call within sys_open with the parameter mirror_file.

The same goes for sys_write and other functions that modify a file (ioctl?).

Of course, what I should be doing is trying it out and seeing if it works
instead of increasing mail traffic. But I do not know the repercussions on
other parts of the kernel, and that is where I need your input.

>
> Your proposal would be easier to implement in C++ which uses virtual
> functions rather than function pointers, but converting the Linux kernel
> to C++ is a whole different topic. FYI: the core kernel developers have
> already outright refused to do that.

C++?? Now now, I am a humble being.
I don't want to be another Linus. One's enough for the M$ and the like :)

>
> >> Does it need to be replicated in real-time, or would an
> >> incremental synchronization be acceptable (i.e. synced
> >> every 15 minutes or so)?
>
> > Synced every fifteen minutes is ok but anything that scans the
> > whole partition/directory file by file would be nightmarish given
> > that the mail toasters I am mirroring have tens of thousands of
> > files. Therefore, IMHO, the solution needs to track changes as
> > and when they happen to the file and mirror them immediately.
> >
> > For eg, a rsync/mirrordir takes up to half-an-hour to sync about
> > 100,000 files. Not to mention the 40% CPU load on the file server
> > while the rsync runs [trimmed].
>
> You can scan ~100,000 files to see if they have changed in less than 90
> seconds -- at least on a locally mounted partition. The part that takes
> forever is the actual syncing of data.

Yep, scanning might not take long. But the whole process and its overheads
are just not practical. I guess rsync/mirrordir weren't written with this
application in mind.

>
> See my next point for a continued explanation.
>
> >> Does it need to be a two-way synchronization, or is a one-way
> >> synchronization acceptable? In other words, do changes on the
> >> replica need to be synced back to the original?
>
> > For my purpose, one-way is fine. But two-way would be cool and
> > in the greater good of humankind ;)
>
> Is this for load balancing, or a sophisticated way of doing a backup?
>
> If it's just for a backup, I'll tell you right now that your project isn't
> worth the effort.
>
> If it's for load balancing, you'll have to do a quasi-cost/benefit
> analysis on whether or not real-time replication really results in a
> balanced load or if the network bandwidth and CPU cycles consumed by
> real-time replication offset the load balancing benefits.
> After all, one of the pillars of load balancing is to reduce the amount
> of work in the here and now, and real-time replication -- as opposed to
> syncing every 15 or so minutes -- is counterproductive to that end.

See, I have a NetApp filer. All mail toasters mount the same volume from the
filer. Now, what if the filer dies - as in a hardware failure? I would have
to bring up another file server and restore the last tape backup to it.
Losing the mail that arrived between the last tape backup and the time of
the disaster is not acceptable. Another option is that I buy another filer
and run (buy) NetApp's proprietary code to do the replication between the
two filers. Unfortunately, I have money for neither.

So what I can do is keep another Linux box as a filer and export a volume
over NFS to all the toasters. The toasters then replicate the directory
mounted from the NetApp filer to the directory mounted from the Linux filer.
Initially, one-way is good. Later, if I can get two-way to work, then I can
load-balance between the NFS filers with intelligent DNS.

>
> >> Are you willing to commission someone (i.e. pay them money) to
> >> do this work? 8-D
>
> > [paraphrased] Yep, I will give you a free mail account on hotmail....
>
> Uh... gee, thanks, but I was hoping for something along the lines of
> actual cash or employment. You see, I'm a computer programmer with my BA
> in Computer Science and even a 3.0 GPA, but no one's willing to hire me
> without 2-3 years of experience. (Catch 22. How do I get the experience
> if no one is willing to give it?) BTW, I could also be a system
> administrator; my degree covered that too. It's just that I haven't
> gotten far enough down my career path to be fixed on one road or the
> other.

I don't want to mock you. I know you live in the US or some developed nation.
But if you want a job, I can offer you one. It's in India and we pay
reasonably well according to Indian standards.
On what we pay, you can rent a place, buy a car, spend nights at a
disc/pub/PS2/X-box and still save a bit. We are a small company that enjoys
working with people who have new ideas and we encourage whatever cool stuff
people want to do (as long as it makes some money for us). Basically, no
corporate crap and policy and stuff. Check out www.netmagicsolutions.com

>
> > Btw, I am willing to do all the dirty work of wrestling with C code
> > and testing but I need guidance.
>
> At the risk of sounding rude, without the motivation of actual cash or
> employment, the most I'm willing to do is direct you to some good albeit
> expensive books on the topic.

Thanks for the same. I don't expect anyone to do my work for me.

Good luck with your little project. May it find its place in the official
Linux kernel tree :)
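
P.S. To make steps 1 and 2 of the sys_open change above concrete, here is a rough userspace sketch of the path rewriting only. It is not kernel code and not a tested patch. The function name map_to_mirror and the example paths are hypothetical, the config-file lookup is left out (a single master/mirror pair is passed in as arguments), and it assumes the master path is given without a trailing slash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `path` lies under `master_dir`, write the
 * corresponding path under `mirror_dir` into `out`.
 * Assumes master_dir has no trailing slash.
 * Returns 0 on success, -1 if the path is outside the master tree
 * or the result does not fit in `out`.
 */
static int map_to_mirror(const char *path, const char *master_dir,
                         const char *mirror_dir, char *out, size_t out_len)
{
    size_t mlen = strlen(master_dir);
    int n;

    /* Step 1: does the path start with the master prefix? */
    if (strncmp(path, master_dir, mlen) != 0)
        return -1;

    /* Require a component boundary, so that a master of /a/b
     * does not match /a/bc/file. */
    if (path[mlen] != '/' && path[mlen] != '\0')
        return -1;

    /* Step 2: drop the master prefix and append the remainder
     * to the mirror path. */
    n = snprintf(out, out_len, "%s%s", mirror_dir, path + mlen);
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    return 0;
}
```

With a master of /mnt/netapp and a mirror of /mnt/linuxfiler (both made-up mount points), a call on /mnt/netapp/u1/new/msg1 would fill the buffer with /mnt/linuxfiler/u1/new/msg1, while a path outside the master tree returns -1 so the caller can skip replication for it.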