From: Patrick Farrell
Date: Tue, 16 Jun 2015 15:23:15 -0500
Subject: [lustre-devel] Lock ahead v1
Message-ID: <558085B3.1040400@cray.com>
To: lustre-devel@lists.lustre.org

Hello,

I've been hard at work on lock ahead for some time, and there's been a notable change in the design. (I'm not going to recap lock ahead here. If you'd like background, please check out the slides and/or video of my LUG talk: http://cdn.opensfs.org/wp-content/uploads/2015/04/Shared-File-Performance-in-Lustre_Farrell.pdf ; http://youtu.be/ITfZfV5QzIs )

I'm emailing here primarily to explain the change for those reviewing the patch (http://review.whamcloud.com/#/c/13564/).

It has proved extremely difficult to make blocking asynchronous lock requests, which I originally wanted. If the lock requests could be blocking, then they could clear out existing locks on the file. However, there are a number of problems with asynchronous blocking requests, some of which I detailed in emails to this list. With help from Jinshan, I have an idea of what to do to fix them, but the changes are significant and, it turns out, not really necessary for lock ahead. Here's why:

The main problem with non-blocking lock requests is that they will not clear out existing locks, so if there are any on the file, we will not get lock ahead locks granted. To avoid this situation, we will have the library take and release a (blocking) group lock when it first opens the file. This will clear out any existing locks on the file, making it 'clean' for the lock ahead requests. This (mostly) means we don't need blocking lock ahead requests.

The lock ahead process for writing out a large file, then, looks like this:

OPEN, GROUP_LOCK, GROUP_UNLOCK, LOCK_AHEAD (n blocks ahead), WRITE, WRITE, WRITE ...
[track position of writes (i.e., the number of lock ahead locks remaining ahead of the IO); when the lock ahead count is small ->] LOCK_AHEAD (n blocks ahead), WRITE, WRITE, WRITE ... etc.

This also helps keep the lock count manageable, which avoids some performance issues.

However, we need one more thing. Imagine that lock ahead locks are not created ahead of the IO (due to raciness), or that they are cancelled by a request from a node that is not part of the collective IO (for example, a user tries to read the file during the IO). In either case, the lock which results will be expanded normally. That lock may therefore be extended to cover the rest of the file, where it will block future lock ahead requests. It will be cancelled when a read or write request happens in the range it covers, but the lock for that read/write request will be expanded as well, and we return to handing the lock back and forth between clients.

The way to avoid this is to turn off lock expansion for anyone who is supposed to be using lock ahead locks. Their IO requests will normally use the lock ahead locks provided for them, but if the lock ahead locks aren't available (for the reasons described above), the locks for these requests will not be expanded.

This means that losing a race between IO and the lock ahead lock on a particular lock ahead request (or an entire set of lock ahead requests) will never create a large lock, which would block future lock ahead requests.

Additionally, if lock ahead is interrupted by a request from another client (which prevents lock ahead requests by creating a large lock), the 'real' IO requests from the lock ahead clients will eventually cancel that large lock. Since the locks for those requests aren't expanded, the next set of lock ahead requests (which are out ahead of the IO) will work.

Effectively, this means that if lock ahead is interrupted by a competing request, or if it fails the race to be ready in time, it can avoid returning to the pathological case.
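To make the argument above concrete, here is a toy model in Python. It is only a sketch and not Lustre code: ToyLockServer, lock_ahead, io_lock, group_lock_cycle and the block numbers are all hypothetical names made up for illustration, and "expansion" is simplified to growing a lock upward to the next lock held by someone else or to end of file. It assumes a blocking request cancels every conflicting lock in full, while a non-blocking lock ahead request simply fails on conflict.

```python
from dataclasses import dataclass, field

EOF = 2**63 - 1  # stand-in for "end of file" in this toy model


@dataclass(frozen=True)
class Extent:
    owner: str
    start: int
    end: int  # inclusive

    def overlaps(self, start, end):
        return self.start <= end and start <= self.end


@dataclass
class ToyLockServer:
    """Hypothetical, heavily simplified extent lock manager (not the LDLM)."""
    granted: list = field(default_factory=list)

    def _conflicts(self, owner, start, end):
        return [lk for lk in self.granted
                if lk.owner != owner and lk.overlaps(start, end)]

    def group_lock_cycle(self):
        # Taking and releasing a blocking group lock clears every lock.
        self.granted = []

    def lock_ahead(self, owner, start, end):
        # Non-blocking: granted only if nothing conflicts, never expanded.
        if self._conflicts(owner, start, end):
            return False
        self.granted.append(Extent(owner, start, end))
        return True

    def io_lock(self, owner, start, end, expand=True):
        # Blocking: reuse a covering lock, else cancel conflicting locks.
        if any(lk.owner == owner and lk.start <= start and end <= lk.end
               for lk in self.granted):
            return
        conflicts = self._conflicts(owner, start, end)
        self.granted = [lk for lk in self.granted if lk not in conflicts]
        if expand:
            # Simplified expansion: grow up to the next foreign lock or EOF.
            end = min((lk.start - 1 for lk in self.granted
                       if lk.owner != owner and lk.start > end), default=EOF)
        self.granted.append(Extent(owner, start, end))


def group_lock_demo():
    srv = ToyLockServer()
    srv.io_lock("old", 0, 0)               # stale expanded lock on the file
    before = srv.lock_ahead("A", 2, 2)     # blocked by the stale lock
    srv.group_lock_cycle()                 # GROUP_LOCK, GROUP_UNLOCK
    return before, srv.lock_ahead("A", 2, 2)


def scenario(expand_io_locks):
    srv = ToyLockServer()
    srv.io_lock("reader", 0, 0)            # outsider; lock expands to EOF
    blocked = not srv.lock_ahead("A", 2, 2)
    # A's real write cancels the reader's large lock.
    srv.io_lock("A", 2, 2, expand=expand_io_locks)
    next_ok = srv.lock_ahead("B", 3, 3)    # next lock ahead request
    return blocked, next_ok
```

In this model, scenario(True) leaves client A holding a lock out to end of file, so B's next lock ahead request is refused; scenario(False) leaves A with a lock on block 2 only, so B's request is granted. That is the reason for disabling expansion on the lock ahead clients.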
Code implementing lock ahead and the ioctl to disable expansion is up for review here: http://review.whamcloud.com/#/c/13564/

The current version is essentially 'code complete' and ready for review.

- Patrick Farrell