Started Oct 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about how locking and synchronization is done
in the Linux vm code.

vmlist_access_lock/vmlist_modify_lock
--------------------------------------

Page stealers pick processes out of the process pool and scan for
the best process to steal pages from. To guarantee the existence of
the victim mm, a mm_count increment and a corresponding mmdrop are
done in swap_out(). Page stealers hold kernel_lock to protect
against a bunch of races. The vma list of the victim mm is also
scanned by the stealer, and the vmlist_lock is used to preserve list
sanity against the process adding/deleting to the list. This also
guarantees the existence of the vma. Vma existence is not guaranteed
once try_to_swap_out() drops the vmlist lock. To guarantee the
existence of the underlying file structure, a get_file is done
before the swapout() method is invoked. The page passed into
swapout() is guaranteed not to be reused for a different purpose,
because the page reference count due to its presence in the user's
pte is not released until after swapout() returns.

Any code that modifies the vmlist, or the vm_start/vm_end/
vm_flags:VM_LOCKED/vm_next of any vma *in the list*, must prevent
kswapd from looking at the chain. This does not include driver
mmap() methods, for example, since the vma is not yet in the list
at that point.

The rules are:
1. To modify the vmlist (add/delete or change fields in an element),
   you must hold mmap_sem to guard against clones doing
   mmap/munmap/faults (i.e. all vm system calls and faults), and
   against ptrace, swapin due to swap deletion, etc.
2. To modify the vmlist (add/delete or change fields in an element),
   you must also hold vmlist_modify_lock, to guard against page
   stealers scanning the list.
3. To scan the vmlist (find_vma()), you must either
   a. grab mmap_sem, which should be done in all cases except the
      page stealer, or
   b. grab vmlist_access_lock, which is only done by the page stealer.
4. While holding the vmlist_modify_lock, you must be able to
   guarantee that no code path will lead to page stealing. A better
   guarantee is to claim non-sleepability, which ensures that you
   are not sleeping for a lock whose holder might in turn be doing
   page stealing.
5. You must be able to guarantee that while holding
   vmlist_modify_lock or vmlist_access_lock of mm A, you will not
   try to get either lock for mm B.

The caveats are:
1. find_vma() makes use of, and updates, the mmap_cache pointer
   hint. The update of mmap_cache is racy (the page stealer can race
   with other code that invokes find_vma with mmap_sem held), but
   that is okay, since it is only a hint. This can be fixed, if
   desired, by having find_vma grab the vmlist lock.

The code paths that add/delete elements from the vmlist chain are:
1. callers of insert_vm_struct
2. callers of merge_segments
3. callers of avl_remove

The code paths that change vm_start/vm_end/vm_flags:VM_LOCKED of
vma's on the list are:
1. expand_stack
2. mprotect
3. mlock
4. mremap

It is advisable that changes to vm_start/vm_end be protected,
although in some cases it is not really needed. For example,
vm_start is modified by expand_stack(), and it is hard to come up
with a destructive scenario in that case without the vmlist
protection.

The vmlist lock nests with the inode i_shared_lock and the kmem
cache c_spinlock spinlocks. This is okay, since code that holds
i_shared_lock never asks for memory, and the kmem code asks for
pages after dropping c_spinlock.
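As an illustration of rules 1 and 2, a path that changes a field of
a vma already on the list might look like the sketch below. The
helper example_set_vm_start() is hypothetical and exists only to
show the lock ordering; real modifiers (mprotect, mremap, etc.) do
more work under the locks.

	#include <linux/sched.h>	/* struct mm_struct, mmap_sem */
	#include <linux/mm.h>		/* vmlist_modify_lock/unlock */

	/* Hypothetical helper: illustrates rules 1 and 2 only. */
	static void example_set_vm_start(struct mm_struct *mm,
					 struct vm_area_struct *vma,
					 unsigned long new_start)
	{
		down(&mm->mmap_sem);		/* rule 1: keep out clones doing
						 * mmap/munmap/faults, ptrace, etc. */
		vmlist_modify_lock(mm);		/* rule 2: keep out page stealers */
		vma->vm_start = new_start;	/* the actual modification */
		vmlist_modify_unlock(mm);
		up(&mm->mmap_sem);
	}

Note that, per rule 4, nothing between vmlist_modify_lock() and
vmlist_modify_unlock() may sleep or lead to page stealing.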
The vmlist lock can be a sleeping or spin lock. In either case, care
must be taken that it is not held on entry to the driver methods,
since those methods might sleep or ask for memory, causing
deadlocks.

The current implementation of the vmlist lock uses the
page_table_lock, which is also the spinlock that page stealers use
to protect changes to the victim process's ptes. Thus we have a
reduction in the total number of locks.
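For the reader side (rule 3b), here is a minimal sketch of a page
stealer walking the victim's vma chain. The name
example_scan_victim() is made up, and the pte manipulation that the
real swap_out() path performs under the same spinlock is omitted.

	#include <linux/sched.h>	/* struct mm_struct */
	#include <linux/mm.h>		/* vmlist_access_lock/unlock */

	/* Hypothetical helper: illustrates rule 3b only. */
	static void example_scan_victim(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;

		vmlist_access_lock(mm);	/* currently spin_lock(&mm->page_table_lock) */
		for (vma = mm->mmap; vma; vma = vma->vm_next) {
			/* Examine vma->vm_start..vm_end and pick pages to
			 * unmap. Vma existence is guaranteed only while
			 * the lock is held; this is a spinlock, so do not
			 * sleep here. */
		}
		vmlist_access_unlock(mm);
	}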