5: page table magic – deduplication and copy-on-write

n this homework, we implement two advanced virtual memory features in xv6. First, we introduce a deduplication system, wherein the kernel scans the system for duplicated pages and frees up memory by reusing identical physical memory page frames in several virtual memory locations.

This, however, introduces a major problem should a process try to write to one of these shared pages. Thus, the second part of our homework is to handle this problem using a copy-on-write approach: write-protect the shared pages, and make a new writable copy for any process that needs it.

The homework template Heading link

As usual, start from the homework template, in this case called hw5. The homework template provides some of the usual template functions, empty system calls and such, but more importantly, it introduces reference counting for physical memory pages.

This mostly lives in kalloc.c. Notice the new functions krelease() and kretain(), as well as a new array of struct frameinfo. Here, kretain() increases the reference count on a page (indicated using the virtual address of the kernel’s mapping of the physical frame, aka P2V(framenumber<<12), just like how kalloc() and kfree() do it. krelease() decreases the count, and if the count reaches zero, it calls kfree() on the page. kalloc() sets the reference count of a freshly allocated page to 1. Finally, all uses of kfree() throughout xv6 were replaced with krelease().

What’s primarily missing is the dedup and copyonwrite functionality, for which placeholders exist in vm.c.

Deduplication Heading link

The program dedup_reader allocates a lot of memory and fills it with lots of identical content. This uses up a lot of physical RAM. It then calls the new system call sys_dedup(), which identifies duplicate pages, and frees up most of the memory through virtual memory deduplication. The program then reads from the memory to make sure it still works as expected.

A correct solution finishes without crashing, and shows an increase in the number of free system pages commensurate with the size of the large allocation in the beginning of the program.

hint I would expect it to be easier and cleaner to do some of the work in kalloc.c, and some in vm.c.

Copy-on-Write Heading link

The program dedup_writer is similar to dedup_reader, except it then writes to some parts of the memory, checks that other parts of memory are unaffected, calls sys_dedup(), writes some more, and calls sys_dedup() again.

For this to work correctly, you need to write-protect your deduped page table entries, so that a page fault is triggered when a write occurs. This is then caught by a new case in the big trap.c switch statement, and which leads to the copyonwrite skeleton function in vm.c.

Requirements Heading link

The programs must both run without crashing or reporting errors. System free memory should be maximized by the dedup functionality: the reduction in memory usage for a correct program is similar to the size of the full allocation (10 MB) since we fill it with all the same data. Finally, no kernel memory leaks: total system free pages may not change between successive runs of the test programs.

Turn-in Heading link

Turn-in is the usual story. Follow this invitation link, then push your hw5 branch to your turn-in repository.