The original LDVTS (Load Virtual Translation Status) instruction should be modified so that an operating system can translate virtual addresses to physical addresses quickly and conveniently.

The old specification: $Y+$Z contains a 64-bit translation key whose low 3 bits have the following meaning. If they are 000, the key/value pair is deleted from all translation caches; otherwise the 3-bit protection code is replaced by the given 3 bits in every cache where the translation is present. The result $X is set to

- 0 if the key was not present in any translation cache,
- 1 if the key was present in the instruction translation cache,
- 2 if the key was present in the data translation cache,
- 3 if the key was present in both caches.

According to mmix-doc.w, the purpose of this instruction is: "The operating system needs a way to keep such caches up to date when pages are being allocated, moved, swapped, or recycled. The operating system also likes to know which pages have been recently used. The LDVTS instructions facilitate such operations."

For allocated, moved, swapped, or recycled pages, the single case with three zero bits would be enough: it deletes an invalid translation from the cache, and a correct translation is reloaded from memory if and when it is needed. With respect to cache consistency, the other cases (low bits not zero) are efficient but dangerous. They permit the creation of a cache that is inconsistent with the page tables kept in memory, by writing one value to memory and inserting a different translation into the cache. The return value of 0, 1, 2, or 3 gives only limited information about the cache contents.

Access to the translation itself would be beneficial in the following case: implementing a TRAP for the Fread function using a DMA-capable disk drive.
We proceed as follows. From the file handle we determine the sector number on disk and write this number to the corresponding data register of the disk. If the target buffer is smaller than one disk block, we allocate a temporary buffer; otherwise we transfer the block directly into the target buffer. For the target buffer, owned by the running process (and possibly for the temporary buffer, owned by the operating system), we probably know only the virtual address, and we have to translate it into a physical address to be stored in the corresponding DMA register of the disk drive. At this point, even if hardware support for translation is available, we have to fall back to slow, software-based translation through the page tables, unless an instruction lets us use the hardware lookup (and the translation caches). We therefore propose an augmented return value for the LDVTS instruction.

Proposal 1: Register X is set to [0 16 bit][a 48-s bit][0 s bit] if a translation exists, or to a negative value if no such translation exists. This return value is best for computing physical addresses: only an OR instruction is needed to combine it with the page offset.

How to use this instruction in the above case: assume the virtual address is in $0. We need the process number n and the page size s, both taken from rV, to produce a key:

    GET   $1,rV
    SRU   $2,$1,40
    AND   $2,$2,#FF      s is in $2
    SET   $3,#1FF8       mask for the 10-bit field n
    AND   $1,$1,$3       n<<3 is in $1
    SRU   $3,$0,$2
    SLU   $3,$3,$2       clear the low s bits of the address
    OR    $1,$3,$1       insert the process number n
    LDVTS $1,$1,0        the OR and this line can also be written as LDVTS $1,$3,$1
    BN    $1,nophysicaladdress   otherwise $1 now holds the translation
    SET   $3,1
    SLU   $3,$3,$2
    SUB   $3,$3,1        mask for the page offset (s bits)
    AND   $0,$0,$3       page offset in $0
    OR    $0,$0,$1       the complete physical address

Proposal 2:

1. We want a plain lookup of the translation, whether or not the value is in the cache.
2. We want to retain the ability to delete a key/value pair from the cache.
3. We want the ability to force a reread of the page tables, either deleting or redefining the key/value pair.
4. I am not sure whether we want the ability to set or modify the cache value directly. This is efficient but dangerous. A reread from the page tables immediately after changing the tables in memory should be efficient too, since the necessary data is still in the data cache. Alternatively, just deleting the pair from the cache is fast, and the lookup will be done if and when needed. Changing the translation cache without updating memory (or having the values in the data cache) is dangerous, since different processors may need synchronized caches.
5. We want information about the contents of the caches regarding the given key.

For 5: There is enough room in register $X, since only 48-s bits are used. Either the upper 16 bits or the lowest 13 bits (s is at least 13) could carry the extra information. Either way, these bits can be cleared with a 16-bit immediate AND NOT instruction (ANDNH or ANDNL) and extracted with either a shift or an AND instruction. Proposal: we use the lowest 2 bits. This is compatible with the old definition. Only an AND $X,$X,3 is needed to strip the translation and keep these 2 bits.

For 1, 2, and 3: Here we want to add further information to the parameter $Y+$Z. We could use the sign bit, but it is only one bit. We could use the s-13 bits above the process number field, but s may be 13, so none may be left. So we are left with the 3 low bits, which can take the values 0 to 7. We choose the following behaviour:

0: delete the key from both caches.
1: check the data VT cache; provide the translation if present.
2: check the instruction VT cache; provide the translation if present.
3: lazy update of both caches. If a matching entry is present in either cache, reread the translation from memory and replace the matching entries in both caches.
4: update of the data VT cache. Reread the translation from memory and place it in the data VT cache; if the instruction VT cache holds an entry for the key, it is replaced as well.
5: update of the instruction VT cache. Reread the translation from memory and place it in the instruction VT cache.
   If the data VT cache holds an entry for the key, it is replaced as well.
6: read a data translation. Read the translation from the data VT cache; if it is not present, read it from memory and keep the result in the data VT cache.
7: read an instruction translation. Read the translation from the instruction VT cache; if it is not present, read it from memory and keep the result in the instruction VT cache.

The result register X is set to -1 if no valid translation was obtained: because the entries in the cache were deleted, or because the entry was not present and the instruction did not call for a read from memory, or because reading from memory did not produce a valid translation. Otherwise register X contains the translated address in this form:

[0 16 bit][a 48-s bit][0 (s-3) bit][3 protection bits]

These are typical usage patterns for the instruction:

- After a page has been swapped out, use 0. The key is deleted from both VT caches. A new translation is not cached, since it will not be needed in the near future.
- After a new page with instructions has been assigned, use 5. The translation of the given key is determined from memory; the needed memory locations are in the data cache if this instruction is executed just after updating the page tables. The new value is placed in the instruction VT cache. If there was an (old) entry for the key in the data VT cache, this entry is replaced. Similarly, if a new page of data is assigned, use 4.
- If the physical address of a data buffer is needed, use 6. This uses a cached translation if present; otherwise the translation comes from memory. This is the normal behaviour of, for example, an LDO or STO instruction. If the physical address of some instruction is needed, use 7.
- If a page that was used as a data page is changed into an instruction page (when modifying code, be sure to use the SYNCID instruction), use LDVTS with 5 if the instructions will be executed soon.
- If a page is moved in memory, and the operating system cares neither whether it is an instruction or a data page nor whether the translation is cached (since it is unclear whether the program is currently active or sleeping), use 3. This updates translations that are already cached but does not create new cache entries, and it does not read memory unless needed.
- If the operating system wants to check the contents of the caches without any memory access, it can use 1 or 2.