MMIX LOGO

MMIX Bug Report SYNCD

Table of Content

Content

MMIXware Version

mmix-20110831

Bug Reported

Initial: 09/20/2011
Update: 11/6/2011

Author

Eiji Yoshiya

Description

The syncd instruction may lead to the following situation:
  • Dfiller holds Scache lock and waits for Dcache lock,
  • Clean holds Dcache lock and waits for Dflusher to complete,
  • Dfluhser waits for Scache lock.
  • and then, the syncd instruction never completes.
To reproduce this situation, run mmmix with plain.mmconfig and the following program.
000000000000:
        e0018000e6010001	seth	$1,0x8000	incml	$1,0x0001
        f200003efd000000	pushj	$0, 0x3e*4	swym	0,0,0
        e0008000e5000001	seth	$0,0x8000	incmh	$0,0x0001
        e3010110af000000	setl	$1,0x0110	stou	$0,$0,0
        2300002027010101	lda	$0,$0,0x20	subu	$1,$1,1
        4b01fffdfd000000	bnz	$1,@-0x3*4	swym	0,0,0
        e3030030fd000000	setl	$3,0x0030	swym	0,0,0
        270303014b03fffe	subu	$3,$3,1		bnz	$3,@-0x2*4
        e0018000e6010002	seth	$1,0x8000	incml	$1,0x0002
        f200002ef0000000	pushj	$0, 0x2e*4	jmp	@
000000000100:
        e0028000e5020001	seth	$2,0x8000	incmh	$2,0x0001
        e7020300fd000000	incl	$2,0x0300	swym	0,0,0
        8f000000b9070200	ldou	$0,$0,0		syncd	7,$2,0
        f8000000fd000000	pop	$0,0		swym	0,0,0
000000010000:
        0000000000000000
000000020000:
        0000000000000000
000100000000:
        0000000000000000

Then start the program in kernel mode:
mmmix> k
mmmix> 10000
mmmix> v1ff
mmmix> 1
...
mmmix> 1
...

The syncd instruction never completes.

Details

Cycle 8449
LSU1:2 executes ld instruction. Data is not in Dcache and Dcache is full. So LSU1:2 starts Dflusher, gives Scache lock to Dfiller, and starts Dfiller. (mmix-pipe.w, line 4982-4989)

Cycle 8450
* Dfluhser waits for Scache lock. (mmix-pipe.w, line 3860)

Cycle 8451
Dfiller starts Sfiller. (mmix-pipe.w, line 4035)

Cycle 8454
Sfiller reads data from memory, passes data to Dfiller, and schedules Dfiller in 10 cycles. (mmix-pipe.w, line 3951-3955)

Cycle 8464
Dfiller receives data from Sfiller, passes data to LSU1:2 and schedules LSU1:2 in 2 cycles. (mmix-pipe.w, line 3999-4003)

Cycle 8466
LSU1:2 receives data from Dfiller and then LSU1:2 completes ld instruction. (mmix-pipe.w, line 2674-2683)

Cycle 8468
Now syncd instruction reaches the hot seat; LSU2:2 executes syncd instruction and starts Clean coroutine. (mmix-pipe.w, line 6428-6435)

Cycle 8469
* Clean gets Dcache lock and waits for Dflusher to complete. (mmix-pipe.w, line 4154-4160, 4130-4133)

Cycle 8494
Sfiller completes reading cache block and schedules Dfiller at next cycle. (mmix-pipe.w, line 3964)

Cycle 8497
To copy data from Scache to Dcache,
* Dfiller waits for Dcache lock holding the Scache lock. (mmix-pipe.w, line 4011)

Lines with a leading * show that the meta-simulator goes into deadlock.

Proposed patch

Locking the Dcache in "clean" is dangerous. In this case, waiting for Dflusher while holding the Dcache lock causes deadlock. So, I changed mmix-pipe.w to release the Dcache lock before waiting for Dflusher.

The following is my patch:


--- ../mmix-20110831/mmix-pipe.w        2011-09-01 03:23:52.000000000 +0900
+++ mmix-pipe.w 2011-10-30 00:06:11.562500000 +0900
@@ -4130,7 +4130,7 @@
 Dclean: data->state=1;@+
   data->ptr_b=(void*)p;@+
   wait(Dcache->access_time);
-case 1:@+if (Dcache->flusher.next) wait(1);
+case 1:@+if (Dcache->flusher.next) goto Dclean_retry;
   flush_cache(Dcache,p,data->x.o.h==0);
   p->tag.h|=data->x.o.h;
   release_lock(self,Dcache->lock);
@@ -4161,6 +4161,16 @@
   }
   data->state=9;@+
   wait(Dcache->access_time);
+Dclean_retry:
+  release_lock(self,Dcache->lock);
+  if (data->i==sync) data->state=100;
+  else data->state=4;
+  wait(1);
+case 100:@+ if (Dcache->lock || (j=get_reader(Dcache)<0)) wait(1);
+  startup(&Dcache->reader[j],Dcache->access_time);
+  set_lock(self,Dcache->lock);
+  i=data->y.o.h, j=data->y.o.l;
+  goto Dclean_loop;

 @ @<Cases 5 through 9...@>=
 case 5:@+ if (self->lockloc) *(self->lockloc)=NULL, self->lockloc=NULL;

Discussion

This bug was fixed.

I did an extensive study of the source code to see what locks coroutines do hold and do wait for. It is probably best to bring the locks/coroutines in an partial order so that a owning lock A, a coroutine is allowed to wait for lock B only if B comes later in the partial order. (coroutines are like locks because you can wait on coroutine->next) I established the following order: (with Scache present)

  1. all coroutines for executing instructions
  2. write_from_wbuf
  3. wbuf_lock
  4. clean_lock
  5. clean_co
  6. I/Dcache reader
  7. I/Dcache lock
  8. I/Dcache flusher
  9. Scache lock
  10. I/Dcache filler
  11. Scache flusher
  12. mem_lock
  13. Scache filler
Then all but one wait(1) respect this order: Dcache filler (fill_from_S) waits for Dcache lock holding the Scache lock in mmix-pipe.w line 4011
  case 3: @inbuf|@>;
    data->state=4;@+wait(Scache->access_time);
  case 4:@+ if (c->lock) wait(1);
    set_lock(self,c->lock);
    Scache->lock=NULL; /* we had been holding that lock */
    load_cache(c,(cacheblock*)data->ptr_b);
    data->state=5;@+ wait(c->copy_in_time);
  case 5:@+if (cc) awaken(cc,1); /* second wakeup call */
    goto terminate;
It should release the Scache->lock before waiting, similar to the fill_from_mem routine. So the code should be:
  case 3: @inbuf|@>;
    data->state=4;@+wait(Scache->access_time);
  case 4:@+Scache->lock=NULL; /* we had been holding that lock */
    data->state=5;
  case 5:@+if (c->lock) wait(1);
    set_lock(self,c->lock);
    load_cache(c,(cacheblock*)data->ptr_b);
    data->state=6;@+ wait(c->copy_in_time);
  case 6:@+if (cc) awaken(cc,1); /* second wakeup call */
    goto terminate;
compare this to fill_from_mem:
  case 1: release_lock(self,mem_lock);
    data->state=2;
  case 2:@+if (c!=Scache) {
      if (c->lock) wait(1);
      set_lock(self,c->lock);
    }
    if (cc) awaken(cc,c->copy_in_time); /* the second wakeup call */
    load_cache(c,(cacheblock*)data->ptr_b);
    data->state=3;@+ wait(c->copy_in_time);
  case 3: goto terminate;
To release the Dcache lock in clean while waiting for the Dcache flusher as proposed by Eiji Yoshiya would eliminate the Dcache lock as a precondition to the Dcache flusher. Even without actually taking that lock, for instance write_from_wbuf in lines 4660 to 4675 assumes, that there was a wait on the Dcache lock in line 4601. So the lock should remain a precondition for the Dcache flusher. (@= actually could set this lock and release it immediately. So I assume the simulator is not "too lenient here".)

Please help to keep this site up to date! If you want to point out important material or projects that are not listed here, if you find errors or want to suggest improvements, please send email to email