Lecture 16: The Processor - 5

arsenic-dev 2024. 12. 6. 04:48

These notes are based on Professor 김정욱's Computer Architecture course at Kyung Hee University.

3. Control Hazards

Problem

A control hazard, also called a branch hazard, is a problem that arises during pipelining, just like a data hazard.

In other words, a control hazard is the hazard that occurs in the pipeline when a branch is executed.

▶ Control hazard

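If the beq condition is satisfied, the instructions at addresses 44, 48, and 52 are no longer needed.

In a pipeline, however, a problem arises when the condition is satisfied. (If the condition is not satisfied, there is no problem.)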

Whether to branch is decided from the Zero signal held in the EX/MEM pipeline register: the next PC becomes either the branch target or PC + 4.

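What happens to the and, or, and add instructions (the ones already in the pipeline) when the branch condition is satisfied?

The answer: discard them. (no operation)

They must not be executed when beq is taken, so their values cannot simply be kept and carried forward.

In a single-cycle design (no pipeline), if the condition is not satisfied, execution just continues,

and if it is satisfied, execution moves to the branch address and continues from there, so there is no problem at all.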
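
Which instructions end up discarded can be sketched in a few lines of Python. This is only a rough model, not the lecture's datapath: it assumes 4-byte instructions, one fetch per cycle, and a branch outcome that becomes known in the MEM stage (which matches the three discarded instructions at 44, 48, and 52). The function name `wrong_path_addresses` is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def wrong_path_addresses(branch_pc, resolve_stage="MEM"):
    """Addresses fetched after the branch before its outcome is known.

    Hypothetical model: one fetch per cycle, 4-byte instructions.
    """
    # While the branch moves from IF to resolve_stage, one younger
    # instruction enters the pipeline per cycle.
    in_flight = STAGES.index(resolve_stage)
    return [branch_pc + 4 * i for i in range(1, in_flight + 1)]

print(wrong_path_addresses(40))  # [44, 48, 52]
```

If the branch at 40 is taken, these are exactly the instructions that have to be turned into no operations.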

Solution

Solution 1: Stall until branch direction is clear

Stall until the branch direction is known.

This is the most naive approach; it is so slow that it defeats the purpose of pipelining.

Solution 2: Predict branch not taken

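Predict that the next instruction is the one at the already computed PC + 4 and keep going.
However, if the branch turns out to be taken, the instructions fetched from PC + 4 must be discarded. (no operation)

not taken: the branch does not happen

In other words, simply guess "branch not taken" and continue along the already determined PC + 4 path.

※ On average, a branch (beq) is not taken 47% of the time and taken 53% of the time.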

Solution 3: Predict branch taken

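Predict that the next instruction is the one at the branch address and branch to it.
However, if the branch turns out to be not taken, the instructions fetched from the branch address must be discarded. (no operation)

taken: the branch happens

In other words, simply guess "branch taken" and go to the branch address.

※ Ranked by effectiveness, Solution 1 < Solution 2 < Solution 3; all three are naive approaches.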
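
The ordering of the three solutions can be checked with a back-of-the-envelope cost model. The sketch below is an assumption-laden illustration: it charges a fixed penalty of 3 cycles for every stalled or mispredicted branch, uses the 53% taken figure from the note above, and assumes that "predict taken" can fetch from the target with no extra cost. The function and strategy names are hypothetical.

```python
def wasted_cycles_per_branch(strategy, p_taken=0.53, penalty=3):
    """Average cycles lost per branch under a simple, hypothetical cost model."""
    if strategy == "stall":
        return float(penalty)            # always wait for the outcome
    if strategy == "predict_not_taken":
        return penalty * p_taken         # wrong whenever the branch is taken
    if strategy == "predict_taken":
        return penalty * (1 - p_taken)   # wrong whenever it is not taken
    raise ValueError(f"unknown strategy: {strategy}")

for name in ("stall", "predict_not_taken", "predict_taken"):
    print(f"{name}: {wasted_cycles_per_branch(name):.2f}")
```

With these defaults the three strategies lose about 3.00, 1.59, and 1.41 cycles per branch, so the gap between the two static predictions is small.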

Solution 4: Execute both paths

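Execute both paths on two CPUs and select the correct one afterward.
The two paths: not taken (PC + 4) / taken (branch)

This is the most expensive approach, that is, it consumes the most CPU resources. (A solution that can be used when CPU resources are plentiful.)

When there are many beq instructions, it is not a realistic solution.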

Solution 5: Delay of branches

Until the branch outcome is known (3 cycles), execute instructions that are unrelated to the branch. (Reordering)
Any delay slot that cannot be filled this way gets a nop instruction.

This is the smartest of the approaches.

▶ Define branch to take place after a following instruction

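The prediction-based solutions guess because the outcome is not yet known,

whereas reordering makes the outcome known before it is needed.

Like the reordering solution for the load-use data hazard,

this is solved in the compiler, not in the datapath.

In the example above, at most three nop slots can be filled by reordering.

With four, 'add $10, $2, $3' would also be included. That instruction is related to the branch, so it would be discarded when the branch is taken, leaving its computation unfinished, which is a problem.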

Example) Prediction: branch is not taken -> actual outcome: taken

▶ Example

▶ Reducing Branch Delay 1

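Because the prediction is not taken, '44 and' at PC + 4 is fetched right after '40 beq'.

Execution continues along the not-taken path until, after beq has gone through EX, the pipeline realizes the branch should have been taken,

and signals that the instruction at the branch target should be fetched in the next cycle.

Even so, taken is recognized late, from the zero signal in the EX stage,

so two slots are wasted. (Note: the count here is in bubbles, not in clock cycles.)

※ IF -> ID -> EX -> MEM -> WB

※ Red X: turn the instruction into a no operation and clear its values. (Flush: reset the values to 0.)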

▶ Reducing Branch Delay 2

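This version reduces the two-bubble penalty to a single bubble.

The idea is to recognize the outcome earlier, in the ID stage instead of the EX stage.

The equality comparison (a subtraction) performed in the EX stage is moved earlier so that the zero signal is raised sooner.

The part that computes the branch target address is also moved earlier. (Moving it earlier causes no problems.)

-> The comparison that produces Zero and the Add that outputs the target address are moved from the EX stage to the ID stage.

※ If not taken turns out to be correct when the branch is recognized, execution simply continues.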

▶ Example - Reducing Branch Delay 2, actual circuit

The target address adder and the register comparator are moved to the ID stage.

▶ The result: $1 and $3 are equal -> single bubble or nop
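
What the two relocated units compute can be sketched as follows. This is a simplified model, not the lecture's circuit: it uses an XOR-based equality test in place of the subtraction mentioned above (a substitution for illustration), and the offset 7 and target 72 in the usage lines are assumed values chosen to fit the '40 beq' example. The function names are hypothetical.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def resolve_beq_in_id(pc, rs_value, rt_value, imm16):
    """Return the next PC for a beq resolved in the ID stage (illustrative)."""
    # Comparator: XOR is zero only when all 32 bits match.
    equal = ((rs_value ^ rt_value) & 0xFFFFFFFF) == 0
    # Target adder: PC + 4 plus the word offset shifted left by 2.
    target = (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return target if equal else pc + 4

print(resolve_beq_in_id(40, 5, 5, 7))  # 72 (registers equal -> taken)
print(resolve_beq_in_id(40, 5, 6, 7))  # 44 (registers differ -> PC + 4)
```

Because both values are ready at the end of ID, only the single instruction fetched at PC + 4 has to be flushed when the branch is taken.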

Dynamic Branch Prediction

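Look at past outcomes to predict branches more intelligently and keep mispredictions to a minimum.

Look up the address

Check whether the branch was taken on earlier executions. (history -> future)

▶ for loop -> keeps branching (example)

Since it branched in the past, predict that it will keep branching.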

Branch prediction buffer (1-bit)

Save previous result: branch taken (1) / branch not taken (0)

※ 1-bit: remembers only the single most recent outcome

Shortcomings (in Nested Loop)

The inner loop's branch keeps being taken (1) until the loop finally exits: one miss, and the bit is changed to branch not taken (0).
But when the outer loop runs the inner loop again, the branch should be taken (1) while the bit still says not taken (0), which causes an additional miss.

In short, in a nested loop one misprediction is followed by another, so two consecutive misses can occur.

▶ branch prediction buffer (1-bit)
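
The double miss is easy to reproduce with a small simulation. The sketch below assumes an inner loop whose branch is taken on every iteration except the last, run once per outer iteration, and a 1-bit predictor that starts at not taken; the function names and loop sizes are made up for illustration.

```python
def inner_loop_outcomes(outer_iters, inner_iters):
    """Outcomes of the inner loop's branch: taken until the loop exits."""
    return ([True] * (inner_iters - 1) + [False]) * outer_iters

def one_bit_mispredictions(outcomes, initial=False):
    """Count misses for a 1-bit predictor that repeats the last outcome."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent result
    return misses

print(one_bit_mispredictions(inner_loop_outcomes(3, 5)))  # 6
```

Three passes through the inner loop cost six misses: one at each exit and one at each re-entry.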

Branch prediction buffer (2-bits)

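Increase the history to 2 bits so that the prediction changes only after two consecutive mispredictions (acts as a cushion)

※ For branch prediction buffers, 2-bit is used more often than 1-bit.

▶ branch prediction buffer (2-bit)

※ The starting state is either the blue one or the red one.

※ 3 bits may do somewhat better. That said, 2 bits are enough to cover a doubly nested for loop.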
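
A common way to realize the 2-bit scheme is a saturating counter, sketched below. This is an assumption for illustration: the figure's exact state transitions may differ, and the state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the function name are hypothetical. The outcome pattern is an inner loop of five iterations run three times.

```python
def two_bit_mispredictions(outcomes, state=0):
    """Count misses for a 2-bit saturating counter (0-1: not taken, 2-3: taken)."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Step one state toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

outcomes = ([True] * 4 + [False]) * 3
print(two_bit_mispredictions(outcomes))           # 5
print(two_bit_mispredictions(outcomes, state=3))  # 3
```

After warming up, the counter misses only once per inner-loop exit, because a single not-taken outcome is not enough to flip the prediction.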

1-bit predictor vs. 2-bits predictor

▶ Example

※ If a 1-bit predictor is right only about 50% of the time, it is effectively no better than guessing.

Exception

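An unscheduled event that disrupts program execution
① Execution of an undefined instruction ② Arithmetic overflow

▶ Internal - Exception / External - Interrupt

From a control hazard point of view, when an exception occurs, execution has to go to the place that handles the problem and then come back.

The following instructions have to wait until the problem is resolved.

For this reason, handling exceptions is also a kind of control hazard.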

Handling Exceptions in MIPS

Procedure

1. Save the address of the offending instruction (as PC + 4) in the Exception Program Counter (EPC)

EPC: a 32-bit register that stores where to return after the exception is handled

2. Save the cause of the exception in the Cause Register

Cause Register: a 32-bit register that records the cause of the exception
Cause: undefined instruction -> 10 / arithmetic overflow -> 12

3. Jump to the handler, transferring control so that the handler can deal with the exception

▶ Address of Handler by exception type
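
The three steps can be written out as a small state update. This is a minimal sketch, not real MIPS system code: the processor state is modeled as a plain dict, `pc` is taken to be the address of the offending instruction, the address 0x4C in the usage line is an assumed value, and only the overflow handler address given in these notes (8000 0180) is filled in.

```python
CAUSE_CODES = {"undefined_instruction": 10, "arithmetic_overflow": 12}
# Only the overflow handler address appears in these notes.
HANDLER_ADDR = {"arithmetic_overflow": 0x8000_0180}

def raise_exception(state, kind):
    """Apply the three steps to a dict holding 'pc', 'epc', and 'cause'."""
    state["epc"] = state["pc"] + 4      # 1. save PC + 4 in EPC
    state["cause"] = CAUSE_CODES[kind]  # 2. record the cause
    state["pc"] = HANDLER_ADDR[kind]    # 3. jump to the handler
    return state

cpu = raise_exception({"pc": 0x4C, "epc": 0, "cause": 0}, "arithmetic_overflow")
print(hex(cpu["epc"]), cpu["cause"], hex(cpu["pc"]))  # 0x50 12 0x80000180
```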

Handling Exceptions in Pipeline

add $1, $2, $1

▶ Example) Suppose overflow on add instruction in EX stage (ALU)

Procedure

1. Prevent the overflowed value from being written to $1

2. Complete previous instructions

3. Flush the add instruction and everything after it from the pipeline -> EX.Flush

4. Set the Exception Program Counter (EPC) and the Cause Register

5. Jump to overflow handler (8000 0180)

Once the handler has resolved the problem, execution returns using the PC + 4 value saved in EPC and resumes with the flushed instructions.

※ Steps 2 and 3 happen at the same time.

▶ datapath with controls to handle exceptions

Overflow is detected by the ALU in the EX stage.

Handling Exceptions in Pipeline (Example)

▶ Suppose overflow on add instruction in EX stage (ALU)

Procedure

1. In clock cycle (CC) 6, the EX stage (ALU) performs the add of $2 and $1

2. Overflow is detected during that operation -> 8000 0180 is forced into the PC

3. In clock cycle (CC) 7, the add instruction and the instructions after it are flushed

4. The first instruction of the exception code is fetched

5. The address of the instruction after the faulting one (address 50) is saved in EPC

▶ Procedure 1, 2

▶ Procedure 3 ~ 5
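
The split between instructions that complete and instructions that are flushed can be sketched as below. The pipeline snapshot is an assumption: the notes only give address 50 for the instruction after the add, so the add is placed at 0x4C and the neighbouring addresses are filled in at 4-byte steps purely for illustration. The function name is hypothetical.

```python
def split_on_overflow(pipeline, handler=0x8000_0180):
    """Split in-flight instructions when the EX-stage instruction overflows.

    `pipeline` maps stage name -> instruction address (hypothetical model).
    """
    completes = [pipeline[s] for s in ("WB", "MEM")]     # older: allowed to finish
    flushed = [pipeline[s] for s in ("EX", "ID", "IF")]  # faulting add and younger
    return {"completes": completes, "flushed": flushed,
            "epc": pipeline["EX"] + 4, "next_pc": handler}

# Assumed contents of the pipeline in CC 6 (addresses in hex).
cc6 = {"IF": 0x54, "ID": 0x50, "EX": 0x4C, "MEM": 0x48, "WB": 0x44}
result = split_on_overflow(cc6)
print([hex(a) for a in result["flushed"]], hex(result["epc"]))
# ['0x4c', '0x50', '0x54'] 0x50
```

In CC 7 the fetch stage then reads from `next_pc`, the first instruction of the exception code.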

Concluding Remarks

Instruction Set Architecture (ISA) influences design of datapath and control

Datapath and control influence design of ISA

Pipeline

Pipelining was presented as reducing the clock cycle time of the simple single-cycle datapath

Hazards

Structural hazards, data hazards, control hazards
