designates my notes. / designates important.
After finishing the Bruce Land/Cornell FPGA/Verilog course, ECE5760 (2011), I found out that there is a new version available for 2017. I haven't gone through the 2017 version yet, but assuming it is similar to the 2011 version, it will be worthwhile.
I did work my way through the first 30 or so videos of the VHDL tutorial from LBE Books while reading this. These cover a few of the topics skipped by this book, particularly Karnaugh maps and Quine-McCluskey minimization. As with anything, you will learn more by doing than reading. The videos are short and sweet, and they make great practice fodder: try to implement each one yourself without looking back at the video.
This book is a good compromise between low-level gate logic and higher-level abstraction. It is definitely not the first book you want to pick up on electrical engineering or VHDL, but if you have a solid understanding of EE and a software foundation from which to approach the VHDL component, it is a worthwhile book.
We have now completed our foundational study of digital system design. We started with the basic elements of digital logic, gates and flip-flops, and showed how they can be used in circuits that meet given functional requirements. Given the complexity of requirements for most modern systems, we appealed to the principle of abstraction as a means of managing complexity. In particular, we use hierarchical composition to build blocks from the primitive elements, and systems from those blocks. By this means, we were able to reach the level of complete embedded systems, comprising processors, memories, I/O controllers, and accelerators, without becoming overwhelmed by the detailed interactions of the millions of transistors involved. Throughout our study, we also paid attention to the real-world effects that arise in digital circuits and the constraints that they imply. We showed how a disciplined design methodology helps us meet functional requirements while satisfying constraints. The study of digital systems in this book serves as a foundation for further studies in several areas.
Other references that may be of interest:
The Designer's Guide to VHDL (2008)
A Guide to Debouncing, Jack G. Ganssle, The Ganssle Group, 2004, www.ganssle.com/debouncing.pdf. Presents empirical data on switch bounce behavior, and describes hardware and software approaches to debouncing.
OpenCores, www.opencores.org. From the website’s FAQ, “OpenCores is a loose collection of people who are interested in developing hardware, with a similar ethos to the free software movement.” The website hosts a repository of freely reusable core designs, many of which are compatible with the Wishbone bus.
Computers as Components: Principles of Embedded Computing System Design, Wayne Wolf, Morgan Kaufmann Publishers, 2005. Includes a discussion of accelerators in the context of embedded hardware and software design, with a video-processing accelerator as a case study.
Static and capacitive loading limits the fanout of a driver, that is, the number of inputs that can be connected to the output.
Propagation delay depends on delay within components, capacitive loading and wire delays. Flip-flops have setup and hold time windows and clock-to-output delays.
A behavioral model describes the function performed by a circuit. A structural model describes the circuit as an interconnection of components.
An n-bit code has 2^n possible code words, so an n-bit code can represent information with up to 2^n values. Conversely, if we need to represent information with N values, we need at least ⌈log_2 N⌉ bits in our code. (The notation ⌈x⌉ is called the ceiling of x, and denotes the smallest integer that is greater than or equal to x.)
While it might make sense in some cases to use the shortest code, in other cases a longer code is better. A particular case of a non-minimal-length code is a one-hot code, in which the code length is the number of values to be encoded. Each code word has exactly one 1 bit with the remaining bits 0. The advantage of a one-hot code becomes clear when we want to test whether the encoded multibit signal represents a given value; we just test the single-bit signal corresponding to the 1 bit in the code word for that value.
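A minimal sketch of that test in VHDL, assuming a hypothetical four-value encoding. The entity name `one_hot_demo`, the port names, and the choice of "value 2" are illustrative assumptions, not taken from the book. With the binary code, detecting a value means comparing every bit; with the one-hot code, a single bit is enough.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only: entity and port names are assumptions.
entity one_hot_demo is
  port ( state_bin  : in  std_logic_vector(1 downto 0);  -- minimal-length code
         state_1hot : in  std_logic_vector(3 downto 0);  -- one-hot code
         is_2_bin   : out std_logic;
         is_2_1hot  : out std_logic );
end entity one_hot_demo;

architecture rtl of one_hot_demo is
begin
  -- Binary code: compare all bits against the code word for value 2.
  is_2_bin  <= '1' when state_bin = "10" else '0';
  -- One-hot code: value 2 is encoded as "0100", so bit 2 alone decides.
  is_2_1hot <= state_1hot(2);
end architecture rtl;
```

The trade-off is the wider signal: four bits instead of two here, and N bits instead of ⌈log_2 N⌉ in general.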
library ieee; use ieee.numeric_std.all;
0: 0000
1: 0001
2: 0010
3: 0011
4: 0100
5: 0101
6: 0110
7: 0111
8: 1000
9: 1001
A: 1010
B: 1011
C: 1100
D: 1101
E: 1110
F: 1111
y <= resize(x, 8);
and
x <= resize(y, 4);
When we introduced the XNOR gate in Section 2.1.1, we mentioned that it is also called an equivalence gate, since its output is 1 only when its two inputs are the same. Thus, we can test for equality of two unsigned binary numbers using the circuit of Figure 3.11, called an equality comparator. In practice, an AND gate with many inputs is not workable, so we would modify this circuit to better suit the chosen implementation fabric. Better yet, we would express the comparison in a VHDL model and let the synthesis tool choose the most appropriate circuit from its library of cells.
To test whether a number x is greater than another number y, we can start by comparing the most significant bits, x_n-1 and y_n-1. If x_n-1 > y_n-1, we know immediately that x > y. Similarly, if x_n-1 < y_n-1, we know immediately that x<y. In both cases, the final result is completely determined by comparing just the most significant bits. If x_n-1 = y_n-1, the result depends on the remaining bits, and is true if and only if x_n-2…0 > y_n-2…0. We can now apply the same argument recursively, examining the next pair of bits, and, if they are equal, continuing to less significant bits. Note that x_i > y_i is only true for x_i = 1 and y_i = 0, that is, if x_i AND ~y_i is true. These considerations lead to the circuit of Figure 3.12, called a magnitude comparator.
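Following the advice above to express the comparison in a VHDL model and let synthesis pick the circuit, a sketch might look like this. The entity name `comparator`, the 8-bit width, and the port names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;

-- Illustrative only: names and width are assumptions.
entity comparator is
  port ( x, y : in  unsigned(7 downto 0);
         eq   : out std_logic;    -- '1' when x = y
         gt   : out std_logic );  -- '1' when x > y
end entity comparator;

architecture behavior of comparator is
begin
  -- numeric_std overloads "=" and ">" for unsigned operands;
  -- the synthesis tool chooses the gate-level structure.
  eq <= '1' when x = y else '0';
  gt <= '1' when x > y else '0';
end architecture behavior;
```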
shift_left(s, 2)
shift_right(s,2)
What values are represented by the 8-bit 2s-complement numbers 00110101 and
10110101?
Solution
The first number is:
1 x 2^5 + 1 x 2^4 + 1 x 2^2 + 1 x 2^0 = 32 + 16 + 4 + 1 = 53
The second number is:
-1 x 2^7 + 1 x 2^5 + 1 x 2^4 + 1 x 2^2 + 1 x 2^0 = -128 + 32 + 16 + 4 + 1 = -75
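The same two values can be checked in simulation with `to_integer` from `numeric_std`. This is a small self-checking sketch; the entity name `twos_complement_check` is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;

-- Self-checking sketch; the entity name is an assumption.
entity twos_complement_check is
end entity twos_complement_check;

architecture test of twos_complement_check is
begin
  check : process is
    constant a : signed(7 downto 0) := "00110101";
    constant b : signed(7 downto 0) := "10110101";
  begin
    -- The sign bit of b has weight -2^7, all other bits are positive.
    assert to_integer(a) = 53  report "a is not 53"  severity failure;
    assert to_integer(b) = -75 report "b is not -75" severity failure;
    wait;  -- stop the process after one pass
  end process check;
end architecture test;
```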
signal n1, n2 : integer range -2**7 to 2**7-1; -- implies an 8-bit range
signal x, y : signed( 7 downto 0);
signal z : signed(11 downto 0);
signal z_sign : std_logic;
n1 <= to_integer(x);
n2 <= n1 + to_integer(y);
z <= to_signed(n2, z'length);
z_sign <= z(z'left);
For negative numbers, the sign bit is 1. We can extend an n-bit negative number to m bits by appending leading 1 bits.
In summary, for a 2s-complement signed integer, extending to a greater length involves replicating the sign bit to the left. This is called sign extension, and preserves the numeric value, be it positive or negative.
We can truncate by discarding the left-most bits, provided all of the discarded bits and the resulting sign bit are the same as the original sign bit.
We can express sign extension or truncation of a signed value in a VHDL model by using the resize operation.
signal x : signed( 7 downto 0);
signal y : signed(15 downto 0);
we can write the following assignment statement in an architecture to sign
extend the value of x and assign it to y:
y <= resize(x, y'length);
Similarly, we can write the following assignment to truncate the value of
y and assign it to x:
x <= resize(y, x'length);
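A small self-checking sketch of both directions, assuming a value that fits in the shorter length. The entity name `resize_check` is an assumption. Note that `resize` does not check the truncation condition stated above: if the value does not fit in the shorter length, the numeric value is not preserved.

```vhdl
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;

-- Self-checking sketch; the entity name is an assumption.
entity resize_check is
end entity resize_check;

architecture test of resize_check is
begin
  check : process is
    variable x : signed( 7 downto 0);
    variable y : signed(15 downto 0);
  begin
    x := to_signed(-3, x'length);
    y := resize(x, y'length);   -- sign extension: replicates x(7) to the left
    assert to_integer(y) = -3 report "sign extension failed" severity failure;

    x := resize(y, x'length);   -- truncation: safe here because -3 fits in 8 bits
    assert to_integer(x) = -3 report "truncation failed" severity failure;
    wait;
  end process check;
end architecture test;
```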
signal v1, v2 : signed(11 downto 0);
signal sum : signed(12 downto 0);
we can add the two 12-bit values and get a 13-bit result using the assignment
sum <= resize(v1, sum'length) + resize(v2, sum'length);
signal x, y, z : signed(7 downto 0);
signal ovf : std_logic;
we can write the following assignments to derive the required sum and
overflow condition bit:
z <= x + y;
ovf <= ( not x(7) and not y(7) and z(7) ) or
( x(7) and y(7) and not z(7) );
signal v1, v2 : signed(11 downto 0);
signal diff : signed(12 downto 0);
we can calculate the 13-bit difference between the two 12-bit values using
the assignment
diff <= resize(v1, diff'length) - resize(v2, diff'length);
signal x, y, z : signed(7 downto 0);
signal ovf : std_logic;
we can write the following assignments to derive the required difference
and overflow condition bit:
z <= x - y;
ovf <= ( not x(7) and y(7) and z(7) ) or
( x(7) and not y(7) and not z(7) );
Example 3.18
What number is represented by the fixed-point binary number 01100010, assuming
the binary point is four places from the right?
Solution
The number is 0110.0010_2
= 0x2^3 + 1x2^2 + 1x2^1 + 0x2^0 + 0x2^-1 + 0x2^-2 + 1x2^-3 + 0x2^-4
= 0 + 4 + 2 + 0 + 0 + 0 + 1/8 + 0 = 6.125_10
Example 3.19
What number is represented by the signed fixed-point
binary number 111101, assuming the binary point is four places from the right?
Solution
The number is 11.1101_2
= -1x2^1 + 1x2^0 + 1x2^-1 + 1x2^-2 + 0x2^-3 + 1x2^-4
= -2 + 1 + 1/2 + 1/4 + 0 + 1/16 = -0.1875_10
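Both examples can be checked with the VHDL-2008 `ieee.fixed_pkg` package, where the index range of a `ufixed` or `sfixed` value places the binary point between indices 0 and -1. This is a self-checking sketch; the entity name `fixed_point_check` is an assumption, and it needs a simulator with VHDL-2008 support.

```vhdl
library ieee;
use ieee.std_logic_1164.all, ieee.fixed_pkg.all;

-- Self-checking sketch; the entity name is an assumption.
entity fixed_point_check is
end entity fixed_point_check;

architecture test of fixed_point_check is
begin
  check : process is
    -- Example 3.18: unsigned, 4 integer bits and 4 fraction bits.
    constant u : ufixed(3 downto -4) := "01100010";
    -- Example 3.19: signed, 2 integer bits and 4 fraction bits.
    constant s : sfixed(1 downto -4) := "111101";
  begin
    assert to_real(u) = 6.125   report "u is not 6.125"   severity failure;
    assert to_real(s) = -0.1875 report "s is not -0.1875" severity failure;
    wait;
  end process check;
end architecture test;
```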
signal fp_num : float(5 downto -10);
reg: process (clk) is
begin
if rising_edge(clk) then
if reset = '1' then
q <= '0';
elsif ce = '1' then
q <= d;
end if;
end if;
end process reg;
reg: process (clk, reset) is
begin
if reset = '1' then
q <= '0';
elsif rising_edge(clk) then
if ce = '1' then
q <= d;
end if;
end if;
end process reg;
Example 4.14: Develop a VHDL model of the complex multiplier datapath.
Solution: We will start with the entity declaration. It includes ports for the data inputs and outputs, as well as clock and reset inputs and an input to indicate the arrival of new data. We will return to the last of these inputs later.
library ieee;
use ieee.std_logic_1164.all, ieee.fixed_pkg.all;
entity multiplier is
port (clk, reset: in std_logic;
input_rdy: in std_logic;
a_r, a_i, b_r, b_i: in sfixed(3 downto -12);
p_r, p_i: out sfixed(7 downto -24) );
end entity multiplier;
architecture rtl of multiplier is
signal a_sel, b_sel,
pp1_ce, pp2_ce,
sub, p_r_ce, p_i_ce : std_logic;
signal a_operand, b_operand : sfixed(3 downto -12);
signal pp, pp1, pp2, sum : sfixed(7 downto -24);
begin
a_operand <= a_r when a_sel = '0' else a_i;
b_operand <= b_r when b_sel = '0' else b_i;
pp <= a_operand * b_operand;
pp1_reg : process (clk) is
begin
if rising_edge(clk) then
if pp1_ce = '1' then
pp1 <= pp;
end if;
end if;
end process pp1_reg;
pp2_reg : process (clk) is
begin
if rising_edge(clk) then
if pp2_ce = '1' then
pp2 <= pp;
end if;
end if;
end process pp2_reg;
sum <= pp1 + pp2 when sub = '0' else pp1 - pp2;
p_r_reg : process (clk) is
begin
if rising_edge(clk) then
if p_r_ce = '1' then
p_r <= sum;
end if;
end if;
end process p_r_reg;
p_i_reg : process (clk) is
begin
if rising_edge(clk) then
if p_i_ce = '1' then
p_i <= sum;
end if;
end if;
end process p_i_reg;
end architecture rtl;
1. Multiply a_r and b_r, and store the result in partial product register 1.
2. Multiply a_i and b_i, and store the result in partial product register 2.
3. Subtract the partial product register values and store the result in the
product real part register.
4. Multiply a_r and b_i, and store the result in partial product register 1.
5. Multiply a_i and b_r, and store the result in partial product register 2.
6. Add the partial product register values and store the result in the product
imaginary part register.
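These six steps come from expanding the complex product. Using the port names of the multiplier entity as symbols:

```latex
\begin{aligned}
p &= a \cdot b = (a_r + j\,a_i)(b_r + j\,b_i) \\
p_r &= a_r b_r - a_i b_i \\
p_i &= a_r b_i + a_i b_r
\end{aligned}
```

Steps 1 to 3 compute p_r and steps 4 to 6 compute p_i. In the control logic, state step3 asserts pp1_ce together with p_r_ce, so the first product of step 4 is loaded on the same clock edge that stores the real part. That is how six steps fit into five states.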
type multiplier_state is
(step1, step2, step3, step4, step5);
signal current_state : multiplier_state;
current_state <= step4;
type multiplier_state is (step1, step2, step3, step4, step5);
signal current_state, next_state : multiplier_state;
state_reg : process (clk, reset) is
begin
if reset = '1' then
current_state <= step1;
elsif rising_edge(clk) then
current_state <= next_state;
end if;
end process state_reg;
next_state_logic : process (current_state, input_rdy) is
begin
case current_state is
when step1 =>
if input_rdy = '0' then
next_state <= step1;
else
next_state <= step2;
end if;
when step2 =>
next_state <= step3;
when step3 =>
next_state <= step4;
when step4 =>
next_state <= step5;
when step5 =>
next_state <= step1;
end case;
end process next_state_logic;
output_logic : process (current_state) is
begin
a_sel <= '0'; b_sel <= '0'; pp1_ce <= '0'; pp2_ce <= '0';
sub <= '0'; p_r_ce <= '0'; p_i_ce <= '0';
case current_state is
when step1 =>
pp1_ce <= '1';
when step2 =>
a_sel <= '1'; b_sel <= '1'; pp2_ce <= '1';
when step3 =>
b_sel <= '1'; pp1_ce <= '1'; sub <= '1'; p_r_ce <= '1';
when step4 =>
a_sel <= '1'; pp2_ce <= '1';
when step5 =>
p_i_ce <= '1';
end case;
end process output_logic;
register-transfer level (RTL) view. The word “level” refers to the level of abstraction. Register-transfer level is more abstract than a gate-level view, but less abstract than an algorithmic view.
setup time = t_su
hold time = t_h
clock-to-output delay = t_co
propagation delay = t_pd
clock cycle time = t_c
t_co + t_pd + t_su < t_c
t_co + t_pd-s + t_pd-o + t_pd-c + t_su < t_c
Here, t_pd-s is the propagation delay through the combinational subcircuit to drive the status signals, t_pd-o is the propagation delay through the output logic to drive the control signals, and t_pd-c is the propagation delay through the combinational subcircuit for a change in the control signal to affect the output data.
The path with the longest propagation delay is called the critical path. It determines the shortest possible clock cycle time for the system.
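As a worked example of the register-to-register constraint, assume (purely for illustration, these numbers are not from the book) a critical path with t_co = 0.5 ns, t_pd = 3.0 ns and t_su = 0.5 ns:

```latex
\begin{aligned}
t_{co} + t_{pd} + t_{su} &= 0.5\,\text{ns} + 3.0\,\text{ns} + 0.5\,\text{ns} = 4.0\,\text{ns} < t_c \\
f_{\max} &= \frac{1}{t_c} < \frac{1}{4.0\,\text{ns}} = 250\,\text{MHz}
\end{aligned}
```

Under these assumed delays the clock period must exceed 4.0 ns, so the system cannot be clocked at 250 MHz or faster.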
library ieee; use ieee.std_logic_1164.all;
entity debouncer is
port (clk, reset: in std_logic;
pb: in std_logic;
pb_debounced : out std_logic );
end entity debouncer;
architecture rtl of debouncer is
signal count500000 : integer range 0 to 499999;
signal clk_100Hz : std_logic;
signal pb_sampled : std_logic;
begin
div_100Hz : process (clk, reset) is
begin
if reset = '1' then
count500000 <= 499999;
elsif rising_edge(clk) then
if clk_100Hz = '1' then
count500000 <= 499999;
else
count500000 <= count500000 - 1;
end if;
end if;
end process div_100Hz;
clk_100Hz <= '1' when count500000 = 0 else '0';
debounce_pb : process (clk) is
begin
if rising_edge(clk) then
if clk_100Hz = '1' then
if pb = pb_sampled then
pb_debounced <= pb;
end if;
pb_sampled <= pb;
end if;
end if;
end process debounce_pb;
end architecture rtl;
type RAM_4Kx16 is array (0 to 4095) of std_logic_vector(15 downto 0);
signal data_RAM : RAM_4Kx16;
data_RAM_flow_through : process (clk) is
begin
if rising_edge(clk) then
if en = '1' then
if wr = '1' then
data_RAM(to_integer(a)) <= d_in; d_out <= d_in;
else
d_out <= data_RAM(to_integer(a));
end if;
end if;
end if;
end process data_RAM_flow_through;
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;
entity dual_port_SSRAM is
port (clk: in std_logic;
en1, wr1 : in std_logic;
a1: in unsigned(11 downto 0);
d_in1: in std_logic_vector(15 downto 0);
d_out1: out std_logic_vector(15 downto 0);
en2: in std_logic;
a2: in unsigned(11 downto 0);
d_out2: out std_logic_vector(15 downto 0) );
end entity dual_port_SSRAM;
architecture synth of dual_port_SSRAM is
type RAM_4Kx16 is array (0 to 4095) of std_logic_vector(15 downto 0);
signal data_RAM : RAM_4Kx16;
begin
read_write_port : process (clk) is
begin
if rising_edge(clk) then
if en1 = '1' then
if wr1 = '1' then
data_RAM(to_integer(a1)) <= d_in1; d_out1 <= d_in1;
else
d_out1 <= data_RAM(to_integer(a1));
end if;
end if;
end if;
end process read_write_port;
read_only_port : process (clk) is
begin
if rising_edge(clk) then
if en2 = '1' then
d_out2 <= data_RAM(to_integer(a2));
end if;
end if;
end process read_only_port;
end architecture synth;
DRAM = (asynchronous) dynamic RAM, needs to be refreshed (roughly every 64 ms)
SDRAM = synchronous dynamic RAM
library ieee; use ieee.numeric_std.all;
architecture ROM_based of seven_seg_decoder is
type ROM_array is array (0 to 31) of std_logic_vector(7 downto 1);
constant ROM_content : ROM_array :=
( 0 => "0111111", 1 => "0000110",
2 => "1011011", 3 => "1001111",
4 => "1100110", 5 => "1101101",
6 => "1111101", 7 => "0000111",
8 => "1111111", 9 => "1101111",
10 to 15 => "1000000",
16 to 31 => "0000000" );
begin
seg <= ROM_content(to_integer(unsigned(blank & bcd)));
end architecture ROM_based;
type ROM_512x20 is array (0 to 511) of std_logic_vector(19 downto 0);
constant data_ROM : ROM_512x20 := (X"00000", X"0126F", ...);
FPGA_ROM : process (clk) is
begin
if rising_edge(clk) then
if en = '1' then
d_out <= data_ROM(to_integer(a));
end if;
end if;
end process FPGA_ROM;
The development of IC technology beyond the LSI level led to very large scale integrated (VLSI) circuits.
We use the term application-specific integrated circuit, or ASIC, to refer to an IC manufactured for a particular application.
Another technique, use of differential signaling, is based on the idea of reducing a system's susceptibility to interference. Rather than transmitting a bit of information as a single signal S, we transmit both the positive signal S_P and its negation S_N. At the receiving end, we sense the voltage difference between the two signals. If S_P - S_N is a positive voltage, then S is received as the value 1; if S_P - S_N is a negative voltage, then S is received as 0.
For the assumption of common-mode noise induction to hold, differential signals must be routed along parallel paths on a PCB. While this might suggest a problem with crosstalk between the two traces, the fact that the signals are inverses of each other means that they both change at the same time, and crosstalk effects cancel out.
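The reason common-mode noise does not disturb the received value can be written out in one line. Here V_noise is a symbol introduced for this note (an assumption, not from the book) for the noise voltage induced equally on both traces:

```latex
(S_P + V_{\text{noise}}) - (S_N + V_{\text{noise}}) = S_P - S_N
```

The noise term cancels in the difference only if both traces pick up the same V_noise, which is why the two signals need to be routed along parallel paths.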
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;
entity gumnut is
port (clk_i: in std_logic;
rst_i: in std_logic;
inst_cyc_o: out std_logic;
inst_stb_o: out std_logic;
inst_ack_i: in std_logic;
inst_adr_o: out unsigned(11 downto 0);
inst_dat_i: in std_logic_vector(17 downto 0);
data_cyc_o: out std_logic;
data_stb_o: out std_logic;
data_we_o: out std_logic;
data_ack_i: in std_logic;
data_adr_o: out unsigned(7 downto 0);
data_dat_o: out std_logic_vector(7 downto 0);
data_dat_i: in std_logic_vector(7 downto 0)
);
end entity gumnut;
Show how to include an instance of the Gumnut core in a VHDL model of an embedded system with a 2K × 18-bit instruction memory and a 256 × 8-bit data memory.
Solution The ports in the entity declaration can interface with the control signals of a flow-through SSRAM and a ROM implemented using FPGA SSRAM blocks, as described in Sections 5.2.2 and 5.2.5. In our architecture for our embedded system, we include the necessary signals to connect to an instance of the Gumnut entity, and use the signals in processes for the instruction and data memories. The architecture is
architecture rtl of embedded_gumnut is
type ROM_2Kx18 is array (0 to 2047) of std_logic_vector(17 downto 0);
constant instr_ROM : ROM_2Kx18 := ( ... );
type RAM_256x8 is array (0 to 255) of std_logic_vector(7 downto 0);
signal data_RAM : RAM_256x8;
signal clk : std_logic;
signal rst : std_logic;
signal inst_cyc_o : std_logic;
signal inst_stb_o : std_logic;
signal inst_ack_i : std_logic;
signal inst_adr_o : unsigned(11 downto 0);
signal inst_dat_i : std_logic_vector(17 downto 0);
signal data_cyc_o : std_logic;
signal data_stb_o : std_logic;
signal data_we_o : std_logic;
signal data_ack_i : std_logic;
signal data_adr_o : unsigned(7 downto 0);
signal data_dat_o : std_logic_vector(7 downto 0);
signal data_dat_i : std_logic_vector(7 downto 0);
begin
CPU : entity work.gumnut
port map (clk_i => clk,
rst_i => rst,
inst_cyc_o => inst_cyc_o,
inst_stb_o => inst_stb_o,
inst_ack_i => inst_ack_i,
inst_adr_o => inst_adr_o,
inst_dat_i => inst_dat_i,
data_cyc_o => data_cyc_o,
data_stb_o => data_stb_o,
data_we_o => data_we_o,
data_ack_i => data_ack_i,
data_adr_o => data_adr_o,
data_dat_o => data_dat_o,
data_dat_i => data_dat_i
);
IMem : process (clk) is
begin
if rising_edge(clk) then
if inst_cyc_o = '1' and inst_stb_o = '1' then
inst_dat_i <= instr_ROM(to_integer(inst_adr_o(10 downto 0)));
inst_ack_i <= '1';
else
inst_ack_i <= '0';
end if;
end if;
end process IMem;
DMem : process (clk) is
begin
if rising_edge(clk) then
if data_cyc_o = '1' and data_stb_o = '1' then
if data_we_o = '1' then
data_RAM(to_integer(data_adr_o)) <= data_dat_o;
data_dat_i <= data_dat_o;
data_ack_i <= '1';
else
data_dat_i <= data_RAM(to_integer(data_adr_o));
data_ack_i <= '1';
end if;
else
data_ack_i <= '0';
end if;
end if;
end process DMem;
end architecture rtl;
The safest approach when designing control for tristate buses is to include a margin of dead time between different data sources driving the bus. A conservative approach is to defer enabling the next driver until the clock cycle after that in which the previous driver is disabled.
Not all implementation fabrics provide tristate drivers. For example, many FPGA devices do not provide tristate drivers for internal connections, and only provide them for external connections with other chips. If we want to design a circuit that can be implemented in different fabrics with minimal change, it is best to avoid tristate buses.
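One common way to avoid an internal tristate bus is to select between the data sources with a multiplexer. The following is only a sketch under assumed names (`bus_mux`, `src_a`, `src_b`, `sel_b`, `bus_data`) and an assumed two-source, 8-bit bus; it is not taken from the book.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only: names, width, and the two-source bus are assumptions.
entity bus_mux is
  port ( src_a, src_b : in  std_logic_vector(7 downto 0);
         sel_b        : in  std_logic;  -- '0' selects src_a, '1' selects src_b
         bus_data     : out std_logic_vector(7 downto 0) );
end entity bus_mux;

architecture rtl of bus_mux is
begin
  -- Exactly one source reaches the bus at any time, so there is no
  -- driver contention and no dead time to manage.
  bus_data <= src_b when sel_b = '1' else src_a;
end architecture rtl;
```

Because only ordinary gates are involved, the same model should synthesize on fabrics with or without internal tristate drivers.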
library ieee;
use ieee.std_logic_1164.all;
entity sensor_controller is
port (clk_i, rst_i : in std_logic;
cyc_i, stb_i : in std_logic;
ack_o : out std_logic;
dat_o : out std_logic_vector(7 downto 0);
int_req : out std_logic;
int_ack : in std_logic;
sensor_in : in std_logic_vector(7 downto 0) );
end entity sensor_controller;
architecture rtl of sensor_controller is
signal prev_data, current_data : std_logic_vector(7 downto 0);
signal current_int_req : std_logic;
begin
data_regs : process (clk_i) is
begin
if rising_edge(clk_i) then
if rst_i = '1' then
prev_data <= "00000000";
current_data <= "00000000";
else
prev_data <= current_data;
current_data <= sensor_in;
end if;
end if;
end process data_regs;
int_state : process (clk_i) is
begin
if rising_edge(clk_i) then
if rst_i = '1' then
current_int_req <= '0';
else
case current_int_req is
when '0' =>
if current_data /= prev_data then
current_int_req <= '1';
end if;
when others =>
if int_ack = '1' then
current_int_req <= '0';
end if;
end case;
end if;
end if;
end process int_state;
dat_o <= current_data;
int_req <= current_int_req;
ack_o <= cyc_i and stb_i;
end architecture rtl;
Example 8.16 Develop a VHDL model for a real-time clock controller for the Gumnut processor. The controller has a 10μs time base derived from a 50MHz system clock, and an 8-bit output register for the value to load into the counter. A write operation to the output register causes the counter to be loaded. After the counter reaches 0, it reloads the value from the output register and requests an interrupt. The controller has an input register for reading the current count value. The counter also has a 1-bit control output register. When bit 0 of the register is 0, interrupts from the controller are masked, and when it is 1, they are enabled. The counter has a status register, in which bit 0 is 1 when the counter has reached 0 and been reloaded, or 0 otherwise. Other bits of the register are read as 0. Reading the register has the side effect of acknowledging a requested interrupt and clearing bit 0. The counter output and input registers are located at the base port address, and the control and status registers are at offset 1 from the base port address.
Solution The entity declaration for the controller has ports for the I/O bus, and uses the stb_i port for the decoded base port address:
library ieee;
use ieee.std_logic_1164.all, ieee.numeric_std.all;
entity real_time_clock is
port (clk_i, rst_i : in std_logic; -- 50 MHz clock
cyc_i, stb_i, we_i : in std_logic;
ack_o : out std_logic;
adr_i : in std_logic;
dat_i : in unsigned(7 downto 0);
dat_o : out unsigned(7 downto 0);
int_req : out std_logic );
end entity real_time_clock;
architecture rtl of real_time_clock is
constant clk_freq : natural := 50000000;
constant timebase_freq : natural := 100000;
constant timebase_divisor : natural := clk_freq / timebase_freq;
signal count_value : unsigned(7 downto 0);
signal trigger_interrupt : std_logic;
signal int_enabled, int_triggered : std_logic;
begin
counter : process (clk_i) is
variable timebase_count : natural range 0 to timebase_divisor - 1;
variable count_start_value : unsigned(7 downto 0);
begin
if rising_edge(clk_i) then
if rst_i = '1' then
timebase_count := 0;
count_start_value := "00000000";
count_value <= "00000000";
trigger_interrupt <= '0';
elsif cyc_i = '1' and stb_i = '1' and adr_i = '0' and we_i = '1' then
timebase_count := 0;
count_start_value := dat_i;
count_value <= dat_i;
trigger_interrupt <= '0';
elsif timebase_count = timebase_divisor - 1 then
timebase_count := 0;
if count_value = "00000000" then
count_value <= count_start_value;
trigger_interrupt <= '1';
else
count_value <= count_value - 1;
trigger_interrupt <= '0';
end if;
else
timebase_count := timebase_count + 1;
trigger_interrupt <= '0';
end if;
end if;
end process counter;
control_reg : process (clk_i) is
begin
if rising_edge(clk_i) then
if rst_i = '1' then
int_enabled <= '0';
elsif cyc_i = '1' and stb_i = '1' and adr_i = '1' and we_i = '1' then
int_enabled <= dat_i(0);
end if;
end if;
end process control_reg;
int_reg : process (clk_i) is
begin
if rising_edge(clk_i) then
if rst_i = '1' or (cyc_i = '1' and stb_i = '1' and adr_i = '1' and we_i = '0') then
int_triggered <= '0';
elsif trigger_interrupt = '1' then
int_triggered <= '1';
end if;
end if;
end process int_reg;
dat_o <= count_value when adr_i = '0' else "0000000" & int_triggered;
int_req <= int_triggered and int_enabled;
ack_o <= cyc_i and stb_i;
end architecture rtl;
Resistive pull-ups are modeled in VHDL using the 'H' std_logic value.
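A self-checking sketch of a pulled-up open-drain line, relying on the std_logic resolution function: 'H' resolved with 'Z' gives 'H', and 'H' resolved with '0' gives '0'. The entity and signal names (`pullup_check`, `line_n`, `pull_low`) are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Self-checking sketch; entity and signal names are assumptions.
entity pullup_check is
end entity pullup_check;

architecture test of pullup_check is
  signal line_n   : std_logic;         -- shared open-drain line
  signal pull_low : std_logic := '0';  -- '1' makes the driver pull the line low
begin
  line_n <= 'H';                               -- resistive pull-up (weak 1)
  line_n <= '0' when pull_low = '1' else 'Z';  -- open-drain driver

  check : process is
  begin
    wait for 10 ns;
    -- Nobody pulls low: the weak 'H' wins over 'Z'.
    assert line_n = 'H' report "line should be weakly high" severity failure;
    -- to_X01 converts the weak value to a plain '1' for use in logic.
    assert to_X01(line_n) = '1' report "to_X01 should give '1'" severity failure;

    pull_low <= '1';
    wait for 10 ns;
    -- The forcing '0' overrides the weak 'H'.
    assert line_n = '0' report "line should be low" severity failure;
    wait;
  end process check;
end architecture test;
```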
There are two main schemes for implementing parallelism in accelerators. The first of these is simply to replicate components that perform a given step so that they operate on different elements of data. The speedup achieved through replication, compared to using just a single component, is ideally equal to the number of times the component is replicated. This scheme suits applications in which steps can be performed independently on the different data elements.
The second scheme for implementing parallelism is to break a larger computational step into a sequence of simpler steps, and to perform the sequence in a pipeline, as shown in Figure 9.1. The pipeline stages perform their simple steps in parallel, each operating on a different data element or an intermediate result produced by the preceding stages. The overall computation by the pipeline for a given data element takes approximately the same time as a nonpipelined chain of components. However, provided we can supply data to the pipeline input and accept data at the pipeline output on every clock cycle, the pipeline completes one computation every cycle. Thus, the speedup compared to the nonpipelined chain is ideally equal to the number of stages. This scheme suits applications that involve complex processing steps that can be broken down into simpler sequences with each step depending only on the results of earlier steps.
We can have replicated pipelines, giving the benefit of both schemes.
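The ideal pipeline speedup can be made concrete with a short calculation. The symbols are introduced here as assumptions: n pipeline stages, N data elements, each stage taking one clock cycle t_c, and the nonpipelined chain taking n·t_c per element.

```latex
\begin{aligned}
T_{\text{chain}} &= N \, n \, t_c \\
T_{\text{pipe}}  &= (n + N - 1) \, t_c \\
\text{speedup}   &= \frac{T_{\text{chain}}}{T_{\text{pipe}}} = \frac{N n}{n + N - 1} \;\longrightarrow\; n \quad \text{as } N \to \infty
\end{aligned}
```

The n - 1 extra cycles are the time needed to fill the pipeline; for a long stream of data they become negligible, which gives the ideal speedup equal to the number of stages. Replicating such a pipeline k times would ideally multiply the throughput by a further factor of k.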
type state_type is (state1, state2, state3, state4);
signal current_state, next_state : state_type;
state_reg : process (clock) is
begin
if rising_edge(clock) then
if reset = '1' then
current_state <= initial-state;
else
current_state <= next_state;
end if;
end if;
end process state_reg;
next_state_logic : process (current_state,
input-1, input-2, ...) is
begin
case current_state is
when state1 =>
if condition-1 then
next_state <= state-value;
elsif condition-2 then
next_state <= state-value;
...
else
next_state <= state-value;
end if;
when state2 =>
...
end case;
end process next_state_logic;
output_logic : process (current_state, input-1, input-2, ...) is
begin
case current_state is
when state1 =>
moore-output-1 <= value; moore-output-2 <= value; ...
if condition-1 then
mealy-output-1 <= value; mealy-output-2 <= value; ...
elsif condition-2 then
mealy-output-1 <= value; mealy-output-2 <= value; ...
...
else
mealy-output-1 <= value; mealy-output-2 <= value; ...
end if;
when state2 =>
...
end case;
end process output_logic;
fsm_logic : process (current_state, input-1, input-2, ...) is
begin
case current_state is
when state1 =>
if condition-1 then
next_state <= state-value;
mealy-output-1 <= value; mealy-output-2 <= value; ...
elsif condition-2 then
next_state <= state-value;
mealy-output-1 <= value; mealy-output-2 <= value; ...
...
else
next_state <= state-value;
mealy-output-1 <= value; mealy-output-2 <= value; ...
end if;
moore-output-1 <= value; moore-output-2 <= value; ...
when state2 =>
...
end case;
end process fsm_logic;
process-name : process (clock) is
begin
if rising_edge(clock) then
if enable = '1' then
if write = '1' then
data_ram(to_integer(address)) <= data-in;
data-out <= data-in;
else
data-out <= data_ram(to_integer(address));
end if;
end if;
end if;
end process process-name;
type rom_type is array (0 to 127) of unsigned(11 downto 0);
constant data_rom : rom_type := (X"000", X"021", X"1B3", X"7C0", ...);
data-out <= data_rom(to_integer(address));
OR
process-name : process (clock) is
begin
if rising_edge(clock) then
if enable = '1' then
data-out <= data_rom(to_integer(address));
end if;
end if;
end process process-name;