[Repost] Tips to Optimize your Verilog HDL code [Verilog]

Posted 10/29/2010 10:21:15 AM

Reposted from: http://www.inno-logic.com/resourcesTips.html

RTL Coding Techniques

Assignments: If multiple assign statements target the same wire, the synthesis tool reports an error that the net is driven by more than one source:

assign out1 = in1 | in2;

assign out1 = in3 & in4;
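One way to remove the multiple-driver error is to drive out1 from a single continuous assignment that chooses between the two expressions. A minimal sketch, assuming a hypothetical select input `sel` that is not part of the original snippet:

```verilog
// Hypothetical fix: exactly one continuous assignment drives out1.
// "sel" is an assumed extra input that picks which expression wins.
module single_driver (
    input  wire in1, in2, in3, in4,
    input  wire sel,
    output wire out1
);
    assign out1 = sel ? (in3 & in4) : (in1 | in2);
endmodule
```

Whether a select, a priority rule, or two separate output wires is right depends on what the design intends; the point is that each net has one driver.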

Conditional assignments: A complete conditional assignment infers a multiplexer. In a procedural block, a conditional that leaves the target unassigned on some path infers a latch instead.

assign out1 = (sel == 1'b0) ? in1 : in2;

Multiple non-blocking assignments in a procedural block: If several non-blocking assignments inside an always block target the same register, the last one takes effect, both in simulation and in synthesis.
Using non-blocking assignments inside an always block: With non-blocking assignments, all right-hand sides are evaluated first and the registers are updated together at the end of the time step, so each register sees the old values of the others. With blocking assignments, each register is updated immediately and later statements see the new value.
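The "last assignment wins" rule is often used to give a register a default value and then override it conditionally. A minimal sketch, with hypothetical signal names `load` and `din`:

```verilog
// Hypothetical example: the later non-blocking assignment to q
// overrides the default written at the top of the block.
module last_assign_wins (
    input  wire       clk,
    input  wire       load,
    input  wire [3:0] din,
    output reg  [3:0] q
);
    always @(posedge clk) begin
        q <= 4'd0;        // default value on every clock edge
        if (load)
            q <= din;     // last assignment wins when load is high
    end
endmodule
```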

Non-blocking assignment example: synthesizes three D flip-flops (a shift register)

always @(posedge clk) begin

reg1 <= in0;

reg2 <= reg1;

reg3 <= reg2;

end

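Blocking assignment example: effectively synthesizes one D flip-flop, because reg1, reg2 and reg3 all receive in0 on the same clock edge, so the registers are equivalent and synthesis can merge them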

always @(posedge clk) begin

reg1 = in0;

reg2 = reg1;

reg3 = reg2;

end
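The two blocks above can be compared side by side in simulation. The sketch below places both in one module; the port names `nb_reg3` and `b_reg3` are assumptions added so both chains can be observed:

```verilog
// Both always blocks from the text in one module (port names assumed).
module nb_vs_blocking (
    input  wire clk,
    input  wire in0,
    output reg  nb_reg3,   // end of the non-blocking chain
    output reg  b_reg3     // end of the blocking chain
);
    reg nb_reg1, nb_reg2;
    reg b_reg1, b_reg2;

    // Non-blocking: each register samples the OLD value of its source,
    // so in0 needs three clock edges to reach nb_reg3.
    always @(posedge clk) begin
        nb_reg1 <= in0;
        nb_reg2 <= nb_reg1;
        nb_reg3 <= nb_reg2;
    end

    // Blocking: each line sees the value written by the line above it,
    // so b_reg3 receives in0 on the very same edge.
    always @(posedge clk) begin
        b_reg1 = in0;
        b_reg2 = b_reg1;
        b_reg3 = b_reg2;
    end
endmodule
```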

Functions: Functions are primarily used for combinational logic, since they cannot contain timing controls. If a function is called from different paths, its logic is replicated for each call. The returned value is stored in a D flip-flop when the call is assigned to a reg inside a clocked always block:

reg [2:0] out1 ;

always @(posedge clk) begin

out1 <= logic_func(in1, in2, in3);

end
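The original does not show the body of logic_func. A minimal sketch, assuming 1-bit inputs and an arbitrary hypothetical 3-bit result, shows a complete module in which the combinational function output is registered by the clocked block:

```verilog
// Hypothetical completion: the real logic_func body is not given,
// so this version just packs three 1-bit results into 3 bits.
module func_reg (
    input  wire       clk,
    input  wire       in1, in2, in3,
    output reg  [2:0] out1
);
    // Purely combinational: a function may not contain @, # or wait.
    function [2:0] logic_func;
        input a, b, c;
        begin
            logic_func = {a & b, b ^ c, a | c};
        end
    endfunction

    // The call itself is combinational; the clocked block registers it.
    always @(posedge clk)
        out1 <= logic_func(in1, in2, in3);
endmodule
```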

Tasks: You can use timing controls such as @ inside tasks, but only simulation tools support them; synthesis tools do not. It is therefore advisable to avoid timing controls in tasks meant for synthesis and to use such tasks for combinational logic. If a task is used on a common path, its logic is reused; otherwise the logic is replicated for each path.
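A task intended for synthesis contains only combinational assignments to its outputs. A minimal sketch, assuming a hypothetical 4-bit adder task `add4` called from a combinational always block:

```verilog
// Hypothetical synthesizable task: no @, # or wait inside.
module task_adder (
    input  wire [3:0] a, b,
    output reg  [3:0] sum,
    output reg        carry
);
    task add4;
        input  [3:0] x, y;
        output [3:0] s;
        output       c;
        begin
            {c, s} = x + y;   // 5-bit context keeps the carry
        end
    endtask

    always @(*)
        add4(a, b, sum, carry);
endmodule
```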

Sequential logic: Sequential logic falls into two groups: latches and flip-flops. A latch is level-sensitive and a flip-flop is edge-sensitive. Typically, incomplete multiplexer logic ends up being synthesized into a latch. The following examples show a D-latch and a D flip-flop:

D-latch:

always @(d, clk) begin

if (clk == 1'b1) q <= d;

end

In Verilog-2001, the above code can also be written as:

always @(*) begin

if (clk == 1'b1) q <= d;

end

D-Flip-flop:

always @(posedge clk) begin

q <= d;

end

Pros and cons of latches and flip-flops:
- Latches take less area, consume less power and allow time borrowing (cycle stealing), but are not friendly to DFT tools
- Flip-flops take more area and consume more power, but support fully synchronous design and are friendly to DFT tools

If-else and case: An if-else chain generally synthesizes into priority encoding logic. SystemVerilog, however, lets you control whether priority logic is built.

Example (SystemVerilog):

unique if (in1) sel = 2'b01;

else if (in2) sel = 2'b11;

……

With unique, you declare that the conditions are mutually exclusive, so the synthesis tool may build parallel logic. If you use priority instead of unique, you again get priority encoding logic.

A case statement is easy to read and can be synthesized into either parallel or priority logic. It is good practice to cover all conditions in a case statement or to add a default item at the end.
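A minimal sketch of the default-item advice, assuming a hypothetical multiplexer in which only three of the four select values are listed, so the default is what keeps the output assigned on every path:

```verilog
// Hypothetical 3-input mux with a 2-bit select.
module case_mux (
    input  wire [1:0] sel,
    input  wire [7:0] a, b, c,
    output reg  [7:0] out1
);
    always @(*) begin
        case (sel)
            2'b00:   out1 = a;
            2'b01:   out1 = b;
            2'b10:   out1 = c;
            default: out1 = 8'h00;   // covers 2'b11, so no latch is inferred
        endcase
    end
endmodule
```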

Difference between “==” and “===” operators: “==” is synthesizable, while “===” is generally not synthesizable. If either operand of “==” contains x or z bits that make the comparison ambiguous, the result is x (it is still 0 if some known bits already differ), whereas “===” compares x and z bits literally and always returns 0 or 1. The same holds for the “!=” and “!==” operators.
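The 4-state comparison rules can be checked directly in a simulator. This simulation-only sketch (module name hypothetical) prints the results for operands that contain an x bit:

```verilog
// Simulation-only demonstration of == versus ===.
module eq_demo;
    reg [3:0] a, b;
    initial begin
        a = 4'b10x0; b = 4'b10x0;
        $display("==  : %b", a == b);   // x: the x bit makes the result ambiguous
        $display("=== : %b", a === b);  // 1: x is compared literally
        b = 4'b00x0;
        $display("==  : %b", a == b);   // 0: bit 3 already differs
    end
endmodule
```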
Mealy and Moore state machines: In a Mealy machine, the outputs depend on the inputs as well as the current state. In a Moore machine, the outputs depend only on the current state. Designers use both, depending on the need.
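A minimal sketch, assuming a hypothetical rising-edge detector on an input `din`, shows where the two output styles differ: the Mealy output reacts to `din` in the same cycle, while the Moore output waits until the state register has captured the edge.

```verilog
// Hypothetical rising-edge detector with one Mealy and one Moore output.
module edge_detect (
    input  wire clk,
    input  wire rst,          // synchronous, active-high (assumption)
    input  wire din,
    output wire mealy_pulse,  // function of state AND input
    output wire moore_pulse   // function of state only
);
    localparam LOW = 2'd0, RISE = 2'd1, HIGH = 2'd2;
    reg [1:0] state;

    always @(posedge clk) begin
        if (rst) state <= LOW;
        else case (state)
            LOW:     state <= din ? RISE : LOW;
            RISE:    state <= din ? HIGH : LOW;
            HIGH:    state <= din ? HIGH : LOW;
            default: state <= LOW;
        endcase
    end

    // Mealy: asserted combinationally as soon as din rises.
    assign mealy_pulse = (state == LOW) && din;
    // Moore: asserted for one cycle after the rise has been registered.
    assign moore_pulse = (state == RISE);
endmodule
```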
Binary, one-hot and Gray encoding: Binary encoding requires the fewest flip-flops, but a state change usually toggles several bits at the same time. One-hot encoding uses as many flip-flops as there are states; exactly one bit is 1 and all the others are 0. In Gray coding, consecutive values differ in only one bit, while the remaining bits can be one or zero. Gray coding is popular when passing counters between two clock domains: for example, a dual-clock FIFO Gray-codes its read and write pointers, so a pointer sampled in the other clock domain while it is changing resolves to either the old or the new value, never to an unrelated one.
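A Gray-coded counter of the kind used for FIFO pointers can be sketched as follows. The module name `gray_counter` is hypothetical; it keeps a binary count and derives the Gray value with the standard conversion gray = bin ^ (bin >> 1):

```verilog
// Hypothetical Gray-code counter: binary count plus registered Gray copy.
module gray_counter #(parameter WIDTH = 4) (
    input  wire             clk,
    input  wire             rst,   // synchronous, active-high (assumption)
    output reg  [WIDTH-1:0] gray
);
    reg  [WIDTH-1:0] bin;
    wire [WIDTH-1:0] bin_next = bin + 1'b1;

    always @(posedge clk) begin
        if (rst) begin
            bin  <= {WIDTH{1'b0}};
            gray <= {WIDTH{1'b0}};
        end else begin
            bin  <= bin_next;
            gray <= bin_next ^ (bin_next >> 1);  // binary-to-Gray conversion
        end
    end
endmodule
```

Registering the Gray value, rather than driving it combinationally from bin, keeps glitches off the signal that crosses into the other clock domain.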

General Reusable coding practices

Register all the outputs of critical design blocks
Avoid paths that pass through several levels of hierarchy and then return to the same hierarchy
Partition the design based on functional goals and clock domains
Avoid instantiating technology-specific modules
Use parameters, declared at the top of the module with meaningful names
Avoid internally generated clocks and resets
Avoid glue logic at the top level
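Several of these practices can be seen in one small block. A minimal sketch, assuming a hypothetical adder whose width is set by a named parameter declared at the top, whose output comes straight from a flip-flop, and whose reset is a primary input rather than generated internally:

```verilog
// Hypothetical reusable block with a registered output.
module reg_adder #(
    parameter DATA_WIDTH = 8            // meaningful name, declared at the top
) (
    input  wire                  clk,
    input  wire                  rst_n,  // synchronous, active-low (assumption)
    input  wire [DATA_WIDTH-1:0] a, b,
    output reg  [DATA_WIDTH-1:0] sum_q   // registered output
);
    always @(posedge clk) begin
        if (!rst_n) sum_q <= {DATA_WIDTH{1'b0}};
        else        sum_q <= a + b;
    end
endmodule
```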

Guidelines for Multiple clock Design

Avoid metastability: When an asynchronous signal arrives from another clock domain, it is a good idea to synchronize it with two flip-flops in series. The first flip-flop may go metastable, but it has a full clock cycle to resolve before the second flip-flop samples it, which makes it very unlikely that a metastable value propagates further. Some vendors provide this module as a 2-stage synchronizer.
Data transfer across two different clock domains: Use a dual-clock FIFO to transfer data from one clock domain to another, and make sure the dual-clock FIFO uses Gray-code counters. If the data width differs on the two sides, use an asymmetric FIFO; for example, an asymmetric FIFO can transfer data from a 32-bit bus in one clock domain to an 8-bit bus in another. If the data speed is very
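The two-flop synchronizer from the metastability guideline can be sketched as below. It is only suitable for single-bit signals (or Gray-coded values, where one bit changes at a time); the module and port names are assumptions:

```verilog
// Hypothetical 2-stage synchronizer for a single-bit level signal.
module sync_2ff (
    input  wire clk_dst,    // destination clock domain
    input  wire async_in,   // signal from another clock domain
    output wire sync_out
);
    reg meta, stable;
    always @(posedge clk_dst) begin
        meta   <= async_in; // may go metastable
        stable <= meta;     // samples meta one full cycle later
    end
    assign sync_out = stable;
endmodule
```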

Common mistakes made in RTL code

Module with inputs but no outputs: It synthesizes into no logic, since there is no output for the logic to drive.
Inferring latches: Synthesis tools very commonly infer latches from incomplete if-else statements. Incomplete case statements, or case statements without a default, also generate latches. Designers must be very careful when writing RTL code for if-else and case blocks.
Combinational timing loops: These loops are created when the output of combinational logic or a gate is fed back to its own input. With no clock edge to break it, such a loop can oscillate or settle into an unintended state, and static timing analysis cannot time it without breaking the loop. These loops also cause testability problems. Most lint tools can detect them early in the design phase.
Incomplete sensitivity list in always block: A combinational always block, such as one describing a multiplexer, needs a complete sensitivity list; otherwise simulation does not match the synthesized logic. Verilog-2001 provides a very clean solution: write the following statement instead of listing all the inputs. Note that @(*) only completes the sensitivity list; a signal that is not assigned on every path still infers a latch.
always @(*)
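A common way to avoid latch inference is to assign a default to every output at the top of the combinational block, so that no path leaves an output unassigned. A minimal sketch with hypothetical signals `en`, `sel`, `y` and `valid`:

```verilog
// Hypothetical combinational block written so that no latch is inferred.
module no_latch (
    input  wire       en,
    input  wire [1:0] sel,
    input  wire [3:0] a, b,
    output reg  [3:0] y,
    output reg        valid
);
    always @(*) begin
        y     = 4'd0;      // defaults cover every path
        valid = 1'b0;
        if (en) begin
            valid = 1'b1;
            case (sel)
                2'b00:   y = a;
                2'b01:   y = b;
                default: y = a & b;
            endcase
        end
    end
endmodule
```

Without the two default lines, y and valid would keep their old values when en is low, which is exactly the storage behavior that turns into a latch.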

Design for Area

Proper use of `ifdef and `else: You can use `ifdef and `else to keep two versions of the code, one optimized for smaller area and one for larger area. Which version is synthesized depends on whether the macro (here MIN_AREA) is defined, so you choose the area target by defining or not defining it in the synthesis script or in an included header.

`ifdef MIN_AREA

…..

…..

`else

…..

…..

`endif

Constant propagation: Synthesis tools simplify logic when constant values propagate through it toward the outputs, so logic driven only by constants can be reduced or removed.
Using the generate statement: The Verilog-2001 generate statement lets you instantiate a module many times without typing each instance, or instantiate modules conditionally. Use if-else inside generate to instantiate modules conditionally, and a for loop to instantiate the same module multiple times. This saves a lot of time.

genvar i;

generate for (i = 0; i < width; i = i + 1) begin : gen_and_or

and_or inst1 (out1[i], in1[i], in2[i]);

end endgenerate
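A self-contained version of the loop above, extended with the conditional form, might look like this. The original does not define and_or, so the cell below is a hypothetical stand-in (a plain AND), and `USE_AND` is a hypothetical parameter that selects between the two generate branches:

```verilog
// Hypothetical stand-in for the and_or cell used in the text.
module and_or (output wire out, input wire a, input wire b);
    assign out = a & b;
endmodule

module gen_demo #(
    parameter WIDTH   = 4,
    parameter USE_AND = 1   // hypothetical switch for the conditional generate
) (
    input  wire [WIDTH-1:0] in1, in2,
    output wire [WIDTH-1:0] out1
);
    genvar i;
    generate
        if (USE_AND) begin : g_and
            // generate-for: one and_or instance per bit
            for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
                and_or inst1 (out1[i], in1[i], in2[i]);
            end
        end else begin : g_or
            // alternative branch: no instances, just an OR
            assign out1 = in1 | in2;
        end
    endgenerate
endmodule
```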

Design for Static Timing

Identifying critical paths early in the design phase: Understand and identify the critical paths, especially those ending at the D input of a flip-flop that violate the setup time requirement. Shortening the critical paths improves the achievable frequency and ultimately the performance of the design.
Proper partitioning of the design: There are multiple ways to partition a design, and they should be discussed early, during the architecture phase. Logical partitioning is a very common approach: designers typically divide the design into datapath, control, I/O and memories. It is better to register the outputs of the different blocks; this shortens long combinational paths and eases testability. Other partitioning techniques are goal-based (speed or area), clock-domain-based, or reset-based partitioning. Since many designs today use multiple clocks, it is advisable to put different clock domains in separate modules, which makes the job much easier.
Pipelining the design: Pipelining places D flip-flops at well-chosen points inside combinational logic so that each register-to-register path is shorter; the computed function is unchanged, but results arrive one or more cycles later. Many synthesis tools provide pipelining either as a baseline feature or as an upgrade. This feature is especially important for datapath designs.
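A minimal hand-pipelining sketch, assuming a hypothetical multiply-add y = a*b + c: a register after the multiplier splits one long path into two shorter ones, and c is delayed by one stage so it stays aligned with the product.

```verilog
// Hypothetical two-stage multiply-add; y appears two clocks after its inputs.
module mac_pipe (
    input  wire        clk,
    input  wire [7:0]  a, b,
    input  wire [15:0] c,
    output reg  [16:0] y
);
    reg [15:0] prod_q;   // stage 1 register: multiplier result
    reg [15:0] c_q;      // c delayed to line up with prod_q
    always @(posedge clk) begin
        prod_q <= a * b;          // stage 1: multiply only
        c_q    <= c;
        y      <= prod_q + c_q;   // stage 2: add only (17-bit result keeps the carry)
    end
endmodule
```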

Design for Testability

Factors affecting testability
- Presence of tri-state logic
- Gated clocks on flip-flops
- Internally generated clocks and resets
- Resets derived from the output of another flip-flop
- Presence of latches
Avoid tri-state buses: Synthesis tools do not handle tri-state buses well, and they are not testable. If you must use tri-state buses, ensure testability by passing the tri-state enable through an AND gate so that the scan_enable signal can control the bus.
Derived reset: To avoid losing data in scan mode, add an OR gate before the derived signal reaches the reset of the next flip-flop, and connect scan_enable to the other input of the OR gate. For an active-low reset, asserting scan_enable in test mode holds the reset inactive, which disables the asynchronous reset.
Derived clock: Put a multiplexer on the derived clock before it feeds another flip-flop. Connect the other mux input to the primary clock and use scan_enable as the select line, so that the primary clock is used during test.
Gated clock: Clock gating is a very common practice in power-sensitive designs: you disable an entire block by gating its input clock with an AND gate. To keep the design testable, OR the gating enable with scan_enable before it reaches the AND gate, so the clock always runs in test mode. In practice, the AND gate is usually a latch-based integrated clock-gating cell, which avoids glitches on the gated clock.
Presence of latches: The latch enable is not a regular clock that drives the rest of the logic. To ensure testability, OR the enable with scan_enable and feed the result to the enable port of the latch, which makes the latch transparent in test mode.
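The test-mode overrides above can be summarized in one combinational sketch. All signal names are hypothetical; the derived reset is assumed active-low, the tri-state driver is assumed to be disabled while scan_enable is high, and the scan_enable OR for the gated clock is applied to the gating enable so the clock keeps toggling in test mode. Plain gates are used only for illustration; real flows typically use library cells such as integrated clock-gating cells.

```verilog
// Hypothetical DFT overrides controlled by scan_enable.
module dft_overrides (
    input  wire scan_enable,
    input  wire clk,            // primary clock
    input  wire clk_derived,    // internally derived clock
    input  wire rst_n_derived,  // active-low reset from another flip-flop
    input  wire clk_en,         // functional clock-gating enable
    input  wire tri_en,         // functional tri-state enable
    input  wire latch_en,       // functional latch enable
    output wire clk_mux,        // derived clock, replaced in test mode
    output wire rst_n_dft,      // reset held inactive in test mode
    output wire clk_gated,      // gated clock that always runs in test mode
    output wire tri_en_dft,     // tri-state driver disabled in test mode
    output wire latch_en_dft    // latch forced transparent in test mode
);
    assign clk_mux      = scan_enable ? clk : clk_derived;
    assign rst_n_dft    = rst_n_derived | scan_enable;
    assign clk_gated    = clk & (clk_en | scan_enable);
    assign tri_en_dft   = tri_en & ~scan_enable;
    assign latch_en_dft = latch_en | scan_enable;
endmodule
```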

Design for Low Power

Power consists of two parts: static power and dynamic power. Static power results from leakage current in the CMOS circuit and depends directly on the process technology. Dynamic power is driven by switching activity inside the block, so one way to reduce power drastically is to reduce switching activity. This can be achieved by clock gating, reducing area, routing high-frequency signals through as few hierarchies as possible, using only as many flip-flops as needed to store data, gating input signals, choosing Gray coding instead of binary coding, and using multiplexers instead of tri-state buses.
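Two of these switching-activity techniques can be written directly in RTL. A minimal sketch, assuming a hypothetical adder whose result is only needed while `en` is high: the inputs are gated (operand isolation) so the adder stops toggling, and the result register has an enable, which many synthesis tools can map onto a clock-gating cell.

```verilog
// Hypothetical low-power idioms: operand isolation plus an enabled register.
module low_power_idioms (
    input  wire        clk,
    input  wire        en,      // result is only needed when en is high
    input  wire [15:0] a, b,
    output reg  [15:0] q
);
    // Gating the input signals: hold the adder inputs at zero while idle.
    wire [15:0] a_iso = a & {16{en}};
    wire [15:0] b_iso = b & {16{en}};

    // Enabled register: q holds its value while en is low.
    always @(posedge clk)
        if (en) q <= a_iso + b_iso;
endmodule
```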
Other techniques that can also reduce power consumption are lowering the operating voltage, lowering the operating frequency where possible, running the I/O at a voltage different from the core voltage, keeping routes short for power- and timing-sensitive logic, and reducing excessive loading.
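The standard first-order model for CMOS dynamic power, which the original does not state, shows why these levers work. Here α is the switching activity factor, C the switched capacitance, V_DD the supply voltage and f the clock frequency:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
```

Because voltage enters squared, lowering V_DD from, say, 1.2 V to 1.0 V alone scales dynamic power by (1.0/1.2)^2 ≈ 0.69, while halving f halves it. Clock gating and the other switching-activity techniques act on α, and reducing area or loading acts on C.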

Design for Verification

Messaging is a useful way of displaying information during simulation. Messages can convey a wide range of severity levels.
Considerations for implementing messaging
- Identify the level of message in terms of severity
- Controllability of the message
- Timestamp of the message
- Message Identification
Severity levels
- INFO: The message is purely informational
- WARNING: Something unusual happened, but there is no need to stop the simulation
- ERROR: Something is wrong; debug it, find the root cause and fix it
- FATAL: A serious issue; the simulation terminates immediately after displaying this message
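The considerations above (severity, controllability, timestamp, message ID) can be combined in a small reporting task. This is a sketch under assumptions: the module `msg_util`, the task `report`, its arguments and the output format are all hypothetical, not a standard API.

```verilog
// Hypothetical messaging helper; names, levels and format are assumptions.
module msg_util;
    localparam INFO = 0, WARNING = 1, ERROR = 2, FATAL = 3;

    integer min_level;    // controllability: levels below this are hidden
    integer error_count;  // ERROR and FATAL messages seen so far

    initial begin
        min_level   = INFO;
        error_count = 0;
    end

    task report;
        input integer    level;
        input [8*16-1:0] id;     // message identification, e.g. "BFM0001"
        input [8*64-1:0] text;
        begin
            if (level >= ERROR)
                error_count = error_count + 1;
            if (level >= min_level)
                // timestamp, severity, ID and text on one line
                $display("[%0t] %0s %0s: %0s", $time,
                         (level == INFO)    ? "INFO"    :
                         (level == WARNING) ? "WARNING" :
                         (level == ERROR)   ? "ERROR"   : "FATAL",
                         id, text);
            if (level == FATAL)
                $finish;          // FATAL ends the simulation immediately
        end
    endtask
endmodule
```

Counting errors even when they are filtered out keeps the end-of-test pass/fail decision independent of the verbosity setting.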

What is a BFM (bus functional model)?
- Provides visibility into its communication processes at each level of abstraction
- Provides visibility into all configurable parameters
- Can be commanded to perform a specified sequence of operations
Considerations for designing a BFM
- Map the hierarchy of the protocol or functionality into the BFM
- Provide a built-in self-check capability
- Specify all the key variables in one single file
- Ensure all configurable variables have defaults
- Provide a way to filter messages
- Check all inputs to the tasks/commands within the BFM for legal ranges
- Provide a way to attach message IDs
Typical flow for designing a BFM
- Specify the abstraction level at which the BFM is planned to be used
- Specify user level configuration parameters
- Specify the hierarchy of commands when they depend on each other hierarchically
- Specify the commands that the user will be able to call within the BFM
- Specify the details of the messages that the BFM should convey
- Specify the interface ports that the BFM uses to interact with the DUT
- Display the values of the parameters before the simulation starts
- Provide a way to log the BFM messages to a file or to standard output
- Provide a checking mechanism for the legal range of all parameters and input values
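A few of these steps (a user-level command, parameters with defaults that are displayed at start-up, and a legal-range check on task inputs) can be sketched for a hypothetical valid/ready write interface. The interface, the module name `simple_write_bfm` and the `MAX_ADDR` limit are all assumptions for illustration:

```verilog
// Hypothetical BFM for a simple valid/ready write interface.
module simple_write_bfm #(
    parameter ADDR_WIDTH = 8,      // configurable, with a default
    parameter MAX_ADDR   = 8'hEF   // assumed legal address range: 0..MAX_ADDR
) (
    input  wire                  clk,
    input  wire                  ready,   // from the DUT
    output reg                   valid,
    output reg  [ADDR_WIDTH-1:0] addr,
    output reg  [31:0]           data
);
    integer error_count;

    initial begin
        valid = 1'b0; addr = 0; data = 0; error_count = 0;
        // display the parameter values before the simulation starts
        $display("simple_write_bfm: ADDR_WIDTH=%0d MAX_ADDR=%0h", ADDR_WIDTH, MAX_ADDR);
    end

    // User-level command: one write transaction.
    task write;
        input [ADDR_WIDTH-1:0] a;
        input [31:0]           d;
        begin
            if (a > MAX_ADDR) begin
                // legal-range check on the task input
                $display("[%0t] ERROR BFM-001: address %0h out of range", $time, a);
                error_count = error_count + 1;
            end else begin
                @(posedge clk);
                valid <= 1'b1; addr <= a; data <= d;
                @(posedge clk);
                while (!ready) @(posedge clk);   // wait for the handshake
                valid <= 1'b0;
            end
        end
    endtask
endmodule
```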

Main functions of a bus monitor
- Protocol checking
  - Violation of protocol
  - Monitoring for X and Z values on signals
  - Latencies between critical signals
  - Timing violations such as setup and hold
- Transaction logging
  - Log each message with its severity level, ID, and description
  - Provide the option to send them to standard output or to a file
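Two of these monitor functions, X/Z checking and transaction logging, can be sketched for the same kind of hypothetical valid/ready bus. The signal names and message IDs are assumptions; the check relies on the fact that a reduction XOR returns x whenever any bit of its operand is x or z.

```verilog
// Hypothetical passive bus monitor: X/Z checking plus transaction logging.
module bus_monitor (
    input wire        clk,
    input wire        valid,
    input wire        ready,
    input wire [7:0]  addr,
    input wire [31:0] data
);
    integer xz_errors;
    integer transfers;

    initial begin
        xz_errors = 0;
        transfers = 0;
    end

    always @(posedge clk) begin
        // ^{addr, data} is x if any bit is x or z.
        if (valid === 1'b1 && ^{addr, data} === 1'bx) begin
            xz_errors = xz_errors + 1;
            $display("[%0t] ERROR MON-001: X/Z on addr/data while valid", $time);
        end else if (valid === 1'b1 && ready === 1'b1) begin
            transfers = transfers + 1;
            $display("[%0t] INFO  MON-002: write addr=%h data=%h", $time, addr, data);
        end
    end
endmodule
```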

Feedback

If you have any suggestion/feedback please email it to feedback@inno-logic.com


安德鲁
