Logic Synthesis
1.1 Introduction to Synthesis
assign y = a & b; is RTL for an AND gate.
AND2_X4 U101 (.A(net1), .B(net2), .Y(net3)); is a specific AND gate with specific connections.
Technology library (.lib / .db): available cells and their properties
SDC constraints (.sdc): timing requirements (clock, I/O delays)
UPF/CPF (optional): low-power intent file
2. Converts logic to technology-independent gates (Generic Mapping)
3. Maps those to real cells from your library (Technology Mapping)
4. Optimizes: minimize delay on critical paths, reduce area, lower power (Optimization)
Mapped SDC (.sdc) — constraints ready for PD tool
Reports — timing (WNS/TNS), area (mm²), power (mW), DRC violations
DDC database — for incremental re-runs
1.2 Detailed Synthesis Flow
The synthesis flow transforms RTL into an optimized gate-level netlist through several distinct stages, each with specific goals and transformations.
Step 2 — Generic Mapping: Converts your design into a technology-independent intermediate form using GTECH (generic) gates — simple AND/OR/NOT/FF operations with no size or speed information yet. Boolean optimization happens here: constant propagation, dead code removal, logic simplification.
Step 3 — Technology Mapping: Now the tool looks at your target library (.lib file) and replaces each generic gate with an actual physical cell from that library. An AND2 becomes AND2_X4 (4x drive strength), a flip-flop becomes DFF_X1. This is where cell selection decisions are made.
Step 4 — Optimization: Iteratively improve the design. Fix timing violations by upsizing cells or restructuring logic. Reduce area by downsizing non-critical cells. Insert clock gating to save power. This phase runs many passes until WNS/TNS meets your target.
1.3 Synopsys Design Compiler (DC)
Design Compiler is the industry-standard synthesis tool from Synopsys. It supports hierarchical synthesis, compile strategies, and advanced optimization for timing, area, and power.
| Command | Purpose | Key Options |
|---|---|---|
| read_verilog | Read RTL source files | -sv (SystemVerilog), file list |
| elaborate | Build design hierarchy | -parameters, -lib_work |
| link | Resolve all design references | Must be called after elaborate |
| compile_ultra | Full compile with all optimizations | -no_autoungroup, -timing_high_effort_script |
| compile | Basic compile | -map_effort [low/medium/high], -incremental |
| report_timing | Timing path reports | -max_paths N, -slack_lesser_than 0 |
| report_area | Area statistics | -hier (hierarchical breakdown) |
| report_power | Power analysis | -analysis_effort high |
| write_file | Output gate-level netlist | -format verilog -hierarchy -output |
| write_sdc | Write constraints file | -version 2.0 |
| set_dont_touch | Protect cells from optimization | Apply to specific instances |
| check_timing | Validate timing constraints | Reports unconstrained paths |
## =========================================================
## DC Synthesis Script: sample_chip.tcl
## Project: sample_chip | Author: VLSI Engineer
## =========================================================

## 1. Setup target/link libraries
set target_library "saed32nm_tt1p05v25c.db"
set link_library "* $target_library"
set symbol_library "saed32nm.sdb"

## 2. Analyze RTL sources (elaborate needs analyzed designs in WORK)
analyze -format sverilog {../rtl/top.v ../rtl/core.v ../rtl/alu.v}

## 3. Elaborate and link design
elaborate sample_chip
link
check_design

## 4. Apply timing constraints
read_sdc "../constraints/sample_chip.sdc"

## 5. Set operating conditions
set_operating_conditions "tt1p05v25c"

## 6. Compile with high effort
compile_ultra -no_autoungroup -timing_high_effort_script

## 7. Reports
report_timing -max_paths 10 -slack_lesser_than 0 -nosplit > rpt/timing.rpt
report_area -hier > rpt/area.rpt
report_power -analysis_effort high > rpt/power.rpt
report_qor > rpt/qor.rpt

## 8. Write outputs
write_file -format verilog -hierarchy -output "out/sample_chip_netlist.v"
write_sdc -version 2.0 "out/sample_chip_mapped.sdc"
write_file -format ddc -hierarchy -output "out/sample_chip.ddc"
puts "=== Synthesis Complete ==="
## =========================================================
## SDC Constraint File: sample_chip.sdc
## =========================================================

## Clock definition
create_clock -name CLK -period 5.0 -waveform {0 2.5} [get_ports clk]

## Clock uncertainty (jitter + skew)
set_clock_uncertainty -setup 0.15 [get_clocks CLK]
set_clock_uncertainty -hold 0.05 [get_clocks CLK]

## Clock transition
set_clock_transition 0.1 [get_clocks CLK]

## Input delays (relative to CLK edge)
set_input_delay -max 1.5 -clock CLK [get_ports data_in*]
set_input_delay -min 0.2 -clock CLK [get_ports data_in*]

## Output delays
set_output_delay -max 1.2 -clock CLK [get_ports data_out*]
set_output_delay -min 0.1 -clock CLK [get_ports data_out*]

## Drive strength and load
set_driving_cell -lib_cell BUFX4 [get_ports data_in*]
set_load 0.05 [get_ports data_out*]

## False paths (async reset, test ports)
set_false_path -from [get_ports rst_n]
set_false_path -from [get_ports scan_en]

## Multicycle path (2-cycle computation)
set_multicycle_path 2 -setup -from [get_cells mult_inst*]
set_multicycle_path 1 -hold -from [get_cells mult_inst*]

## Max capacitance / transition constraints
set_max_capacitance 0.2 [current_design]
set_max_transition 0.4 [current_design]
1.4 Cadence Genus
Genus is Cadence's modern synthesis solution featuring concurrent optimization and a unified data model with Innovus for seamless handoff.
| Command | Purpose |
|---|---|
| read_hdl -language sv | Read SystemVerilog/Verilog/VHDL sources |
| elaborate | Elaborate and link design hierarchy |
| read_mmmc | Read multi-mode multi-corner view definition |
| syn_generic | Generic synthesis (technology-independent) |
| syn_map | Technology mapping to library cells |
| syn_opt | Incremental optimization (timing/area) |
| report timing | Report worst timing paths |
| report area | Report cell count and area |
| report power | Dynamic and leakage power |
| write_hdl | Write gate-level netlist |
| write_sdc | Write timing constraints |
| Feature | Synopsys DC | Cadence Genus |
|---|---|---|
| Vendor | Synopsys | Cadence |
| Script Language | TCL (dc_shell) | TCL / Innovus-compatible |
| Compile Command | compile_ultra | syn_generic, syn_map, syn_opt |
| MMMC Support | Via scenario objects | Native via read_mmmc |
| PD Integration | ICC2 (write_icc2_files) | Innovus (write_db) |
| Physical Guidance | DC Topographical | Physical Guidance Mode |
| Industry Usage | Dominant | Growing |
1.5 Timing Constraints (SDC)
This diagram shows the complete setup timing budget for a register-to-register path through an I/O port. All SDC constraint values map directly to regions on the waveform. The available combinational logic window = Period − input_delay − output_delay − setup_margin.
set_input_delay -max 1.5 -clock CLK [get_ports data_in*]
This tells the tool: upstream logic takes 1.5 ns of the clock period before data is valid at our input port. This is NOT a constraint we impose; it is a description of the external world. The tool uses it to compute the remaining time budget for our internal combinational logic. A larger input delay leaves less time for your combo path.
set_output_delay -max 1.2 -clock CLK [get_ports data_out*]
This says: the downstream chip needs our output to be valid 1.2 ns before its capture clock edge. The tool reserves this time from the end of the period. Together: available combo window = Period − input_delay − output_delay = 5.0 − 1.5 − 1.2 = 2.3 ns (before accounting for FF setup time and uncertainty).
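The budget arithmetic can be sketched in a few lines of Python. This is only an illustration, not tool output: the function name `combo_window` is hypothetical, and the numbers are the ones used in the SDC example (5.0 ns period, 1.5 ns input delay, 1.2 ns output delay, 0.15 ns setup uncertainty).

```python
# All values in ns, copied from the SDC example in this section.
CLOCK_PERIOD = 5.0
INPUT_DELAY_MAX = 1.5
OUTPUT_DELAY_MAX = 1.2
SETUP_UNCERTAINTY = 0.15


def combo_window(period, input_delay, output_delay, uncertainty=0.0):
    """Time left for internal combinational logic on an input-to-output path."""
    return period - input_delay - output_delay - uncertainty


raw_budget = combo_window(CLOCK_PERIOD, INPUT_DELAY_MAX, OUTPUT_DELAY_MAX)
derated_budget = combo_window(
    CLOCK_PERIOD, INPUT_DELAY_MAX, OUTPUT_DELAY_MAX, SETUP_UNCERTAINTY
)

print(f"budget before uncertainty: {raw_budget:.2f} ns")   # 2.30 ns
print(f"budget after uncertainty:  {derated_budget:.2f} ns")  # 2.15 ns
```

Raising either external delay shrinks the window one-for-one, which is why over-budgeted I/O constraints are a common source of avoidable setup violations.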
| SDC Command | Analysis Type | What It Models | Example |
|---|---|---|---|
| create_clock | Both | Defines clock signal: period, waveform shape, source pin. Foundation of all timing analysis. | create_clock -period 5 -waveform {0 2.5} [get_ports clk] |
| create_generated_clock | Both | Clock derived from master clock (PLL output, divider). Must be declared for STA to analyze crossing paths. | create_generated_clock -divide_by 2 -source [get_ports clk] [get_pins div_reg/Q] |
| set_clock_uncertainty | Both | Models jitter + skew + margin. Setup reduces required time. Hold adds to minimum required time. | set_clock_uncertainty -setup 0.15 [get_clocks CLK] set_clock_uncertainty -hold 0.05 [get_clocks CLK] |
| set_clock_transition | Both | Models clock rise/fall slew at the source. Affects clock cell delays in tree analysis. | set_clock_transition 0.08 [get_clocks CLK] |
| set_input_delay -max | Setup | Latest external data arrival relative to clock. Reduces time budget for internal combo logic. | set_input_delay -max 1.5 -clock CLK [get_ports din*] |
| set_input_delay -min | Hold | Earliest external data arrival. Used for hold analysis. Without -min, hold on input paths is unconstrained. | set_input_delay -min 0.2 -clock CLK [get_ports din*] |
| set_output_delay -max | Setup | Time before next clock edge downstream chip needs our output stable. Eats into our combo budget. | set_output_delay -max 1.2 -clock CLK [get_ports dout*] |
| set_output_delay -min | Hold | Minimum time downstream chip needs output stable after our clock. Constrains minimum combo path. | set_output_delay -min 0.1 -clock CLK [get_ports dout*] |
| set_false_path | Disable | Removes path from STA entirely. For async resets, test ports, clock MUX select pins — paths never race functionally. | set_false_path -from [get_ports rst_n] |
| set_multicycle_path | Both | Allows N-cycle propagation. ALWAYS pair setup with hold correction (N-1). Missing hold fix → hold violations. | set_multicycle_path 2 -setup -from [get_cells mul*] set_multicycle_path 1 -hold -from [get_cells mul*] |
| set_clock_groups | Async | Tells STA not to analyze paths between unrelated clocks. Essential for correct CDC handling in STA. | set_clock_groups -asynchronous -group {CLK_A} -group {CLK_B} |
| set_driving_cell | Setup | Models external driver strength at input ports. Without this, input transitions are ideal (zero-resistance). Affects input timing accuracy. | set_driving_cell -lib_cell BUFX4 [get_ports din*] |
| set_load | Setup | Models output port capacitive load (downstream PCB trace, other chip input). Affects output transition and delay. | set_load 0.05 [get_ports dout*] |
Pre-CTS: Use set_clock_uncertainty -setup 0.15 to model total skew + jitter. This is pessimistic because the actual skew is not yet known.
Post-CTS: Switch to set_propagated_clock [all_clocks] in PrimeTime. The tool computes actual clock latencies through the synthesized clock tree. Only jitter uncertainty remains (typically 0.05–0.08 ns). This recovers significant timing margin, often 100–200 ps, that was previously modeled as skew pessimism.
1.6 Optimization Techniques
Area-focused: compile -map_effort high and set_max_area 0.
Timing-focused: compile_ultra -timing_high_effort_script.
| Technique | Description | Benefit |
|---|---|---|
| Retiming | Move registers across combinational logic to balance pipeline stages | Timing |
| Constant Propagation | Replace signals that are always 0/1 with constants; simplify downstream logic | Area |
| Logic Restructuring | Rearrange tree structures (AND/OR) to reduce critical path depth | Timing |
| Ungroup Hierarchy | Flatten sub-modules to enable cross-boundary optimization | Timing |
| Path Grouping | Group critical paths for prioritized optimization effort | Timing |
| Multi-Vt Assignment | Use HVT cells in non-critical paths, LVT on critical paths | Power+Timing |
1.7 Quality of Results (QoR)
QoR is the overall measure of synthesis success across all objectives: timing, area, power, and design rule compliance. A good synthesis engineer tracks all four simultaneously — improving one often hurts another.
Run report_qor and you will see numbers like WNS = −0.28ns, TNS = −15.4ns. Here is what that means:
WNS (Worst Negative Slack) = the slack of the single worst timing path in the design. −0.28ns means the most critical path is 0.28 nanoseconds too slow: data arrives 0.28ns after it needs to. This is the path you fix first.
TNS (Total Negative Slack) = the sum of all negative slacks across all violating endpoints. −15.4ns means the violations add up to a total shortfall of 15.4ns. A large TNS with a small WNS means many paths are slightly violated (broad problem). A large WNS with a TNS of nearly the same size means one very bad path (focused problem).
Goal: WNS ≥ 0 AND TNS = 0 — every single timing endpoint must pass. Even one path at −0.001ns is a failure at sign-off.
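As a minimal sketch of how these two numbers relate, the Python below derives WNS and TNS from a list of endpoint slacks. The function name `summarize_slack` and the slack values are hypothetical; a real flow would read them from a timing report.

```python
def summarize_slack(endpoint_slacks):
    """Return (WNS, TNS, violating endpoint count) for slacks given in ns."""
    violations = [s for s in endpoint_slacks if s < 0]
    wns = min(endpoint_slacks)   # worst slack over all endpoints
    tns = sum(violations)        # only negative slacks contribute
    return wns, tns, len(violations)


# Hypothetical endpoint slacks in ns (illustrative only).
slacks = [0.12, -0.28, -0.05, 0.40, -0.10]
wns, tns, count = summarize_slack(slacks)

print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns, violating endpoints = {count}")
```

Note that TNS can never be negative while WNS is non-negative, because every term in the TNS sum is itself a negative slack.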
| Issue | Symptom | Fix | Priority |
|---|---|---|---|
| Setup violations | WNS < 0 | Upsize cells, remove logic levels, add pipeline stage | P0 |
| Hold violations | WHS < 0 | Insert delay buffers on short paths | P0 |
| High leakage power | report_power shows high static | Replace LVT with HVT on non-critical paths | P1 |
| High dynamic power | Switching activity high | Enable clock gating, operand isolation | P1 |
| DRC violations | Max cap/trans violations | Buffer high-fanout nets, fix transitions | P0 |
| Large area | Area > target | set_max_area 0, use higher Vt cells | P2 |
Physical Design
2.1 Introduction to Physical Design
Physical Design converts a synthesized gate-level netlist into a manufacturing-ready GDS layout, determining the physical placement, power, clock, and routing of all cells.
2.2 Floorplanning
• A signal pad — for data inputs/outputs (bidirectional, input-only, or output-only)
• A power pad — for VDD (positive supply) and VSS (ground). Multiple VDD/VSS pads are used because each pad can only carry limited current.
In your SDC file, get_ports refers to these pad signals. The set_input_delay and set_output_delay constraints model the timing from/to these pads.
Think of it like a room (core) inside a building (die). The walls of the building hold the doors (I/O pads). The room is where all the furniture (logic) goes. The gap between the room walls and the building walls is used for corridors (power rings, routing channels). This gap is called the core-to-IO margin (typically 20–50 µm).
• A fixed height (e.g., 12-track height in 7nm) — all cells in the same technology have the same height
• A variable width depending on complexity (a 4-input NAND is wider than a 2-input AND)
• VDD and VSS rails running along the top and bottom edges
Because all standard cells have the same height, they can be placed in rows like books on a shelf. The synthesis tool converts your Verilog RTL into thousands of standard cell instances. The PD tool then physically places them in the rows.
• SRAM — On-chip memory. Your CPU's cache or register file. Designed by memory compilers with optimized bit-cell layout.
• PLL (Phase-Locked Loop) — Clock generator circuit. Analog design, cannot be synthesized.
• ROM — Read-only memory for boot code or lookup tables.
• Analog IP — ADC, DAC, SerDes PHY — all analog, all hard macros.
Hard macros are placed first during floorplanning, before any standard cells. Their position determines how efficiently the remaining logic can be placed and routed.
Why? Because the macro's internal structure needs routing access around its edges (for signal and power connections). If standard cells are placed right up against the macro wall, the router has no room to route those connections — creating a routing deadlock.
It's like leaving a sidewalk around a building so people can walk to the entrance — if you park cars right up to the walls, nobody can get in.
Utilization = (Total std cell area) / (Core area) × 100%
Why not 100%? Because you need space for:
• Routing channels (wires between cells)
• Clock buffers and power supply cells inserted during PD
• Filler cells and decap cells
• Spare cells for post-silicon ECO
Rule of thumb: 60–75% utilization is the sweet spot. Below 60% = die is wastefully large (costs more money per chip). Above 80% = routing becomes extremely congested and timing closure becomes very difficult or impossible.
Most designs target 1:1 (square) because it minimizes average wire length (which minimizes delay and power). Non-square shapes are used when:
• I/O pad constraints require a specific shape (e.g., a chip with many memory interfaces on one side)
• Large hard macros naturally push the aspect ratio
• The package dictates the die shape
Extreme aspect ratios (e.g., 3:1 — very tall and thin) cause problems: clock distribution becomes unbalanced, wire lengths increase, and some areas become routability bottlenecks.
Step 1: Core Area = Std cell area / Utilization = 4.8 mm² / 0.70 = 6.86 mm²
Step 2: For AR = 1.0 (square): Width = Height = √6.86 ≈ 2.62 mm
Step 3: Add core-to-IO margin (say 40 µm each side): Die = (2.62 + 0.08) × (2.62 + 0.08) = 2.70 mm × 2.70 mm
Step 4: Verify: Do the hard macros (SRAMs) fit? If an SRAM is 1.2 mm × 0.8 mm, then with its halo it needs a footprint of roughly 1.25 mm × 0.85 mm, which fits in the 2.62 mm × 2.62 mm core.
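The same sizing steps can be written as a short Python sketch. The helper `size_floorplan` is a hypothetical name, and the inputs (4.8 mm² of standard cells, 70% utilization, aspect ratio 1.0, 40 µm margin per side) are the ones used in the worked steps.

```python
import math


def size_floorplan(std_cell_area_mm2, utilization, aspect_ratio=1.0, io_margin_um=40.0):
    """Return (core_area, core_w, core_h, die_w, die_h) in mm^2 and mm.

    aspect_ratio is core height divided by core width.
    """
    core_area = std_cell_area_mm2 / utilization
    core_w = math.sqrt(core_area / aspect_ratio)
    core_h = core_w * aspect_ratio
    margin_mm = io_margin_um / 1000.0  # applied on both sides of each axis
    die_w = core_w + 2 * margin_mm
    die_h = core_h + 2 * margin_mm
    return core_area, core_w, core_h, die_w, die_h


core_area, core_w, core_h, die_w, die_h = size_floorplan(4.8, 0.70)
print(f"core area: {core_area:.2f} mm^2")               # 6.86 mm^2
print(f"core: {core_w:.2f} mm x {core_h:.2f} mm")       # 2.62 mm x 2.62 mm
print(f"die:  {die_w:.2f} mm x {die_h:.2f} mm")         # 2.70 mm x 2.70 mm
```

Changing `utilization` or `aspect_ratio` shows quickly how sensitive die size is to those two floorplan choices.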
| Parameter | Typical Value | What It Means | If You Get It Wrong |
|---|---|---|---|
| Core Utilization | 60–75% | Percentage of core area filled with standard cell logic. The rest is routing space + buffers. | >80%: router can't fit all wires → routing overflow → unrouteable design. <50%: die is larger than needed → higher cost per chip. |
| Aspect Ratio | 1:1 (square) | Core height divided by core width. 1.0 = perfect square. Controls the shape of the chip. | Extreme ratios (3:1) make clock distribution and power delivery much harder. I/O pad count may also force non-square shapes. |
| Core-to-IO margin | 20–50 µm | The gap between the outer edge of the core and the inner edge of the I/O pad ring. Used for power rings (VDD/VSS) and routing channels to connect pads to core logic. | Too narrow: power rings don't fit, I/O connections cannot be routed. Too wide: wastes die area. |
| Macro halo | 2–5 µm | The empty forbidden zone around each hard macro where NO standard cells are placed. Required to leave room for the macro's own routing connections. | Without halo: standard cells crowd the macro edges → router cannot access macro pins → open circuits in the layout (LVS failures). |
2.3 Power Planning
2.4 Placement
| Stage | Description | Key Metric |
|---|---|---|
| Global Placement | Distributes cells across core to minimize total wirelength. Cells may overlap temporarily. | HPWL (half-perimeter wire length) |
| Legalization | Moves cells to legal rows, removes overlaps, snaps to row grid. | Cell displacement from global |
| Detailed Placement | Local cell swaps and moves to improve timing and routing. | WNS improvement |
| Congestion Reduction | Spread cells in congested areas, use placement blockages. | Routing overflow % |
2.5 Clock Tree Synthesis (CTS)
Pre-CTS, skew is modeled with set_clock_uncertainty. Post-CTS, skew is captured by propagated clock latencies, so only jitter+guardband remain.
| Command | Tool | Purpose |
|---|---|---|
| ccopt_design | Innovus | Run CTS with concurrent optimization |
| set_ccopt_property | Innovus | Set CTS target skew, latency targets |
| clock_opt | ICC2 | Run clock tree optimization |
| set_clock_tree_options | ICC2 | Configure CTS parameters |
| report_clock_tree | Both | Report skew, latency, buffer count |
2.6 Routing
| DRC Rule | Definition | Violation Impact |
|---|---|---|
| Spacing | Minimum distance between same-metal parallel wires | Short circuit risk, manufacturing defects |
| Width | Minimum wire width per metal layer | Higher resistance → IR drop, EM failure |
| Via enclosure | Metal must extend beyond via by minimum amount | Broken via connection on manufacturing variation |
| Antenna | Limits ratio of metal area to gate oxide area | Gate oxide damage during plasma etch |
| Density | Min/max metal fill requirements per layer | CMP non-uniformity → dishing/erosion |
2.7 Physical Verification
2.8 PD Tool Knowledge
| Command | Purpose |
|---|---|
| read_db | Import design from Genus (unified data model) |
| init_design | Initialize design with LEF/DEF/SDC |
| floorPlan | Define die/core size and utilization |
| addRing / addStripe | Create power rings and stripes |
| place_design | Run global and detailed placement |
| ccopt_design | Concurrent CTS and optimization |
| routeDesign | Global + detailed routing |
| extractRC | Parasitic extraction (RC) |
| timeDesign | In-tool timing analysis |
| streamOut | Generate GDS II for tape-out |
| Command | Purpose |
|---|---|
| open_lib / open_block | Open design library and block |
| initialize_floorplan | Set die, core area, utilization |
| create_net_shape | Create power network shapes |
| connect_pg_net | Connect power/ground nets |
| place_opt | Placement with optimization |
| clock_opt | CTS with timing optimization |
| route_auto | Automatic global + detail routing |
| route_opt | Post-route optimization |
| write_gds | Output GDS stream |
| report_design | Design statistics and QoR |
| Feature | Cadence Innovus | Synopsys ICC2 |
|---|---|---|
| Synthesis Handoff | Genus (write_db → read_db) | DC (write_icc2_files) |
| CTS Command | ccopt_design | clock_opt |
| Script Format | TCL / Encounter-style | TCL / IC Compiler style |
| STA Integration | Tempus (native) | PrimeTime (GoldRoute) |
| EM/IR Analysis | Voltus | StarRC + RedHawk |
| DRC/LVS | Calibre in-design | Calibre / IC Validator |
| Market Position | Strong | Strong |
Static Timing Analysis
3.1 Introduction to STA
always @(posedge clk) blocks to flip-flop cells from the library.
3.2 Timing Path Types
STA analyzes 4 fundamental path types in digital circuits: input port to register, register to register, register to output port, and input port to output port. Every timing path has a startpoint (input port or FF clock pin) and an endpoint (FF data pin or output port).
3.3 Setup & Hold Slack Analysis
===========================================================
Path Type : max (Setup)
Point                                    Incr      Path
===========================================================
--- Input Port ---
clock CLK (rise edge)                    0.000     0.000
clock network delay (ideal)              0.500     0.500
FF1/CK                                   0.000     0.500 r
--- Data Path (Launch) ---
FF1/Q (DFF_X2/Q)                         0.120     0.620 r
U101/Y (AND2_X4/Y)                       0.085     0.705 r
U102/Y (OAI21_X2/Y)                      0.110     0.815 f
U103/Y (INV_X4/Y)                        0.062     0.877 r
U104/Y (BUF_X8/Y)                        0.075     0.952 r
FF2/D                                    0.000     0.952 r
data arrival time                                  0.952
--- Capture Edge ---
clock CLK (rise edge)                    5.000     5.000
clock network delay (propagated)         0.510     5.510
FF2/CK                                   0.000     5.510 r
library setup time                      -0.085     5.425
data required time                                 5.425
-----------------------------------------------------------
data required time                                 5.425
data arrival time                                 -0.952
-----------------------------------------------------------
slack (MET)                                        4.473
===========================================================
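The slack in the report can be reproduced by hand. The Python sketch below recomputes arrival time, required time, and slack from the increments listed above; the variable names are illustrative and do not come from any tool.

```python
# Increments copied from the setup report above (ns).
launch_clock_delay = 0.500
data_path_incr = [0.120, 0.085, 0.110, 0.062, 0.075]  # FF1/Q, U101, U102, U103, U104
clock_period = 5.000
capture_clock_delay = 0.510
library_setup = 0.085

# Arrival: launch clock latency plus every cell delay on the data path.
arrival = launch_clock_delay + sum(data_path_incr)

# Required: next clock edge plus capture latency, minus the FF setup time.
required = clock_period + capture_clock_delay - library_setup

slack = required - arrival
status = "MET" if slack >= 0 else "VIOLATED"
print(f"arrival = {arrival:.3f}, required = {required:.3f}, slack = {slack:.3f} ({status})")
```

Setup slack is always required time minus arrival time, so anything that delays the data path or pulls in the capture edge reduces it.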
3.4 Clock Domain Crossing (CDC)
CDC occurs when a signal crosses from one clock domain to another. This creates a risk of metastability — the output of a flip-flop remains at an indeterminate voltage level for an unpredictable time if setup/hold requirements are violated during the crossing.
| CDC Violation Type | Description | Fix |
|---|---|---|
| Single-bit crossing (no sync) | Flip-flop driven by different clock without synchronizer | Add 2-FF synchronizer |
| Multi-bit bus crossing | Multiple bits cross independently — may sample incoherent values | Use gray code, handshake, async FIFO |
| Fast-to-slow domain | Source clock faster; receiving domain may miss pulses | Pulse stretcher + synchronizer |
| Reconvergence | Two paths from different domains merge — non-deterministic glitch | Re-synchronize before combining |
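Gray coding is listed as a fix for multi-bit crossings because only one bit changes between consecutive values, so a receiver that samples mid-transition sees either the old or the new value, never a mixture. The Python sketch below illustrates the encoding for a 4-bit counter; the function names are hypothetical, and real designs implement this in RTL.

```python
def binary_to_gray(value: int) -> int:
    """Convert a binary count to its Gray-code equivalent."""
    return value ^ (value >> 1)


def gray_to_binary(gray: int) -> int:
    """Invert binary_to_gray by XOR-folding the shifted value."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


WIDTH = 4
for count in range(2 ** WIDTH):
    nxt = (count + 1) % (2 ** WIDTH)
    gray_flips = bits_changed(binary_to_gray(count), binary_to_gray(nxt))
    print(f"{count:2d} -> {nxt:2d}: binary flips {bits_changed(count, nxt)}, gray flips {gray_flips}")
```

In plain binary the 7 to 8 transition flips all four bits at once, which is exactly the incoherent-sampling hazard described in the table.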
3.5 On-Chip Variation (OCV) & AOCV
Real silicon has spatial and temporal variation in process, voltage, and temperature (PVT). OCV models capture that cells on the same die can behave differently from each other.
TT — Typical-Typical. Nominal design point.
SS — Slow NMOS, Slow PMOS. Worst-case for setup timing.
| Method | Description | Accuracy | Pessimism |
|---|---|---|---|
| OCV (flat derating) | Apply fixed derate to all paths equally | Medium | High |
| AOCV (Advanced) | Derate based on depth (number of cells in path). Longer paths have more statistical averaging → less pessimism | High | Medium |
| POCV (Parametric) | Full statistical model using σ distributions for each cell. Most accurate | Highest | Low |
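To see why flat OCV derating is pessimistic, the Python sketch below applies a 1.05 late derate and a 0.95 early derate to an example setup path. It is a simplified model under loud assumptions: the delay values are illustrative, the clock network is treated as derated cell delay, and common-path pessimism removal is ignored.

```python
# Example path delays in ns (illustrative values, not tool output).
launch_clock = 0.500
data_path = 0.452        # sum of cell delays between launch and capture FFs
capture_clock = 0.510
period = 5.000
setup_time = 0.085


def setup_slack(late=1.0, early=1.0):
    """Setup slack with flat derates: late on launch+data, early on capture clock."""
    arrival = (launch_clock + data_path) * late
    required = period + capture_clock * early - setup_time
    return required - arrival


nominal = setup_slack()
derated = setup_slack(late=1.05, early=0.95)
print(f"nominal slack: {nominal:.4f} ns")
print(f"derated slack: {derated:.4f} ns")
print(f"slack lost to derating: {nominal - derated:.4f} ns")
```

AOCV and POCV reduce this loss by scaling the derate with path depth or per-cell statistics instead of applying one factor everywhere.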
3.6 Multi-Mode Multi-Corner (MMMC)
Modern designs must meet timing across multiple operating modes (functional, scan, standby) AND multiple PVT corners simultaneously. MMMC analysis runs all combinations in one pass.
| Corner Name | Process | Voltage | Temp | Analysis Type | Purpose |
|---|---|---|---|---|---|
| func_slow | SS | 0.9V | 125°C | Setup | Worst-case functional timing (setup closure) |
| func_fast | FF | 1.1V | -40°C | Hold | Worst-case hold (fast paths cause hold violations) |
| func_typical | TT | 1.0V | 25°C | Both | Nominal analysis for power estimation |
| scan_slow | SS | 0.9V | 25°C | Setup | Scan shift timing at slow corner |
| hold_fast | FF | 1.2V | -40°C | Hold | Extreme hold analysis for ECO coverage |
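Sign-off requires the worst result across every corner, not the result in any single one. The Python sketch below picks the limiting corner per check type; the corner names match the table, but every slack value is hypothetical and the function name `worst_per_check` is made up for illustration.

```python
# (corner, check) -> worst slack in ns. All numbers are hypothetical.
results = {
    ("func_slow", "setup"): -0.12,
    ("func_fast", "hold"): 0.03,
    ("func_typical", "setup"): 0.85,
    ("func_typical", "hold"): 0.10,
    ("scan_slow", "setup"): 0.40,
    ("hold_fast", "hold"): -0.02,
}


def worst_per_check(corner_results):
    """Return {check: (limiting_corner, slack)} using the minimum slack per check."""
    worst = {}
    for (corner, check), slack in corner_results.items():
        if check not in worst or slack < worst[check][1]:
            worst[check] = (corner, slack)
    return worst


for check, (corner, slack) in worst_per_check(results).items():
    verdict = "PASS" if slack >= 0 else "FAIL"
    print(f"{check}: limited by {corner} at {slack:+.2f} ns ({verdict})")
```

With these made-up numbers setup is limited by the slow corner and hold by the fast corner, which matches the usual pattern described in the table.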
3.7 Synopsys PrimeTime
PrimeTime (PT) is the industry-standard sign-off STA tool. It uses accurate parasitic data (SPEF) from the extracted layout for final timing certification.
| Command | Purpose |
|---|---|
| read_verilog | Read gate-level netlist from PD tool |
| read_sdc | Apply timing constraints (SDC) |
| read_parasitics | Load extracted parasitics (SPEF file) |
| set_operating_conditions | Set PVT corner for analysis |
| update_timing | Propagate timing through all paths |
| report_timing | Print timing paths (worst paths) |
| report_constraint | Report all violated constraints |
| check_timing | Validate constraint coverage (unconstrained paths) |
| report_global_timing | Summary: WNS, TNS, WHS, THS |
| pt_shell -file | Run PrimeTime in batch mode |
## PrimeTime Sign-off Script
set_app_var search_path ". /tech/saed32nm/db"
## PrimeTime resolves cell references through link_path
set_app_var link_path "* saed32nm_ss0p9v125c.db"

## Read design
read_verilog "./out/chip_final.v"
link_design chip_top

## Constraints and parasitics
read_sdc "./out/chip_final.sdc"
read_parasitics -format spef "./out/chip.spef"

## PVT corner
set_operating_conditions "ss0p9v125c"

## Enable OCV derating
set_timing_derate -late 1.05 -cell_delay
set_timing_derate -early 0.95 -cell_delay

## Update timing
update_timing -full

## Reports
report_timing -max_paths 20 -slack_lesser_than 0 > rpt/vio_setup.rpt
report_timing -delay_type min -max_paths 20 -slack_lesser_than 0 > rpt/vio_hold.rpt
report_constraint -all_violators > rpt/all_vio.rpt
report_global_timing -significant_digits 3 > rpt/global.rpt
check_timing > rpt/check.rpt
3.8 Cadence Tempus
| Feature | Synopsys PrimeTime | Cadence Tempus |
|---|---|---|
| Industry Status | Gold Standard Sign-off | Challenger / Growing |
| MMMC | Via scenario manager | Native MMMC (view definitions) |
| ECO Flow | PT-ECO + write_changes | Native ECO (eco_opt_design) |
| Innovus Integration | Via StarRC/Signoff | Seamless (same data model) |
| POCV Support | Yes (POCV derating) | Yes (SOCV) |
| Primary Use | Sign-off timing | In-design + sign-off |
3.9 Timing Closure Techniques
| Technique | Method |
|---|---|
| Cell Upsizing | Replace slow cell with larger drive strength version (X4 → X8) |
| Buffer Insertion | Split long wire into shorter segments with buffers |
| Logic Restructuring | Reduce logic depth on critical path by rearranging gate tree |
| Floorplan Change | Move source/sink cells closer to reduce wire delay |
| Retiming | Move registers to balance pipeline stages |
| Frequency Reduction | Last resort: lower clock frequency (increase period) |
| Technique | Method |
|---|---|
| Buffer Insertion | Insert delay buffers (delay cells) on short paths to add delay |
| Cell Downsizing | Replace fast (LVT) cell with slower (HVT) version |
| Wire Stretching | Make path wire longer to add RC delay |
| Clock Skewing | Intentionally skew clock to give more hold margin |
Interview Prep & Quick Reference
Synthesis Interview Questions (Top 30)
Inputs: RTL code (Verilog/VHDL/SystemVerilog), technology library (.lib/.db), timing constraints (.sdc), design rules.
Outputs: Gate-level netlist (.v), mapped SDC, timing/area/power reports, DDC database.
compile_ultra: Advanced optimization including automatic ungrouping, boundary optimization, datapath optimization, and optional register retiming (-retime). Accepts options such as -no_autoungroup (prevents automatic flattening) and -timing_high_effort_script. Significantly better QoR at the cost of longer runtime. Used in production flows.
- Cell delay tables (input transition vs output load)
- Setup/hold times for sequential cells
- Leakage and dynamic power values
- Area in technology units
- Pin capacitances, max fanout, max transition limits
- Function description (Boolean)
Examples:
- Asynchronous reset/set ports: set_false_path -from [get_ports rst_n]
- Scan test mode paths (active only during test, not functional operation)
- Paths between mutually exclusive clocks that never switch simultaneously
- Configuration pins written once at startup
Example: A multiplier that takes 2 clock cycles:
set_multicycle_path 2 -setup -from [get_cells mult_inst/reg*]
set_multicycle_path 1 -hold -from [get_cells mult_inst/reg*]
The -hold must be explicitly set to (N-1) to avoid hold violations introduced by the relaxed setup. Failure to set hold correction is a very common bug.
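The reason for the (N-1) hold correction becomes clear when you compute which clock edges the checks use. The Python sketch below assumes a single-clock path with the launch edge at t = 0 and the default SDC behavior in which the hold check sits one cycle before the setup capture edge; the function name `check_edges` is hypothetical.

```python
def check_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) capture times in ns for a same-clock path.

    The setup check uses the edge setup_mult cycles after launch. The hold
    check defaults to one cycle earlier and moves back by hold_mult cycles.
    """
    setup_edge = setup_mult * period
    hold_edge = (setup_mult - 1 - hold_mult) * period
    return setup_edge, hold_edge


PERIOD = 5.0
print("default:           ", check_edges(PERIOD))                              # (5.0, 0.0)
print("setup 2 only:      ", check_edges(PERIOD, setup_mult=2))                # (10.0, 5.0)
print("setup 2 and hold 1:", check_edges(PERIOD, setup_mult=2, hold_mult=1))   # (10.0, 0.0)
```

With only the setup multiplier applied, the hold check lands at 5 ns, so the data path would need at least a full cycle of minimum delay. The hold multiplier of 1 returns the check to the launch edge.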
Implementation: Synthesis tools insert ICG (Integrated Clock Gating) cells which are AND/OR-latch combinations that suppress the clock edge cleanly without glitches. Reduces dynamic power by 20–40% in typical designs.
TNS (Total Negative Slack): Sum of all negative slacks across all endpoints. Indicates the total amount of timing work needed. A small WNS with a large TNS means many marginal paths.
WHS (Worst Hold Slack): The most negative hold slack. Indicates the worst hold violation. Must also be ≥ 0. Fixed by inserting delay buffers on short paths.
Example: If Stage 1 has 3ns of logic and Stage 2 has 1ns, retiming moves a register to equalize the stages at ~2ns each, so the critical stage drops from 3ns to 2ns and the achievable frequency rises by about 1.5×. The tool handles the mathematical transformation automatically. Enabled via compile_ultra -retime in DC.
- Module hierarchy is constructed
- Parameters/generics are resolved to constants
- FSMs are identified and optionally encoded
- Registers, memories, operators (+, *, >>) are mapped to GTECH primitives
- Design rule checks (unconnected ports, latches vs FFs) are performed
The check_design command after elaboration reports any issues.
set_dont_touch prevents DC from optimizing, resizing, or removing a specific cell or net. Use cases:
- Protect manually sized critical cells from being downsized
- Preserve specific clock buffers needed for DFT
- Protect cells needed for post-silicon debug/observation points
- Guard hand-placed analog boundary interface cells
link_library: Libraries used to RESOLVE module references during linking. Includes "*" (current design) + all .db files. Needed so DC can find instantiated sub-modules and external IPs. A cell can be in link_library but not target_library — it gets resolved but DC won't use it for new cells.
ungroup flattens a sub-module into its parent, removing the hierarchical boundary. This allows DC to optimize logic across that boundary (e.g., constant propagation from parent into child, logic sharing between siblings).
Use when: Sub-module boundaries prevent critical optimization. In compile_ultra, the -no_autoungroup flag disables DC's automatic ungrouping. Manual ungrouping is done before compile: ungroup -all -flatten. Tradeoff: loses hierarchy for debug and incremental compile benefits.
In synthesis: After compile, the preview_dft and insert_dft commands handle scan. The SDC must set false paths on scan paths (set_false_path -from [get_ports scan_en]). Scan adds ~5–10% area overhead.
set_max_area 0 tells Design Compiler to minimize area as much as possible (target = 0 means "minimize"). DC will aggressively use smaller cells, share logic, and apply area recovery techniques after meeting timing. Setting this to 0 doesn't mean area will be 0; it is a directive to minimize. Without this command, DC may leave unused area if timing is met. Always set after timing constraints are applied so timing takes priority.
LVT (Low Vt): Fast switching, but high leakage. Used on critical timing paths.
SVT (Standard Vt): Balanced. General use cells.
HVT (High Vt): Slow, but very low leakage. Used on non-critical paths to reduce standby power.
Strategy: Use LVT to fix WNS on critical paths; replace non-critical LVT cells with HVT to recover power. DC can perform multi-Vt optimization automatically when libraries for multiple Vt flavors are provided.
check_design on a GTECH netlist catches structural issues before committing to technology mapping.
set_clock_uncertainty adds a timing margin to account for:
- Jitter: Cycle-to-cycle variation in clock edge arrival (PLL jitter)
- Skew: Spatial variation in clock arrival (before CTS; post-CTS uses propagated clocks)
- Margin: Extra guardband for post-silicon variation
Post-CTS: Usually only jitter+margin, as skew is captured in propagated clock latencies. Setup and hold have separate uncertainty values.
group_path -name critical_paths -critical_range 0.5 -weight 5
Higher weight = more optimization effort. Useful to tell DC to focus on specific paths without spending runtime on already-met paths.
analyze + elaborate (two-step):
analyze -format verilog -library WORK [file list]
elaborate top_module
The two-step approach is preferred for large hierarchical designs because analyze compiles each file to an intermediate form, and elaborate builds the hierarchy. This allows reuse of analyzed modules and better error isolation. Also enables explicit parameter override during elaborate.
Flip-flop is inferred when: output is assigned only on a clock edge (always @(posedge clk)).
Latch is inferred when: output is assigned inside a level-sensitive always block AND not all conditions assign the output (incomplete if/case).
Example latch inference:
always @(en or d) if (en) q = d; // q holds when en=0 → LATCH
Latches are generally undesirable in synthesis (timing hard to analyze). Fix: Use flip-flops with explicit reset, or make if/case statements complete with else/default.
Incremental compile (compile -incremental) re-optimizes only the portions of the design that violate constraints, leaving already-met portions unchanged. It is faster than a full compile and is used:
- After making small ECO changes to the netlist
- After constraint changes affecting only a subset of paths
- In a second-pass optimization after an initial compile
check_timing validates that all paths in the design are covered by timing constraints. It reports:
- Unconstrained paths: Flip-flops or ports with no clock or timing constraint → timing not analyzed → potential sign-off risk
- Loops: Combinational loops (no register) which cause infinite path delays
- No-clock endpoints: FFs without an associated clock
Propagated clock: After CTS, the actual clock network delay is computed from the clock source through every buffer/inverter to each FF's clock pin. The tool uses real propagated delays — more accurate, removes pessimism of ideal clock uncertainty.
set_propagated_clock [all_clocks] switches to propagated mode in PrimeTime post-CTS.
set_driving_cell -lib_cell BUFX4 [get_ports data_in*]
set_load: Specifies the capacitive load on output ports (models the off-chip load). Example:
set_load 0.05 [get_ports data_out*]
Both are necessary for accurate I/O timing analysis. Without them, input/output timing will be optimistic.
Flow: Run RTL or gate-level simulation → dump SAIF → read in DC/PT for power analysis:
read_saif -input sim.saif -instance top
More accurate switching data = more accurate power optimization decisions.
Latch (level-sensitive): Transparent when clock is high (or low). Data can "time-borrow" through the latch during the transparent phase, borrowing time from the next cycle. This makes STA significantly more complex because the tool must perform time-borrowing analysis. Latches in pipelines can improve throughput but require careful constraint handling, for example limiting the borrowed time with set_max_time_borrow.
create_generated_clock -name CLK_DIV2 -source [get_ports clk_in] -divide_by 2 [get_pins clkdiv_reg/Q]
Generated clocks are essential for STA to correctly analyze paths crossing from the master to generated domain. Without declaring them, those paths are unconstrained. Generated clocks also inherit uncertainty from their master unless explicitly overridden.
Synthesis tools detect loops via check_design and report them as errors. Loops must be fixed before synthesis can complete. Common causes: feedback mux without enable register, asynchronous handshake signals coded incorrectly in RTL.
Pipeline optimization: Adds NEW pipeline stages (registers) to reduce combinational depth at the cost of increased latency. This is an architectural decision made at RTL level, not done automatically by synthesis.
Key difference: Retiming is synthesis-level; pipelining is architectural. Both improve timing but retiming is transparent to function while pipelining increases output latency.
Physical Design Interview Questions (Top 30)
Target: 60–75% for most designs. Lower (<50%) wastes die area and increases cost. Higher (>80%) causes routing congestion, difficulty placing buffers, and degraded routability. Memory-heavy designs may use 40–60% because large SRAMs occupy significant area.
Core area: The interior region where standard cells and macros are placed. It is surrounded by the I/O ring. Core area = Die area − I/O ring area − margins.
The core-to-die margin accommodates power rings, I/O pad connections, and design rule keepouts. Utilization is measured relative to the core area, not die area.
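As a rough illustration of the utilization figure discussed above, the sketch below computes it as (standard cell area + macro area) / core area. That definition and the areas used are assumptions for illustration; tools differ in whether macros and blockages are counted.

```python
def core_utilization(std_cell_area_um2, macro_area_um2, core_area_um2):
    """Fraction of the core area occupied by standard cells and macros."""
    if core_area_um2 <= 0:
        raise ValueError("core area must be positive")
    return (std_cell_area_um2 + macro_area_um2) / core_area_um2


# Hypothetical block: 1.2 mm^2 of cells, 0.3 mm^2 of macros, 2.0 mm^2 core
util = core_utilization(1_200_000, 300_000, 2_000_000)
print(f"utilization = {util:.0%}")
```

With these made-up numbers the block lands at the upper end of the 60–75% target range.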
Effects:
- Cells receiving lower VDD switch slower → increased cell delay → potential setup violations
- Severe IR drop can prevent cells from switching at all → functional failure
- Dynamic IR drop (transient) from simultaneous switching of many cells
Fix:
- Widen the wire to reduce current density (J = I/A)
- Add parallel wires (increase cross-section)
- Add more vias (reduce via current density)
- Reduce switching frequency or activity
Skew = T_clk_capture − T_clk_launch
Acceptable values:
- Local skew (adjacent FFs): < 30–50 ps
- Global skew (across chip): < 100–200 ps
Detailed Routing: Works within the global routing assignment to produce exact wire coordinates, widths, vias, and layer assignments. Must satisfy all DRC rules. The actual GDSII-ready metal geometries are the output.
- Spacing violation: Two wires on the same metal layer are closer than the minimum spacing rule
- Width violation: A wire is narrower than the minimum width for that metal layer
- Via enclosure violation: Metal doesn't extend enough beyond the via in all directions
- Antenna violation: Metal attached to gate has too high area ratio (damages oxide during fab)
- Density violation: Metal fill percentage outside foundry-specified min/max range
Errors caught:
- Open circuits: A connection exists in schematic but is missing/broken in layout
- Short circuits: Two nets that should be separate are connected in layout
- Extra devices: Layout has transistors not in schematic
- Missing devices: Schematic has cells not present in layout
Macro placement guidelines:
- Place at die edges or corners to minimize routing blockage in the center
- Align to row boundaries if possible
- Add a "halo" or keepout around each macro (no std cells within 2–5µm)
- Consider macro pin accessibility — pins should face the routing channels
- Group related macros (e.g., all SRAMs near their controllers)
- Maintain N-well continuity across the row (required for correct transistor operation)
- Connect power rails (VDD/VSS straps run through standard cell rows)
- Provide decoupling capacitance (some filler cells include capacitors)
- Ensure minimum density requirements for metal layers
Antenna ratio = Metal area connected to gate / Gate oxide area
Fixes:
- Jump up to higher metal layer (top layer is added last, less exposure)
- Insert antenna diodes at gate inputs (discharge the accumulated charge)
- Use antenna-aware routing (route to higher layer early)
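The antenna ratio formula above can be checked with a few lines of arithmetic. This is a minimal sketch: the areas and the limit of 400 are hypothetical, and real foundry rules are per-layer and often cumulative.

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Metal area connected to the gate divided by the gate oxide area."""
    return metal_area_um2 / gate_area_um2


def violates_antenna_rule(metal_area_um2, gate_area_um2, max_ratio):
    """True when the ratio exceeds the (assumed) foundry limit."""
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


# Hypothetical net: 500 um^2 of metal on a 1 um^2 gate, limit of 400
ratio = antenna_ratio(500.0, 1.0)
print(ratio, violates_antenna_rule(500.0, 1.0, max_ratio=400.0))
```

Inserting a diode or jumping to a higher layer reduces the metal area seen by the gate at each etch step, which is what brings the ratio back under the limit.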
Timing impact:
- Crosstalk delta delay: Aggressor switching in same direction as victim → speeds up victim (improves setup, worsens hold). Opposite direction → slows down victim (worsens setup).
- Crosstalk noise/glitch: On a quiet net, coupling from aggressor creates a voltage spike that may cause a logic error if the net is near a switching threshold.
Detailed Placement: After legalization, cells are in legal positions but timing may be degraded. Detailed placement does local cell swaps, single-row and multi-row moves to improve timing and reduce wirelength while maintaining legality.
Types:
- Hard blockage: No cells placed at all. Used around macros, analog circuits, special structures.
- Soft blockage: Discourages placement but allows it if necessary for congestion relief.
- Partial blockage: Only buffers and inverters (low-level cells) allowed — commonly used around macro halos.
- Route blockage: Blocks routing (not placement) on specific metal layers in a region.
SPEF is needed because wire delays depend heavily on actual metal resistance and capacitance, which are only known after physical layout. Pre-route timing uses estimated wire loads (WLM) which can be 20–30% off. Sign-off STA uses SPEF for accurate, real timing. Without SPEF, timing sign-off is unreliable.
The placer uses early wire length estimation and constraint data to prioritize cell proximity for critical nets. Without timing-driven placement, a pure wirelength minimizer might spread critical cells apart, degrading timing after routing when actual wire RC is seen. Most modern placers (Innovus, ICC2) do timing-driven placement by default.
Level shifters are required when a signal crosses between two power domains at different voltages. They translate signal levels: a signal valid at 0.6V/1.2V in domain A must be converted to the 0.8V/1.8V levels of domain B. Without level shifters, the receiver sees incorrect logic levels, causing functional failure. Level shifters must be inserted in the netlist during synthesis/PD with proper UPF (Unified Power Format) flow.
Without tap cells, parasitic PNP/NPN transistors in the CMOS structure can turn on, creating a low-resistance path from VDD to VSS (latchup), permanently damaging the chip. Foundry rules specify maximum tap cell pitch (typically 20–50µm). They are placed in every standard cell row at regular intervals.
Fixes:
- Reduce placement density (lower utilization) in congested areas
- Add routing blockages on congested layers to force rerouting
- Move macros to open routing channels
- Add extra metal layers via process upgrades
- Use high-fanout net synthesis to break up congested drivers
- Adjust floorplan to redistribute logic
This requires the layout to be "colorable" — adjacent wires must be assigned to different masks (colors). DRC checks for double patterning conflicts (two adjacent same-color wires that should be different colors). Routing tools must be double-patterning-aware and ensure no conflicts.
Uses:
- Visual guide during floorplanning to estimate wire congestion and length
- Identify poor floorplan choices (macros creating long flylines across the chip)
- Estimate wirelength for timing budgeting
- Excessive wire capacitance → slow transition → timing violation
- Single wire spanning entire chip → routing congestion
Synopsys ICC2: Tightly integrated with DC (write_icc2), PrimeTime for timing sign-off, and IC Validator for DRC. Uses hierarchical database (.dlib).
Cadence Innovus: Tight integration with Genus (write_db/read_db), Tempus for in-design STA, and Calibre in-design. Known for Clock Concurrent Optimization (CCOpt) for CTS.
Both support advanced features (multi-patterning, advanced node DRC, power analysis). Choice depends on existing tool stack and foundry PDK support.
- Power domain definitions (which cells are in each domain)
- Supply voltages for each domain
- Power state definitions (ON, OFF, low-power)
- Level shifter and isolation cell requirements
- Power switching cell locations
CTO (Clock Tree Optimization): A post-CTS step that fine-tunes the existing clock tree — adjusting buffer sizes, changing net routes, and tweaking the tree topology to improve skew, latency, and clock power without fully rebuilding the tree. Used after post-route optimization when incremental clock improvement is needed. In Innovus: ccopt_design covers both CTS and CTO.
Without adequate fill: CMP removes too much metal in sparse areas (dishing) or leaves too much in dense areas (erosion) → non-uniform heights → via formation failures → reliability problems.
Fill is inserted after routing using fill tools. It must not electrically connect to any signal but must meet min/max density rules on each layer within specified check windows.
Post-route optimization uses actual extracted parasitics (RC from real wires). It is slower and must maintain DRC cleanliness with every change. Changes are limited (ECO-mode: only add/resize buffers/inverters, minimal perturbation to avoid DRC). Sign-off timing happens post-route.
Row orientation alternates (flipped in Y) so adjacent rows share power rails, reducing the number of power straps needed. Some cells can only be placed in certain orientations (e.g., cells with specific Nwell connections). Placement tools handle orientation automatically per-row.
- STA: Zero setup AND hold violations across all MMMC corners
- DRC: Zero design rule violations (Calibre DRC clean)
- LVS: Layout vs. Schematic clean (zero shorts/opens)
- IR Drop: All cells receive sufficient voltage (static + dynamic)
- EM: All metal/via segments below current density limits
- Antenna: All gates meet antenna ratio rules
- ESD: ESD protection structures verified
- Supply instantaneous current to switching cells without waiting for current from the power pads (which have long RC path)
- Reduce dynamic IR drop by providing local charge
- Filter high-frequency noise on the power supply
STA Interview Questions (Top 30)
Differences from dynamic simulation:
- STA covers 100% of paths; simulation covers only exercised paths
- STA is fast (minutes); full simulation can take days
- STA cannot find functional bugs; simulation can
- STA is deterministic given constraints; simulation depends on input vectors
- STA uses library models; simulation uses detailed transistor behavior
Hold time (T_h): The minimum time AFTER the active clock edge that data must remain stable. If data changes within this window, the FF may capture the new value instead of the intended value.
Both are characteristics of the flip-flop cell from the technology library, measured at specific operating conditions. They represent fundamental timing requirements of the storage element.
Required Arrival Time:
= Clock period + Capture clock latency − Clock uncertainty (setup) − Setup time of FF
Actual Arrival Time:
= Launch clock edge time + Launch clock latency + CK→Q delay + Combinational delay
Slack ≥ 0: Setup MET (timing passes)
Slack < 0: Setup VIOLATED (must fix)
Example: Required = 4.8ns, Arrival = 3.5ns → Slack = +1.3ns (MET with 1.3ns margin)
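The required and arrival time formulas above translate directly into code. The sketch below works in integer picoseconds to avoid floating-point rounding; the individual component values are hypothetical, chosen only so that the totals match the 4.8 ns and 3.5 ns of the example.

```python
def required_time_ps(period, capture_latency, setup_uncertainty, setup_time):
    """Required arrival time at the capture FF data pin (all values in ps)."""
    return period + capture_latency - setup_uncertainty - setup_time


def arrival_time_ps(launch_edge, launch_latency, clk_to_q, comb_delay):
    """Actual arrival time of data at the capture FF data pin (ps)."""
    return launch_edge + launch_latency + clk_to_q + comb_delay


# Hypothetical breakdown that reproduces the example totals
required = required_time_ps(period=5000, capture_latency=200,
                            setup_uncertainty=300, setup_time=100)
arrival = arrival_time_ps(launch_edge=0, launch_latency=200,
                          clk_to_q=300, comb_delay=3000)
slack = required - arrival
status = "MET" if slack >= 0 else "VIOLATED"
print(f"required={required} ps, arrival={arrival} ps, slack={slack} ps ({status})")
```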
Setup violation: Data arrives too late → FF is asked to capture before data is stable → metastability → wrong Q output (random). Functional failure at speed.
Hold violation: Data changes too soon after clock → FF captures new value when old value was expected → wrong Q output. A hold violation is particularly dangerous because it causes failure at ALL frequencies — it's not a speed problem, it's a structural problem that causes failure even at low frequency.
Both must be zero violations at sign-off in every MMMC corner.
Data arrival time = T_clk_source + T_launch_clk_latency + T_CKtoQ + T_combo_logic
Typical values: 50–200ps depending on cell drive strength and load. Larger, faster cells have smaller CK-to-Q. It also depends on output load capacitance (higher load → longer CK-to-Q).
Synchronizers (2-FF chains) help by providing extra time for the FF output to resolve before being used. The probability of metastability causing failure decreases exponentially with the resolution time given. Mean Time Between Failures (MTBF) increases exponentially with the number of synchronizer stages.
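The exponential dependence on resolution time can be seen in the commonly used model MTBF = exp(t_r / τ) / (T0 · f_clk · f_data). The sketch below assumes that model; τ, T0, and the frequencies are hypothetical values, not figures from any library.

```python
import math


def synchronizer_mtbf(t_resolve_s, tau_s, t0_s, f_clk_hz, f_data_hz):
    """MTBF in seconds for the assumed exponential metastability model."""
    return math.exp(t_resolve_s / tau_s) / (t0_s * f_clk_hz * f_data_hz)


# Hypothetical parameters: tau = 20 ps, T0 = 100 ps, 200 MHz clock, 10 MHz data
params = dict(tau_s=20e-12, t0_s=100e-12, f_clk_hz=200e6, f_data_hz=10e6)

# Roughly one clock period of resolution time, then one extra period
# (the effect of adding one more synchronizer stage).
mtbf_one_cycle = synchronizer_mtbf(4.9e-9, **params)
mtbf_two_cycles = synchronizer_mtbf(9.9e-9, **params)
print(mtbf_two_cycles > mtbf_one_cycle)
```

Each additional stage adds about one clock period to t_r, so the MTBF is multiplied by exp(T_clk / τ) per stage under this model.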
OCV matters because STA corner analysis (SS, TT, FF) assumes the whole chip is at one corner. In reality, the launch path might be slow while the capture path is fast (or vice versa), creating additional timing margin loss. OCV derating adds guardband by making launch paths pessimistically slow and capture paths pessimistically fast (or vice versa for hold).
AOCV (Advanced OCV): Applies a smaller derating factor to longer paths (more cells) because statistical averaging reduces the probability of all cells simultaneously being at the worst case. Shorter paths get higher derating. This reduces over-pessimism in long paths, recovering timing margin and avoiding unnecessary ECO effort. The derate table is a function of path depth (cell count).
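A depth-based derate lookup might look like the sketch below. The table values are invented for illustration, and the step-wise lookup (use the entry for the largest tabulated depth not exceeding the path depth) is an assumption; actual tools may interpolate and also index by distance.

```python
from bisect import bisect_right

# Hypothetical late-derate table: (path depth in cells, derate factor)
AOCV_LATE = [(1, 1.12), (2, 1.10), (4, 1.08), (8, 1.06), (16, 1.04)]


def late_derate(depth, table=AOCV_LATE):
    """Return the derate for the largest tabulated depth <= the path depth."""
    depths = [d for d, _ in table]
    idx = max(bisect_right(depths, depth) - 1, 0)
    return table[idx][1]


for depth in (1, 3, 20):
    print(depth, late_derate(depth))
```

Deeper paths pick up a smaller factor, which is the pessimism reduction described above.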
- func_slow: SS, 0.9V, 125°C — Setup check for functional mode
- func_fast: FF, 1.1V, -40°C — Hold check for functional mode
- scan_slow: SS, 0.9V, 25°C — Scan shift timing
- hold_extreme: FF, 1.2V, -55°C — Worst-case hold
- Jitter (period jitter): Cycle-to-cycle variation in clock period from the PLL/crystal
- Skew: Spatial variation in clock arrival times (pre-CTS only; post-CTS uses actual propagated latencies)
- Uncertainty margin: Additional guardband for modeling limitations
set_clock_uncertainty -setup 0.15 [all_clocks]
set_clock_uncertainty -hold 0.05 [all_clocks]
CRPR removes this double pessimism by identifying the common portion of the clock path and applying derating only to the diverging portions. This can recover significant timing margin (50–200ps) especially in designs with long shared clock networks.
- Cell arc: Input→Output delay within a cell (e.g., A→Y in AND2)
- Net arc: Wire delay from cell output to next cell input (RC delay)
- Setup/hold arc: Constraint arcs on FF data vs clock pins
- Clock arc: CK→Q propagation arc of a flip-flop
read_parasitics -format spef filename.spef loads the extracted wire RC parasitics from the post-layout extraction tool (StarRC, QRC). Without this, PrimeTime uses ideal wires or estimated loading (from SDC set_load), which is inaccurate.
After loading SPEF, wire delays are computed from actual metal resistance and capacitance (R×C delay), giving accurate net delays. Sign-off timing MUST use SPEF parasitics. The SPEF file must match the design netlist exactly (same net names). Mismatches cause warnings and incorrect timing.
Hold violations occur when the DATA PATH is too fast (short logic path) and the CLOCK arrives late at the capture FF.
Therefore, hold analysis uses the FAST corner (FF process, high voltage, low temperature) which makes data paths fast and can make hold more critical. This is the reverse of setup analysis which uses the SLOW corner. That's why MMMC must check setup at slow corner AND hold at fast corner simultaneously.
check_timing validates constraint completeness and reports:
- Unconstrained endpoints: FF data/output ports with no timing path from a clock, so the path is not analyzed by STA
- No-clock FFs: Registers with no associated clock definition
- Partial path constraints: Input_delay covers only -max but not -min (or vice versa)
- Loop detection: Combinational loops
- Multiple clocks: Endpoints with multiple clock paths (may need set_false_path or set_clock_groups)
PBA (Path-Based Analysis): Re-analyzes specific critical paths using the actual input transition experienced by each cell on that specific path. More accurate, less pessimistic — removes false worst-case combinations. Much slower (only applied to a subset of near-critical paths). Used to "rescue" paths that look violated in GBA but actually pass when analyzed properly.
Input transition time (slew): How fast the input signal switches (rise/fall). A slower input → longer cell propagation delay.
Output load capacitance: Total capacitance the cell drives (input caps of fanout cells + wire cap). Higher load → longer output transition and higher cell delay.
The 2D table is NLDM (Non-Linear Delay Model). STA tools interpolate within the table to compute accurate delays for the specific transition and load seen at each cell in the design.
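The table lookup can be sketched as bilinear interpolation over the slew and load axes. The axis values and delays below are made up, and bilinear interpolation is an assumption about the method; a given STA tool may use a different scheme.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index].

    Points outside the table are linearly extrapolated from the edge segment.
    """
    i, j = _bracket(slews, slew), _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (table[i][j] * (1 - ts) * (1 - tl)
            + table[i][j + 1] * (1 - ts) * tl
            + table[i + 1][j] * ts * (1 - tl)
            + table[i + 1][j + 1] * ts * tl)


# Hypothetical 3x3 table: rows = input slew (ns), columns = output load (pF)
SLEWS = [0.01, 0.05, 0.20]
LOADS = [0.001, 0.010, 0.050]
DELAYS = [[0.020, 0.045, 0.150],
          [0.030, 0.055, 0.165],
          [0.060, 0.090, 0.210]]

print(nldm_delay(SLEWS, LOADS, DELAYS, slew=0.03, load=0.0055))
```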
- Output transition (slew) becoming too slow
- Downstream cell delays increasing
- Possible functional failure if slew is extremely slow
- Insert buffers to split the high-fanout net
- Upsize the driving cell to a higher drive strength
- Reduce wire length (physical proximity of sinks)
Hold uncertainty: Applied to increase the minimum data arrival time required to meet hold. It tightens hold (makes hold harder to meet). Typically 50ps.
The asymmetry arises because a setup check spans two different clock edges (the launch edge and the following capture edge), so cycle-to-cycle jitter directly reduces the available time. A hold check is made against the same clock edge at launch and capture, so period jitter largely cancels and only skew and margin remain. Pre-CTS uses larger uncertainty; post-CTS switches to propagated clocks with only jitter uncertainty remaining.
Used: After routing is complete for final sign-off. The parasitics precisely capture the resistance and capacitance of every metal wire and via, giving timing accuracy within 5% of silicon measurement.
Back-annotation reveals new violations not seen pre-route (because estimated wires underestimated actual wire capacitance). These violations require post-route ECO fixes with minimal netlist perturbation.
They must be carefully applied because:
- Over-generous false paths hide real timing violations
- Wrong multicycle path settings (missing hold correction) create hold violations
- Incorrectly specified endpoints leave real functional violations unchecked
- Timing exceptions survive synthesis to PD to sign-off — errors propagate through the entire flow
set_max_delay: Still analyzes the path for timing, but uses the specified delay as the timing constraint instead of the default (clock period). For paths that need to meet a specific delay that's different from the clock period (e.g., async paths that must complete within 10ns regardless of clock).
Key difference: set_false_path means "never check this." set_max_delay means "check this, but use this constraint."
Prioritization strategy:
- Fix the WNS (worst) path first — largest magnitude violation
- Use ECO minimize-impact mode (minimize cell moves)
- Iterate in small batches (fix 20 paths, re-analyze, fix next 20)
- Monitor TNS trend — decreasing TNS = making progress
- Separate setup and hold fixes (hold buffer insertion can slow setup)
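Tracking WNS and TNS between ECO batches is simple arithmetic over the endpoint slacks. The sketch below uses hypothetical slacks in picoseconds and takes WNS as the minimum slack (some reports clamp it to zero when timing is met).

```python
def wns_tns(slacks_ps):
    """Return (worst slack, total negative slack, violating endpoint count)."""
    violations = [s for s in slacks_ps if s < 0]
    wns = min(slacks_ps)
    tns = sum(violations)
    return wns, tns, len(violations)


# Hypothetical endpoint slacks before and after one ECO batch
before = [-120, -45, 30, -5, 210]
after = [-40, 10, 30, -5, 210]

print(wns_tns(before))
print(wns_tns(after))
```

A TNS that moves toward zero from one batch to the next is the "making progress" signal mentioned in the list above.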
At advanced nodes (<65nm): Temperature inversion occurs at low supply voltages. Transistors can be SLOWER at low temperature than at high temperature, because the rise in threshold voltage at low temperature outweighs the gain in carrier mobility. This means the traditional slow corner (SS, 125°C) may no longer be worst-case timing; SS at -40°C may be worse.
Impact: Need to check timing at multiple temperature points. Some foundries provide separate library corners for this. Ignoring temperature inversion at advanced nodes can lead to post-silicon timing failures.
SI delta delay: Coupling from aggressor wires causes victim wire delay to increase or decrease. STA includes SI analysis in sign-off by computing the worst-case delay considering all possible aggressor switching combinations.
SI noise analysis: Checks if crosstalk-induced voltage glitches on quiet nets can cause logic errors. The noise immunity of the receiving cell must exceed the peak noise voltage.
SI analysis requires layout parasitics including coupling capacitance (SPEF with coupling) — simple ground capacitance models are insufficient for SI-accurate timing.
Recovery time: Minimum time the async signal must be deasserted BEFORE the active clock edge. Analogous to setup time — if async reset is released too close to the clock, the FF may not properly respond to the clock. Checked with set_max_delay or special recovery constraints in SDC.
Removal time: Minimum time the async signal must remain asserted AFTER the active clock edge. Analogous to hold time. These are library-characterized values that must be checked if async resets are used in a synchronous design.
set_input_delay -max: Latest time data can arrive at the port relative to clock. Used for setup analysis of the first internal register that captures this input.
set_input_delay -min: Earliest time data arrives. Used for hold analysis (ensures data doesn't arrive so early that it violates hold at the capturing FF).
set_output_delay -max: Latest time data must be stable at output before next clock edge (for the downstream receiver's setup).
set_output_delay -min: Earliest time data must be stable (for downstream receiver's hold).
All four values (-max/-min for input/output) must be specified for complete I/O timing coverage.
STA computes the statistical distribution of path delay (sum of independent Gaussian cell delays → Gaussian path delay by central limit theorem). Slack is then expressed as a sigma value — e.g., "path meets timing at 3σ".
Benefits: Most accurate OCV model, removes pessimism from flat/AOCV derating. Used in advanced (<7nm) nodes where OCV is very significant. Requires POCV characterization data from the foundry library.
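The statistical sum can be illustrated with a root-sum-square of per-cell sigmas, assuming independent Gaussian cell delays as stated above. The stage values are hypothetical, and the comparison against adding 3σ per cell is only meant to show where the pessimism reduction comes from.

```python
import math


def pocv_path_delay(stages, n_sigma=3.0):
    """stages: list of (mean_ps, sigma_ps) per cell.

    Returns (path mean, path sigma, late delay at n_sigma).
    """
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean, sigma, mean + n_sigma * sigma


# Hypothetical two-stage path
stages = [(100.0, 3.0), (100.0, 4.0)]
mean, sigma, late_3s = pocv_path_delay(stages)

# Worst-casing every cell individually at 3 sigma, for comparison
per_cell_worst = sum(m + 3.0 * s for m, s in stages)
print(mean, sigma, late_3s, per_cell_worst)
```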
Proper handling:
- set_clock_groups -asynchronous: Tells the STA tool to not analyze paths between these clock domains. The crossing is handled by synchronizers in the design.
- CDC analysis (separate tool: Mentor CDC, Cadence JasperGold): Verifies correct synchronization structures are present
- The synchronizer itself is analyzed with appropriate timing constraints
Flow:
- PT analyzes sign-off netlist with SPEF, finds violations
- fix_eco_timing -type setup and fix_eco_timing -type hold generate cell changes (upsize/insert buffers)
- Changes written to eco_changes.tcl
- Innovus/ICC2 reads changes, places/routes ECO cells
- RC re-extracted, PT re-runs analysis
- Iterate until clean
Physical Verification (PV)
Think of it as the final quality inspection before manufacturing. A single unresolved DRC violation → foundry rejects your file. A single LVS open → chip has a broken wire → dead chip. PV sign-off is non-negotiable.
6.1 PV Flow Overview
Physical Verification consists of several distinct checks, each targeting a different failure mode. They must all be run at sign-off, typically in this order:
6.2 DRC — Design Rule Check
DRC verifies that your layout geometry satisfies every manufacturing rule in the foundry's PDK. These rules exist because the lithography and etching processes have physical limits — too-small features simply cannot be manufactured reliably.
| Rule Category | What It Checks | Why It Exists | Typical Fix |
|---|---|---|---|
| Minimum Width | Wire width ≥ Wmin per metal layer | Too-thin wires break during CMP or have excessive resistance / EM risk | Widen the wire; router usually handles this automatically |
| Minimum Spacing | Gap between same-layer shapes ≥ Smin | Lithography cannot resolve too-small gaps → shorts between wires | Increase routing track separation; re-route in congested area |
| Via Enclosure | Metal must extend beyond via edge by min amount on all sides | Overlay (misalignment) in fab could expose via without metal contact | Use larger via enclosure design rule; ensure auto-router uses correct rules |
| Via Coverage | Minimum number of vias on high-current nets | Single via has limited current capacity; EM requires multiple vias | Replace single-cut vias with via arrays; use via doubling ECO |
| Notch Rule | Internal notch (concave corner) ≥ Nmin | Narrow notches print incorrectly — corners round off → shape deformation | Fill small notches; ensure polygon merging after fill insertion |
| Area Rule | Minimum enclosed polygon area | Tiny isolated shapes may not print or etch completely | Remove floating metal shapes; merge small disconnected polygons |
| Extension Rule | Active/poly must extend beyond diffusion edge | Transistor channel defined by overlap; insufficient extension = no transistor | Standard cell library handles this; flagged in custom analog layout |
| Density Rules | Min/max metal fill % per window per layer | CMP planarization requires uniform metal density across the wafer | Run fill insertion tool (Calibre Fill); remove excess fill if over-dense |
| Double-Patterning | Adjacent same-mask shapes must be separable into 2 colors | At <20nm, single lithography cannot print minimum pitch → 2 exposures needed | Assign colors using DP-aware router; fix coloring conflicts |
| Poly Spacing to Diff | Minimum distance between poly gate and nearby diffusion | Gate coupling to adjacent diffusion can cause leakage or latchup | Handled by standard cell design; appears in custom layout |
# Calibre DRC batch run
#   -hier      hierarchical mode (faster, uses cell caching)
#   -turbo 16  16 CPU threads in parallel
#   -64        64-bit mode for large designs
# The DRC rule deck from the foundry PDK is the final argument; the layout
# file and top cell are declared inside the deck, not on the command line.
calibre -drc -hier -turbo 16 -64 ./drc_runset.svrf

// Key statements in the Calibre runset (.svrf):
LAYOUT SYSTEM GDSII
LAYOUT PATH "./out/chip_final.gds"      // input layout
LAYOUT PRIMARY "chip_top"               // top-level cell name
DRC RESULTS DATABASE "drc.results"      // output DB
DRC SUMMARY REPORT "drc_summary.rpt"    // human-readable summary
DRC MAXIMUM RESULTS 1000                // stop at 1000 per rule (debug mode)

# Check results
grep "RULE" drc_summary.rpt | sort -k3 -rn | head -20
# Shows top 20 rules with most violations; fix these first
6.3 LVS — Layout vs. Schematic
LVS extracts a netlist from your physical layout (by tracing metal connectivity and identifying transistors) and compares it against your reference schematic/netlist. Any mismatch is a critical bug that would cause chip failure.
| LVS Error | What It Means | Root Cause | How to Debug |
|---|---|---|---|
| Open Net | A connection present in the schematic is missing in the layout — net is broken | Missing wire segment, broken via, net not routed, missing metal fill connection | Highlight the net in layout viewer. Find the discontinuity. Add missing wire/via segment. |
| Short Circuit | Two nets that should be separate are electrically connected in layout | Routing DRC waiver created a short, accidentally connected polygons, missing wire cut | Identify which two nets are shorted. Find where they touch. Remove the connection or add a cut. |
| Device Mismatch | Device exists in schematic but not in layout (or vice versa) | Cell not placed, wrong cell reference, flatten/unflatten issue, macro not properly instantiated | Compare instance counts. Find missing instance in layout. Check hierarchy mapping. |
| Port Mismatch | Port name or type doesn't match between layout and schematic | Wrong pin label on layout port, renaming in synthesis not propagated to layout, case mismatch | Check label text on layout pins vs netlist port names. Calibre is case-sensitive. |
| Unconnected Port | A port declared in the netlist has no connection in the layout | I/O pad not connected to core, power domain port not properly tied, spare gate left floating | Find the port in the layout. Verify it has a metal label and is connected to the correct net. |
| Parameter Mismatch | Device dimensions differ between layout and schematic (W/L, capacitor value) | Standard cell used wrong size, analog cell manually edited without updating schematic | Check transistor W/L in layout vs SPICE netlist. Typically only affects analog blocks. |
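# Calibre LVS run (the rule deck is the final argument)
calibre -lvs -hier -turbo 16 -64 ./lvs_runset.svrf

// Layout and reference netlist are declared inside the runset (.svrf).
// Calibre LVS compares against a SPICE source, so the synthesis netlist
// (./out/chip_netlist.v) is first converted to SPICE, e.g. with v2lvs.
LAYOUT PATH "./out/chip_final.gds"
LAYOUT PRIMARY "chip_top"
SOURCE PATH "./out/chip_netlist.spi"    // converted reference netlist (name is illustrative)
SOURCE PRIMARY "chip_top"
SOURCE SYSTEM SPICE

# LVS report sections to check:
# 1. CIRCUIT COMPARISON RESULTS: overall PASS/FAIL
# 2. SHORTS: nets merged that shouldn't be
# 3. OPENS: nets split that should be connected
# 4. UNMATCHED NETS: present in one side only
# 5. UNMATCHED INSTANCES: devices missing

# LVS clean confirmation in report:
# "CORRECT" → clean
# "INCORRECT" → failures exist

# Quick grep for errors:
grep -E "INCORRECT|SHORTS|OPENS|Unmatched" lvs_summary.rep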
6.4 ERC — Electrical Rule Check
ERC catches electrical issues that DRC and LVS miss. A layout can be DRC-clean and LVS-clean but still have electrical errors that cause chip malfunction.
| ERC Check | What It Detects | Consequence if Missed |
|---|---|---|
| Floating Gate | MOSFET gate connected to nothing (floating net) | Gate floats to indeterminate voltage → random switching behavior. Very common ERC error in early PD. |
| Floating Well | N-well or P-well not connected to VDD/VSS | Well floats → transistors biased incorrectly → latchup risk, parametric failures |
| VDD/VSS Short | Power and ground nets connected together | Direct short circuit → chip draws excessive current → burns out immediately on power-up |
| Input Not Driven | Logic input pin with no driver | Input floats → oscillation, metastability, excessive power consumption |
| Output Contention | Two outputs driving the same net simultaneously | Short circuit between drivers → device damage, incorrect logic level |
| ESD Violation | I/O pad has insufficient ESD protection structure | ESD event during handling destroys input gate oxide → dead chip before it even runs |
| Latchup Violation | Tap cells too far from active region (beyond the foundry's maximum tap distance, e.g. >50µm) | Parasitic SCR triggers → VDD-to-VSS latchup → chip permanently damaged |
6.5 Antenna Check
During plasma etching in fabrication, metal connected to gate terminals accumulates charge. The antenna ratio is the cumulative metal area divided by the gate area. Exceeding the foundry limit damages the thin gate oxide — permanently degrading or destroying the transistor.
2. Insert antenna diode: Place a reverse-biased diode (cathode to net, anode to VSS) at the gate. During fab the diode conducts the plasma current safely to ground before oxide damage occurs.
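The antenna ratio defined above is simple arithmetic, and a small sketch makes the check concrete. This Python example assumes the cumulative-area definition given in the text. The function name `check_antenna`, the segment areas, and the limit of 300 are hypothetical illustration values, since real limits come from the foundry rule deck and differ per layer.

```python
def check_antenna(metal_areas_um2, gate_area_um2, max_ratio):
    """Return (ratio, violates) for one gate.

    metal_areas_um2: areas of the metal segments connected to the gate
                     while the layer is being etched (um^2).
    gate_area_um2:   gate oxide area of the connected transistor(s) (um^2).
    max_ratio:       foundry limit (hypothetical value in this example).
    """
    cumulative_metal = sum(metal_areas_um2)
    ratio = cumulative_metal / gate_area_um2
    return ratio, ratio > max_ratio


# 50 um^2 of metal on a 0.125 um^2 gate, checked against an assumed limit of 300.
ratio, violates = check_antenna([12.0, 30.0, 8.0], 0.125, 300.0)
print(ratio, violates)  # 400.0 True
```

Either fix lowers the effective ratio: a layer jump shrinks the metal area attached to the gate during each etch step, and a diode gives the charge a discharge path, which foundry rules typically credit as a much higher allowed ratio.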
6.6 Metal Fill & Density Rules
CMP (Chemical Mechanical Polishing) planarizes each metal layer. Non-uniform metal density causes uneven polishing: sparse areas "dish" (metal removed excessively) and dense areas retain more. Both cause via formation failures and reliability issues.
| Parameter | Typical Range | Effect of Violation |
|---|---|---|
| Minimum metal density | 20–30% per check window | Dishing: metal recedes below ILD surface → via misses metal → open circuit |
| Maximum metal density | 70–80% per check window | Erosion: ILD polished away → shorts between layers, increased leakage |
| Check window size | 50×50 µm – 200×200 µm | Foundry-defined. Smaller windows = tighter local control |
| Fill shape min size | ≥ Wmin per layer | Too-small fill shapes violate width rules themselves |
| Fill to signal spacing | ≥ 2× normal spacing | Fill too close to signal → coupling capacitance → SI issues |
    # Run Calibre fill (after routing, before final DRC)
    calibre -drc -hier fill_runset.svrf

    # fill_runset.svrf key options:
    LAYOUT SYSTEM GDSII
    LAYOUT PATH "chip_prefill.gds"
    DRC RESULTS DATABASE "fill_out.gds" GDSII   // GDS with fill added

    # Fill insertion is non-electrical: it must NOT connect to any signal net
    # Most foundry fill decks insert floating unconnected metal polygons
    # Some advanced PDKs insert connected fill for better SI (optional)

    # After fill: re-run DRC to verify:
    # 1. Fill shapes themselves don't create new DRC violations
    # 2. Fill-to-signal spacing rules satisfied
    # 3. Density targets met on all layers
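To make the per-window density check concrete, here is a minimal Python sketch of how metal density can be evaluated over check windows. It is an illustration under simplifying assumptions, not how a sign-off deck works: rectangles are assumed non-overlapping, coordinates are integer microns, and windows tile the chip without the half-window stepping that foundry decks commonly use. The function names and the 20%/80% limits are illustrative.

```python
def window_density(rects, wx, wy, size):
    """Fraction of the square window at (wx, wy) covered by metal.

    rects: list of (x1, y1, x2, y2) rectangles, assumed non-overlapping.
    """
    covered = 0
    for x1, y1, x2, y2 in rects:
        # Clip each rectangle to the window and add the clipped area.
        w = min(x2, wx + size) - max(x1, wx)
        h = min(y2, wy + size) - max(y1, wy)
        if w > 0 and h > 0:
            covered += w * h
    return covered / (size * size)


def density_violations(rects, chip_w, chip_h, size, min_d, max_d):
    """List ((wx, wy), density, reason) for every window outside [min_d, max_d]."""
    violations = []
    for wy in range(0, chip_h - size + 1, size):
        for wx in range(0, chip_w - size + 1, size):
            d = window_density(rects, wx, wy, size)
            if d < min_d:
                violations.append(((wx, wy), d, "below min"))
            elif d > max_d:
                violations.append(((wx, wy), d, "above max"))
    return violations


# Two 100x100 um windows: the left one holds only a thin wire (10% density).
metal = [(0, 0, 100, 10), (100, 0, 200, 50)]
for window, density, reason in density_violations(metal, 200, 100, 100, 0.20, 0.80):
    print(window, density, reason)
```

A window reported as "below min" is where a fill deck would add dummy shapes. A window "above max" usually needs slotting or a routing change instead.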
6.7 PV Tool Knowledge
Calibre from Siemens EDA (formerly Mentor Graphics) is the industry-standard sign-off verification tool. Virtually all foundry PDKs are certified for Calibre. A clean Calibre DRC run against the foundry's sign-off rule deck is typically what the foundry requires before accepting your GDS.
| Calibre Mode | Command | Purpose |
|---|---|---|
| DRC | calibre -drc -hier drc.svrf | Design rule verification against foundry rules |
| LVS | calibre -lvs -hier lvs.svrf | Layout vs schematic comparison |
| PEX/RCX | calibre -xrc -rcx rcx.svrf | Parasitic RC extraction → generates SPEF |
| Fill | calibre -drc -hier fill.svrf | Insert dummy metal fill to meet density rules |
| ERC | calibre -erc erc.svrf | Electrical connectivity and latchup checks |
| PERC | calibre -perc perc.svrf | ESD and latchup reliability analysis |
| DFM | calibre -dfm dfm.svrf | Design-for-manufacturing: yield improvement checks |
| Litho Check | calibreLitho -verify | Optical proximity correction / lithography simulation |
calibredrv -m gds -gui. Features: highlight errors in layout, zoom to violation, batch-fix mode, error count by rule, and cross-probe to schematic for LVS.
Synopsys IC Validator (ICV) is Synopsys's native sign-off verification tool, tightly integrated with StarRC (parasitic extraction) and ICC2. Growing adoption especially in designs using the full Synopsys tool chain.
| ICV Mode | Command | Purpose |
|---|---|---|
| DRC | icv -drc -i chip.gds -c icv_drc.rs | Design rule verification |
| LVS | icv -lvs -i chip.gds -s netlist.v | Layout vs schematic |
| Fill | icv -fill -i chip.gds -c fill.rs | Density fill insertion |
| ERC | icv -erc -i chip.gds | Electrical rule check |
| In-design DRC | icc2_shell> check_drc | DRC inside ICC2 during routing — catch violations early |
| Feature | Calibre (Siemens) | ICV (Synopsys) |
|---|---|---|
| Foundry Certification | Gold Standard — all foundries | Certified at major foundries |
| Tape-out Acceptance | Universally accepted | TSMC, Samsung, GF certified |
| PD Integration | In-design: Innovus + ICC2 | Native in ICC2 |
| Parasitic Extraction | Calibre xRC/PEX | StarRC (separate tool) |
| GUI Viewer | Calibre RVE | Custom Error Browser |
| Speed (large designs) | Excellent (hierarchical) | Excellent (hierarchical) |
| Rule Deck Language | SVRF / TVF | RSDB / SVRF compatible |
6.8 Physical Verification — Interview Questions
LVS (Layout vs Schematic): Extracts connectivity from layout and compares to reference netlist. Ensures the manufactured chip WILL match the intended circuit.
ERC (Electrical Rule Check): Checks for floating nodes, power/ground violations, ESD issues. Ensures the chip WILL WORK electrically.
All three must be clean for tape-out, with any remaining violation covered by a formal, foundry-approved waiver. A single unresolved error means the foundry rejects the submission or the chip is at risk.
Two fixes:
- Layer jump: Break the long wire near the gate and bridge it through a higher metal layer. While the lower layers are being etched, only the short stub is attached to the gate, so little charge collects on it. By the time the higher layer completes the net, the driver's diffusion is connected and provides a discharge path. This is the preferred fix as it adds no cell area.
- Antenna diode: Insert a reverse-biased diode (cathode on the net, anode to VSS) at the gate input. During fab, accumulated charge safely bleeds to ground through the diode. Adds small area (~0.5–1 std cell).
Common causes:
- Two wires on the same metal layer touching (spacing DRC violation that was waived)
- Via connecting two unrelated nets through the same via hole
- Accidentally connected power/signal during manual ECO
- Missing via cut between two nets running over each other
Legitimate use cases:
- Spacing violations in ESD clamp cells — intentionally tight by design, foundry-approved cell
- Density violations in seal ring or pad ring areas — these regions have special rules
- Known violations inside foundry-provided hard IP (black box) — foundry-guaranteed correct
- Increases wire capacitance → slower transitions → increased propagation delay
- Adds coupling between fill and signal → minor crosstalk noise
- Can shift timing by 2–5% on metal-dense designs
- Run fill insertion BEFORE final sign-off STA (not after). STA must include fill parasitics.
- Foundry fill rules specify minimum fill-to-signal spacing — this limits coupling impact.
- Some flows use "timing-aware fill" which avoids placing fill near critical nets.
- StarRC/Calibre xRC re-extraction after fill captures the additional capacitance in SPEF.
Required at:
- 28nm: some critical layers
- 20nm/16nm: M1, M2, via layers
- 10nm/7nm: Most metal layers, Fin definition, contact layers
Relationship to STA:
- Routing completes → GDS generated
- Calibre xRC reads GDS → produces chip.spef
- PrimeTime reads chip.spef via read_parasitics
- Wire delays computed from real RC → accurate back-annotated timing
- Sign-off STA with real parasitics must pass before tape-out
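To see how extracted R and C values turn into a wire delay, the Elmore model is a useful first-order approximation. The Python sketch below is illustrative only: sign-off tools use more accurate delay calculation than Elmore, and the segment values are made-up numbers rather than real extraction output. It models a wire as a ladder of series resistances, each followed by a capacitance to ground.

```python
def elmore_delay_fs(segments):
    """Elmore delay at the far end of an RC ladder.

    segments: list of (R_ohm, C_fF) pairs ordered from driver to load.
    Returns the delay in femtoseconds (1 ohm * 1 fF = 1 fs).
    """
    downstream_c = sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        # Each resistor charges all capacitance at or beyond its own node.
        delay += r * downstream_c
        downstream_c -= c
    return delay


# Three identical segments of 100 ohm and 2 fF each (hypothetical values).
wire = [(100, 2), (100, 2), (100, 2)]
print(elmore_delay_fs(wire))  # 1200.0 fs, i.e. 1.2 ps
```

The same wire lumped into one segment of 300 ohm and 6 fF would give 1800 fs, which is why distributed parasitics from SPEF matter for accurate timing.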
- Sort by rule, count descending: grep "RULE" drc_summary.rpt | sort -k3 -rn. The top rule might account for 40,000 violations from one root cause.
- Fix the top rule first: Understand why it's occurring. Is it a routing configuration issue? A macro halo not set? A missing fill constraint?
- Batch fix vs point fix: If 10,000 violations are "M2 spacing" due to track pitch, fix the router configuration and re-route — don't fix them one by one.
- Re-run DRC after each major fix: Cascade effects — fixing spacing might introduce new width violations.
- Isolate by region: If violations cluster in one area, focus there. Use Calibre's "check window" to run DRC on a sub-region during debug.
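The triage steps above start with ranking rules by violation count, which is easy to script once a run produces thousands of results. The following Python sketch assumes a summary format of `RULE <name> <count>` per line, matching the three-column layout implied by the `sort -k3` command. That format, the rule names, and the function name `rank_rules` are assumptions for illustration. Real summary reports differ by tool and version.

```python
def rank_rules(summary_text):
    """Return [(rule_name, count), ...] sorted by count, highest first."""
    counts = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Keep only lines shaped like: RULE <name> <count>
        if len(fields) >= 3 and fields[0] == "RULE" and fields[2].isdigit():
            counts[fields[1]] = counts.get(fields[1], 0) + int(fields[2])
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical summary text.
sample = """RULE M2.S.1 40000
RULE M3.W.2 120
TOTAL 40123
RULE V1.EN.1 3
"""
ranked = rank_rules(sample)
total = sum(count for _, count in ranked)
for name, count in ranked:
    print(f"{name:10s} {count:6d}  {100 * count / total:5.1f}%")
```

Printing each rule's share of the total makes the batch-fix decision obvious: when one rule owns nearly all violations, fix its root cause and re-run before looking at anything else.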
Important nuance: LVS-clean does NOT guarantee the design is functionally correct. It only guarantees the layout faithfully implements the netlist. If the netlist itself has a bug (wrong logic, timing violation, incorrect constraint), LVS will still pass. That's why functional verification (simulation), STA, and LVS are all independently required — they catch different classes of errors. An LVS-clean, STA-clean chip can still fail functionally if the RTL logic was wrong.
- Mechanically seals the chip edge against moisture ingress (prevents corrosion)
- Stops cracks and chipping from propagating into the die during dicing (wafer sawing)
- Provides a stress buffer between the die bulk and the scribe line
How to Prepare — Career Roadmap
This section is your end-to-end guide to entering and advancing in VLSI engineering. Whether you're a student targeting your first role or an experienced engineer moving into a specialized domain, follow this structured path. The advice is written from the perspective of what hiring managers and senior engineers actually look for.
7.1 VLSI Domains — Which One Is For You?
Skills needed: SystemVerilog, microarch, timing-aware RTL coding.
Companies: Intel, AMD, ARM, Qualcomm, Apple, NVIDIA (design teams).
Skills needed: TCL scripting, DC/Genus, SDC, timing analysis, QoR optimization.
Companies: Samsung, MediaTek, Marvell, Broadcom.
Skills needed: Innovus/ICC2, floorplanning, CTS, routing DRC, ECO.
Companies: TSMC, GlobalFoundries, fabless design houses, Apple silicon.
Skills needed: PrimeTime, SPEF, MMMC, OCV/AOCV, ECO flows.
Companies: Any semiconductor company with tape-out responsibility.
Skills needed: Calibre DRC/LVS, SVRF rule decks, GDS debugging, Calibre xRC.
Companies: Foundry customers, TSMC design enablement, IP companies.
Skills needed: SystemVerilog, UVM, SVA, Questa/VCS, formal tools.
Companies: All major semiconductor companies.
7.2 Learning Roadmap — Fresher to Professional
7.3 Essential Tools — What to Learn and How
| Domain | Industry Tool | Free/Open Alternative | How to Practice |
|---|---|---|---|
| Synthesis | Synopsys DC / Cadence Genus | Yosys (open source) | Synthesize your Verilog designs with Yosys. Understand liberty files. Write SDC constraints manually. Compare area reports. |
| Place & Route | Cadence Innovus / Synopsys ICC2 | OpenROAD via OpenLane2 | Use OpenLane with Sky130 PDK. Run full RTL-to-GDS on a small design (UART, I2C, simple CPU). Examine each output. |
| STA | Synopsys PrimeTime / Cadence Tempus | OpenSTA (inside OpenLane) | Read timing reports from OpenSTA. Understand slack calculation. Introduce timing violations manually and fix them. |
| Simulation | Synopsys VCS / Cadence Xcelium / Siemens Questa | Verilator / Icarus Verilog | Write testbenches. Simulate your RTL. View waveforms in GTKWave. Practice writing self-checking testbenches. |
| Physical Verification | Calibre DRC/LVS | Magic VLSI / KLayout | Open Sky130 GDS in KLayout. Inspect metal layers. Run built-in DRC checks. Understand what each layer represents. |
| Parasitic Extraction | Calibre xRC / Synopsys StarRC | OpenRCX (inside OpenROAD) | Run OpenRCX on a placed-and-routed design. Examine the SPEF output. Understand how RC values affect timing. |
| Waveform Viewing | Synopsys DVE / Cadence SimVision | GTKWave | View VCD dumps from Verilator/Icarus simulation. Practice reading waveforms, adding cursors, measuring timing. |
| Layout Editing | Cadence Virtuoso / Synopsys Custom Compiler / Siemens L-Edit (Tanner) | Magic VLSI / KLayout | Draw simple standard cells in Magic. Understand how transistors form. See the connection between schematic and layout. |
    # Install OpenLane2 (requires Docker or Nix)
    pip install openlane

    # Create a minimal design config
    mkdir my_design && cd my_design
    cat > config.json << 'EOF'
    {
      "DESIGN_NAME": "my_alu",
      "VERILOG_FILES": "src/alu.v",
      "CLOCK_PORT": "clk",
      "CLOCK_PERIOD": 10,
      "FP_CORE_UTIL": 40,
      "PL_TARGET_DENSITY": 0.4
    }
    EOF

    # Run complete RTL-to-GDS flow
    openlane config.json

    # Outputs you'll find in runs/RUN_*/:
    #   synthesis/ → gate-level netlist (.v)
    #   floorplan/ → DEF with core/IO defined
    #   placement/ → placed cells DEF
    #   cts/       → clock tree built DEF
    #   routing/   → fully routed DEF + GDS
    #   signoff/   → timing reports (OpenSTA)
    #   signoff/   → DRC results (KLayout/Magic)

    # View final GDS in KLayout:
    klayout runs/RUN_latest/final/gds/my_alu.gds
7.4 Interview Preparation Plan — 8 Weeks
| Week | Topic Focus | What to Study | Practice Task |
|---|---|---|---|
| Week 1 | Digital Fundamentals | Setup/hold time, metastability, clock domains, timing diagrams, flip-flop operation | Draw timing diagrams by hand. Explain setup violation to a friend without notes. |
| Week 2 | Synthesis Concepts | RTL-to-netlist flow, SDC constraints (create_clock, set_input_delay/set_output_delay, set_false_path, set_multicycle_path), QoR metrics (WNS/TNS) | Write a complete SDC file for a simple design from memory. Run Yosys synthesis on a small Verilog module. |
| Week 3 | STA Deep Dive | Setup/hold slack formulas, 4 path types, timing reports, OCV/AOCV, propagated clock, MMMC corners | Manually calculate setup slack for a given circuit. Read a full PrimeTime report and identify violations. |
| Week 4 | Physical Design | Floorplan formulas (utilization, AR), IR drop, CTS (skew/latency/uncertainty), routing DRC rules, Innovus vs ICC2 | Run OpenLane on a UART or I2C controller. Examine floorplan DEF, routing layers, DRC results. |
| Week 5 | Physical Verification | DRC categories, LVS flow, antenna violations, metal fill/density, Calibre commands, ERC checks | Open a Sky130 GDS in KLayout. Identify metal layers. Find a DRC violation and understand which rule it breaks. |
| Week 6 | Advanced Topics | CDC (2FF synchronizer, set_clock_groups), low power (clock gating, multi-Vt, UPF), timing closure ECO flow | Write a CDC synchronizer in Verilog. Simulate it with an asynchronous signal crossing. Verify that the synchronized output follows the input after two destination-clock edges (RTL simulation does not model metastability itself). |
| Week 7 | Mock Interviews | Work through all 90 Q&As in this guide. Time yourself. Answer out loud, not just in your head. | Do 3 mock interviews with a peer or use a mirror. Record yourself. Identify weak areas and go back to Weeks 2–6. |
| Week 8 | Company-Specific Prep | Research target company's products. Know their process node (e.g., TSMC 5nm, Samsung 3nm). Read recent conference papers from their engineers. | Prepare 3–5 intelligent questions to ask the interviewer. Show you understand their specific domain challenges. |
7.5 What Interviewers Actually Evaluate
- Can explain why, not just what. "Why does hold analysis use the fast corner?" shows deep understanding.
- Hands-on experience — even with open-source tools. Running OpenLane end-to-end beats "I studied PD in class."
- Correct use of units and numbers. "Skew under 50ps," not "skew should be small."
- Knows limits and tradeoffs. "Increasing drive strength fixes setup but increases power and may cause hold violations."
- Asks clarifying questions before answering — demonstrates engineering mindset.
- Admits uncertainty honestly: "I haven't used Genus directly but DC concepts are the same — let me explain my DC knowledge."
- Connects concepts: "DRC-clean layout is needed before Calibre xRC extraction which feeds sign-off STA."
- Memorizing answers without understanding. Interviewers probe with follow-up questions — memorized answers collapse immediately.
- "I know the theory but haven't used the tools." Every VLSI job requires tool proficiency. Use open-source tools to fill this gap.
- Getting confused between setup and hold. This is the most basic STA concept — if you mix them up, the interview ends.
- Not knowing which corner is used for setup vs hold analysis. This comes up in almost every STA interview.
- Saying "I would just rerun synthesis" to fix a post-route timing violation. Late-stage fixes must be ECO-based — no full rerun.
- Cannot explain what SPEF is and why it's needed. This is fundamental to any STA sign-off conversation.
- Treating LVS-clean as the same as functionally correct. Interviewers know this is a common misconception.
Right answer: "Setup time is the minimum duration that data must be stable at the flip-flop input before the active clock edge, so the FF can reliably capture it. Hold time is the minimum duration data must remain stable after the clock edge. Violating setup causes data to arrive late — the FF may not capture the correct value. Violating hold causes data to change too quickly — the FF may capture the new value instead of the intended one. Critically, hold violations cause failures at all frequencies, not just high speed — they're structural, not a speed problem. That's why they're fixed with delay buffer insertion rather than clock frequency reduction."
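The claim that hold violations are independent of frequency follows directly from the slack formulas, and a short worked example shows it. This Python sketch uses the textbook single-cycle formulas and deliberately ignores clock uncertainty and OCV derating. All delay values are hypothetical picosecond numbers chosen for easy arithmetic.

```python
def setup_slack(period, launch_clk, capture_clk, clk_to_q, comb_max, t_setup):
    """Setup slack: data must arrive before the NEXT capture edge minus setup."""
    arrival = launch_clk + clk_to_q + comb_max
    required = period + capture_clk - t_setup
    return required - arrival


def hold_slack(launch_clk, capture_clk, clk_to_q, comb_min, t_hold):
    """Hold slack: data must not change until hold time after the SAME edge.

    The clock period is not an argument, so slowing the clock cannot fix it.
    """
    arrival = launch_clk + clk_to_q + comb_min
    required = capture_clk + t_hold
    return arrival - required


# All values in picoseconds (hypothetical).
print(setup_slack(1000, 100, 150, 80, 900, 50))  # 20: setup met
print(setup_slack(1000, 100, 150, 80, 950, 50))  # -30: setup violated
print(setup_slack(1100, 100, 150, 80, 950, 50))  # 70: a slower clock fixes setup
print(hold_slack(100, 150, 80, 20, 40))          # 10: hold met
print(hold_slack(100, 180, 80, 20, 40))          # -20: extra capture skew breaks hold
print(hold_slack(100, 180, 80, 70, 40))          # 30: added data-path delay fixes hold
```

The third and sixth lines mirror the two repair strategies in the answer: relaxing the period helps only setup, while inserting delay buffers raises the minimum data-path delay and helps only hold.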
7.6 Essential Books, Courses & Resources
- Weste & Harris — CMOS VLSI Design (the bible)
- Rabaey et al. — Digital Integrated Circuits
- Patterson & Hennessy — Computer Organization & Design
- Bhatnagar — Advanced ASIC Chip Synthesis (DC-specific)
- Elmore — RC delay modeling papers
- efabless.com — Free chip tapeout with Sky130
- openroad.tools — Open-source RTL-to-GDS
- vlsiuniverse.com — VLSI interview prep
- Synopsys SolvNetPlus — Official DC/PT docs
- Cadence Training — Innovus/Genus tutorials
- IEEE Xplore — DAC, ICCAD, CICC papers
- MIT 6.004 — Computation Structures (free)
- Coursera: HDL & FPGA — Entry RTL practice
- Build a RISC-V CPU in Verilog, synthesize it
- Tape-out on Sky130 via efabless chipIgnite
- Contribute to OpenROAD — visible open-source work
- ICCAD Contests — Register for student competitions
7.7 A Day in the Life — By Role
| Time | Activity | Tools Used |
|---|---|---|
| 9:00 AM | Check overnight Innovus place-and-route run results. Review DRC violation count trend and timing summary. | Innovus GUI, log parser scripts |
| 9:30 AM | Team stand-up: report WNS/TNS status, blocking issues (congested area, unresolvable DRC in macro boundary) | Confluence, Jira |
| 10:00 AM | Debug 3 specific DRC violations near a macro corner that have resisted automatic fixing. Manually re-route 2 wires. | Innovus ECO route, DRC GUI |
| 11:30 AM | Review CTS results. Skew is 220ps — above 200ps target. Adjust ccopt settings, re-run CTS on critical clock domain. | Innovus ccopt_design |
| 1:30 PM | Run post-route STA to check setup/hold after morning ECO changes. Two new hold violations introduced by yesterday's buffer insertion. | Tempus, timing reports |
| 2:30 PM | Fix hold violations by inserting delay buffers on 2 short paths. Re-run route_opt for those nets. | Innovus ecoAddDelay |
| 3:30 PM | Meeting with STA team to align on acceptable WNS margin at this stage of the project. | – |
| 4:00 PM | Write TCL script to automate tomorrow's overnight run: place → CTS → route → extractRC → STA → DRC. Submit to compute farm. | TCL, LSF job scheduler |
| 5:00 PM | Update project tracking spreadsheet. Document today's changes. Review tomorrow's schedule. | Excel, Confluence |
| Time | Activity | Tools Used |
|---|---|---|
| 9:00 AM | Review overnight PT sign-off results across 5 MMMC corners. Identify which corners still have violations. | PrimeTime, report parser |
| 9:45 AM | WNS is -0.04ns at func_slow corner on one clock domain. Identify the critical path — reg-to-reg through a wide adder. | PT report_timing |
| 10:15 AM | Run PT-ECO to generate fix suggestions (upsize 3 cells, insert 1 buffer). Review suggestions for reasonableness. | PT fix_eco_timing |
| 10:45 AM | Send ECO script to PD team for implementation in Innovus. Coordinate on expected turnaround. | Email, Jira ticket |
| 11:30 AM | Review hold corner (func_fast) — clean. Review scan corner (scan_slow) — 2 violations. Update tracking spreadsheet. | PrimeTime, Excel |
| 1:30 PM | New SPEF delivered from PD team after yesterday's route ECO. Run full PT update on all 5 corners. ~2 hour runtime. | PrimeTime update_timing |
| 3:30 PM | Results back: func_slow now +0.02ns (CLEAN). Scan corner improved to -0.01ns — one path remains. Document. | PrimeTime, Confluence |
| 4:00 PM | Debug the remaining scan violation: it's a multicycle path (MCP) with the wrong hold adjustment. Fix SDC, re-run. | PrimeTime, SDC editor |
| 5:00 PM | Submit final overnight run with updated SDC. Send status email to project lead: "func_slow CLEAN, scan_slow -0.01ns in progress" | LSF, email |
| Time | Activity | Tools Used |
|---|---|---|
| 9:00 AM | Review overnight Calibre DRC results. 142 violations remain (down from 2,400 last week). Classify by rule. | Calibre RVE, shell scripts |
| 9:30 AM | Top rule: M3_SPACING.2 — 47 violations, all in one macro boundary region. Root cause: macro halo not set correctly. | Calibre RVE, Innovus |
| 10:00 AM | Fix: Adjust macro halo in Innovus, re-run routing around that macro. Generates new GDS for next DRC run. | Innovus, script |
| 11:00 AM | LVS run completed overnight: 3 opens found. Debug: all 3 are on VDD tie-off cells that weren't properly connected after fill insertion. | Calibre RVE, layout viewer |
| 11:45 AM | Fix: Add missing metal connections in Innovus. Verify fix with quick LVS on the affected nets only. | Innovus ECO, Calibre partial LVS |
| 1:30 PM | Antenna check: 8 violations remain. All on NAND gate inputs with long M1 wires. Add jumper vias to M3 for 6 of them; insert 2 antenna diodes for the others. | Innovus antenna fixer, Calibre |
| 3:00 PM | Submit full DRC/LVS/Antenna run to compute farm with new GDS. Expected 4 hours runtime. | LSF compute farm |
| 3:30 PM | Prepare tape-out checklist. Verify all IP blocks have current DRC waivers. Coordinate with fab liaison on GDS delivery window. | Confluence checklist, email |
| 5:00 PM | Update PV sign-off dashboard. Send status: "DRC: 142→TBD tonight, LVS: 3 opens fixed, Antenna: 8→2 fixes sent to PD" | Dashboard, email |
| Time | Activity | Tools Used |
|---|---|---|
| 9:00 AM | Review overnight compile_ultra run. WNS = -0.28ns, TNS = -15.4ns. 47 violating endpoints. Area 4.2 mm². | DC, QoR scripts |
| 9:30 AM | Identify top 5 violating paths — all through the FP multiply unit. Discuss with RTL team: can MCP be applied? | DC report_timing, email |
| 10:00 AM | RTL team confirms multiplier is 2-cycle. Add MCP to SDC. Re-run compile_ultra incremental on that path group. | DC compile_ultra -incremental |
| 11:00 AM | New WNS = -0.06ns, TNS = -0.9ns. Good progress. Remaining violations are in the interconnect arbiter. | DC, report_qor |
| 1:30 PM | Try path group with higher weight on the arbiter timing paths. Also try ungroup on the arbiter sub-module to allow cross-boundary optimization. | DC group_path, ungroup |
| 2:30 PM | Check max-cap violations — 12 found on high-fanout reset net. Set don't_touch on clock buffers. Insert buffer tree on reset. | DC set_max_fanout, compile |
| 3:30 PM | Run power analysis with SAIF from simulation. Dynamic power = 380mW — 15% over target. Apply clock gating and increase HVT cell usage on non-critical paths. | DC, power_compiler |
| 4:30 PM | Submit overnight full compile_ultra run with updated SDC and power optimizations. Write synthesis run notes for the team. | LSF, Confluence |
| 5:00 PM | Write handoff email to PD team with current netlist, mapped SDC, and QoR summary noting areas of concern for floorplanning. | Email |
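The WNS, TNS, and violating-endpoint figures tracked throughout these schedules all derive from one list of endpoint slacks. The Python sketch below shows the relationship, assuming WNS is reported as the smallest slack even when it is positive (some tools report 0 in that case). The slack values are hypothetical and given in picoseconds to keep the arithmetic exact.

```python
def qor_summary(slacks_ps):
    """Summarize endpoint slacks (picoseconds) into WNS, TNS, and violator count."""
    violating = [s for s in slacks_ps if s < 0]
    return {
        "wns": min(slacks_ps),   # worst (most negative) slack
        "tns": sum(violating),   # total of all negative slacks
        "nve": len(violating),   # number of violating endpoints
    }


# Five endpoints, three of them failing (hypothetical values).
print(qor_summary([-280, -60, 15, 120, -10]))
# {'wns': -280, 'tns': -350, 'nve': 3}
```

Reading the three numbers together tells you the shape of the problem: a large WNS with a small TNS points to a few deep paths, while a modest WNS with a large TNS means many endpoints fail by a little.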
7.8 Skills Proficiency Matrix
Rate yourself honestly against this matrix. Target "Intermediate" in your primary domain and "Awareness" in adjacent domains before your first interview. "Advanced" in your domain is the 3–5 year mark.
| Skill Area | Awareness | Intermediate (Hire-ready) | Advanced (3–5yr) |
|---|---|---|---|
| Verilog / SV | Can read RTL code. Knows gates, FFs, always blocks. | Writes synthesizable RTL. Understands latch vs FF inference. Codes FSMs correctly. | Writes parameterized, reusable RTL. Knows synthesis implications of every construct. |
| Synthesis / SDC | Knows flow: RTL → netlist. Knows create_clock exists. | Writes complete SDC. Runs DC. Interprets QoR. Understands WNS/TNS. | Tunes compile strategies. Multi-Vt optimization. compile_ultra deep settings. |
| STA | Can define setup/hold time. Knows slack formula. | Reads PT timing reports. Understands MMMC. Knows OCV derating. Can close timing with ECO. | POCV, CRPR, PBA vs GBA. Develops full MMMC corner methodology for a project. |
| Physical Design | Knows PD flow stages. Can explain utilization and skew. | Can floorplan a block. Runs Innovus/OpenROAD full flow. Understands DRC and IR drop. | Closes timing at advanced nodes. Owns CTS strategy. Designs power grid from scratch. |
| Physical Verification | Knows DRC/LVS purpose. Can identify a spacing violation. | Runs Calibre DRC/LVS. Debugs top violation types. Understands antenna and fill. | Owns tape-out PV sign-off. Writes Calibre runset modifications. DP-aware verification. |
| TCL Scripting | Can read/modify existing TCL scripts. Knows variables, loops, procs. | Writes TCL flow scripts from scratch. Parses timing reports. Automates batch jobs. | Writes complex flow automation, QoR parsers, automatic ECO generators in TCL/Python. |
| Low Power | Knows clock gating saves power. Knows LVT has more leakage. | Sets up multi-Vt optimization in synthesis. Understands UPF domains and level shifters. | Designs full multi-voltage UPF architecture. Owns power sign-off (Voltus/RedHawk). |