SECTION 01

Logic Synthesis

1.1 Introduction to Synthesis

🧱 Start Here — What Is RTL?

RTL (Register Transfer Level) is the way engineers describe digital hardware in code, using languages like Verilog or VHDL. You write what the circuit does (for example assign out = in & enable; or always @(posedge clk) ...) without specifying exactly which physical transistors to use. Think of it like a recipe: RTL is the recipe, and the cells in the foundry's library are the ingredients. Synthesis is the chef that takes that recipe and builds the real thing from whatever cells that library offers.

💡 Why Can't We Just Send RTL to the Foundry?

A foundry manufactures silicon at a specific process node (5nm, 7nm, 28nm…). For each node it provides a library of pre-designed, pre-tested cells (AND gates, flip-flops, buffers) called the Standard Cell Library. Your RTL code says "I want an adder", but the downstream flow needs to know exactly which cells to use and how to wire them together. Synthesis bridges this gap: it translates your behavioral intent into specific gate instances from the foundry's library.

Key Terms — Defined Before We Go Further

🔵

RTL (Register Transfer Level)

Verilog/VHDL code describing circuit behavior at the level of registers and data transfers. Tells you what to compute, not how to implement it in transistors. Example: assign y = a & b; is RTL for an AND gate.

🔲

Standard Cell

A pre-designed, pre-verified logic building block from the foundry library — AND2, OR3, NAND2, DFF (flip-flop), BUF, INV, MUX2. Every cell has a fixed height (fits in a row), known delay, area, and power. Synthesis maps your RTL into thousands of these cells.

🗂️

Gate-Level Netlist

The output of synthesis — a file listing every standard cell instance and every wire connection between them. It's still text (Verilog), but instead of behavioral code, it says things like: AND2_X4 U101 (.A(net1), .B(net2), .Y(net3)); — a specific AND gate with specific connections.

📚

Technology Library (.lib)

The catalog of all available cells at a specific process node and PVT condition. Contains: cell delay vs load tables, setup/hold times, leakage power, area in µm². The synthesis tool uses this to pick the right cell for each function and estimate timing. Think of it as the ingredient nutrition label.
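To make the "cell delay vs load tables" concrete, here is a minimal sketch of how a delay could be read from such a table by bilinear interpolation between the four nearest entries. The axis values and delays are made up for illustration, and the function name is hypothetical; real tables and their exact lookup rules come from the library itself.

```python
from bisect import bisect_right

# Hypothetical table axes and delays; real numbers come from the .lib file.
SLEWS_NS = [0.01, 0.05, 0.20]      # input transition index
LOADS_PF = [0.001, 0.010, 0.050]   # output load index
DELAY_NS = [                       # rows follow SLEWS_NS, columns follow LOADS_PF
    [0.020, 0.045, 0.150],
    [0.030, 0.055, 0.160],
    [0.060, 0.085, 0.190],
]


def _lower_index(axis, value):
    """Index i of the axis segment [axis[i], axis[i+1]] used for interpolation."""
    i = bisect_right(axis, value) - 1
    # Clamp so values outside the table extrapolate from the nearest segment.
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slew_ns, load_pf):
    """Bilinear interpolation of cell delay for a given input slew and output load."""
    i = _lower_index(SLEWS_NS, slew_ns)
    j = _lower_index(LOADS_PF, load_pf)
    ts = (slew_ns - SLEWS_NS[i]) / (SLEWS_NS[i + 1] - SLEWS_NS[i])
    tl = (load_pf - LOADS_PF[j]) / (LOADS_PF[j + 1] - LOADS_PF[j])
    low = DELAY_NS[i][j] * (1 - tl) + DELAY_NS[i][j + 1] * tl
    high = DELAY_NS[i + 1][j] * (1 - tl) + DELAY_NS[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


if __name__ == "__main__":
    print(f"delay at slew 0.03 ns, load 0.0055 pF: {lookup_delay(0.03, 0.0055):.4f} ns")
```

A larger load or a slower input transition moves the lookup toward the bottom-right of the table, which is why upsizing a driver or buffering a heavily loaded net changes path delay.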

📋

SDC (Synopsys Design Constraints)

A file where you tell the synthesis tool your timing requirements: how fast the clock is, when data arrives at inputs, when outputs must be ready. Without SDC, synthesis has no idea what "fast enough" means and will produce a correct but potentially very slow netlist.

📐

QoR (Quality of Results)

How good the synthesis output is, measured across multiple dimensions: timing (WNS/TNS), area (mm² or cell count), power (mW), and DRC violations. A good synthesis engineer optimizes all four simultaneously — improving one often hurts another.

📥

Inputs to Synthesis

RTL source files (.v / .sv / .vhd) — the behavioral description
Technology library (.lib / .db) — available cells and their properties
SDC constraints (.sdc) — timing requirements (clock, I/O delays)
UPF/CPF (optional) — low-power intent file

⚙️

What Synthesis Does

1. Parses and understands your RTL code (Elaboration)
2. Converts logic to technology-independent gates (Generic Mapping)
3. Maps those to real cells from your library (Technology Mapping)
4. Optimizes: minimize delay on critical paths, reduce area, lower power (Optimization)

📤

Outputs of Synthesis

Gate-level netlist (.v) — your RTL translated to real gates
Mapped SDC (.sdc) — constraints ready for PD tool
Reports — timing (WNS/TNS), area (mm²), power (mW), DRC violations
DDC database — for incremental re-runs

RTL-to-GDS Complete Chip Design Flow

[Figure: RTL-to-GDS II design flow]

📌 Key Concept

Synthesis sits at the boundary of front-end and back-end design. The quality of synthesis directly impacts all downstream physical design steps — poor synthesis results in congestion, timing closure problems, and increased power.

1.2 Detailed Synthesis Flow

The synthesis flow transforms RTL into an optimized gate-level netlist through several distinct stages, each with specific goals and transformations.

[Figure: synthesis stage-by-stage flow]

🔍 What Happens at Each Stage

Step 1 — Elaboration: The tool reads your Verilog files and "understands" your design — like a compiler parsing code. It figures out what each module does, how they connect, and what kind of logic is needed (registers, adders, FSMs).

Step 2 — Generic Mapping: Converts your design into a technology-independent intermediate form using GTECH (generic) gates — simple AND/OR/NOT/FF operations with no size or speed information yet. Boolean optimization happens here: constant propagation, dead code removal, logic simplification.

Step 3 — Technology Mapping: Now the tool looks at your target library (.lib file) and replaces each generic gate with an actual physical cell from that library. An AND2 becomes AND2_X4 (4x drive strength), a flip-flop becomes DFF_X1. This is where cell selection decisions are made.

Step 4 — Optimization: Iteratively improve the design. Fix timing violations by upsizing cells or restructuring logic. Reduce area by downsizing non-critical cells. Insert clock gating to save power. This phase runs many passes until WNS/TNS meets your target or the tool reaches its effort limit, in which case the remaining violations show up in the reports.

📋 Elaboration

Parses HDL source files, resolves module hierarchy, identifies registers, FSMs, and datapath elements. Builds an internal design representation (GTECH netlist) using generic logic cells independent of technology.

🗺️ Technology Mapping

Maps GTECH gates to cells from the target technology library (.lib). Uses pattern matching and tree-covering algorithms to find optimal cell selections that meet timing and area targets.

1.3 Synopsys Design Compiler (DC)

Design Compiler is the industry-standard synthesis tool from Synopsys. It supports hierarchical synthesis, compile strategies, and advanced optimization for timing, area, and power.

Key DC Commands

Command	Purpose	Key Options
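analyze	Parse RTL source files into the work library (follow with elaborate)	-format verilog | sverilog | vhdl, file list
read_verilog	Read Verilog files in a single step (analyze and elaborate combined)	file list; read_sverilog for SystemVerilog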
elaborate	Build design hierarchy	-parameters, -lib_work
link	Resolve all design references	Must be called after elaborate
compile_ultra	Full compile with all optimizations	-no_autoungroup, -timing_high_effort_script
compile	Basic compile	-map_effort low|medium|high, -incremental_mapping
report_timing	Timing path reports	-max_paths N, -slack_lesser_than 0
report_area	Area statistics	-hier (hierarchical breakdown)
report_power	Power analysis	-analysis_effort high
write_file	Output gate-level netlist	-format verilog -hierarchy -output
write_sdc	Write constraints file	-version 2.0
set_dont_touch	Protect cells from optimization	Apply to specific instances
check_timing	Validate timing constraints	Reports unconstrained paths

Sample DC Synthesis Script (.tcl)

TCL — DC Synthesis Script

## =========================================================
## DC Synthesis Script — sample_chip.tcl
## Project: sample_chip | Author: VLSI Engineer
## =========================================================

## 1. Setup target/link libraries
set target_library    "saed32nm_tt1p05v25c.db"
set link_library      "* $target_library"
set symbol_library    "saed32nm.sdb"

## 2. Analyze RTL sources into the work library
analyze -format sverilog {../rtl/top.v ../rtl/core.v ../rtl/alu.v}

## 3. Elaborate and link design
elaborate    sample_chip
link
check_design

## 4. Apply timing constraints
read_sdc "../constraints/sample_chip.sdc"

## 5. Set operating conditions
set_operating_conditions "tt1p05v25c"

## 6. Compile with high effort
compile_ultra -no_autoungroup -timing_high_effort_script

## 7. Reports (the redirects below need these directories to exist)
file mkdir rpt out
report_timing  -max_paths 10 -slack_lesser_than 0 -nosplit > rpt/timing.rpt
report_area    -hier                                              > rpt/area.rpt
report_power   -analysis_effort high                              > rpt/power.rpt
report_qor                                                         > rpt/qor.rpt

## 8. Write outputs
write_file -format verilog -hierarchy -output "out/sample_chip_netlist.v"
write_sdc  -version 2.0                             "out/sample_chip_mapped.sdc"
write_file -format ddc -hierarchy -output  "out/sample_chip.ddc"

puts "=== Synthesis Complete ==="

Sample SDC Constraint File

SDC — Timing Constraints

## =========================================================
## SDC Constraint File — sample_chip.sdc
## =========================================================

## Clock definition
create_clock -name CLK -period 5.0 -waveform {0 2.5} [get_ports clk]

## Clock uncertainty (jitter + skew)
set_clock_uncertainty -setup 0.15 [get_clocks CLK]
set_clock_uncertainty -hold  0.05 [get_clocks CLK]

## Clock transition
set_clock_transition  0.1 [get_clocks CLK]

## Input delays (relative to CLK edge)
set_input_delay  -max 1.5 -clock CLK [get_ports data_in*]
set_input_delay  -min 0.2 -clock CLK [get_ports data_in*]

## Output delays
set_output_delay -max 1.2 -clock CLK [get_ports data_out*]
set_output_delay -min 0.1 -clock CLK [get_ports data_out*]

## Drive strength and load
set_driving_cell  -lib_cell BUFX4 [get_ports data_in*]
set_load 0.05 [get_ports data_out*]

## False paths (async reset, test ports)
set_false_path -from [get_ports rst_n]
set_false_path -from [get_ports scan_en]

## Multicycle path (2-cycle computation)
set_multicycle_path 2 -setup -from [get_cells mult_inst*]
set_multicycle_path 1 -hold  -from [get_cells mult_inst*]

## Max capacitance / transition constraints
set_max_capacitance 0.2 [current_design]
set_max_transition  0.4 [current_design]

1.4 Cadence Genus

Genus is Cadence's modern synthesis solution featuring concurrent optimization and a unified data model with Innovus for seamless handoff.

Key Genus Commands

Command	Purpose
read_hdl -language sv	Read SystemVerilog/Verilog/VHDL sources
elaborate	Elaborate and link design hierarchy
read_mmmc	Read multi-mode multi-corner view definition
syn_generic	Generic synthesis (technology-independent)
syn_map	Technology mapping to library cells
syn_opt	Incremental optimization (timing/area)
report_timing	Report worst timing paths (legacy UI: report timing)
report_area	Report cell count and area (legacy UI: report area)
report_power	Dynamic and leakage power (legacy UI: report power)
write_hdl	Write gate-level netlist
write_sdc	Write timing constraints

DC vs Genus Comparison

Feature	Synopsys DC	Cadence Genus
Vendor	Synopsys	Cadence
Script Language	TCL (dc_shell)	TCL / Innovus-compatible
Compile Command	compile_ultra	syn_generic, syn_map, syn_opt
MMMC Support	Via scenario objects	Native via read_mmmc
PD Integration	ICC2 (write_icc2)	Innovus (write_db)
Physical Guidance	DC Topographical	Physical Guidance Mode
Industry Usage	Dominant	Growing

1.5 Timing Constraints (SDC)

Clock & I/O Timing Waveform — SDC Constraints Visualized

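This diagram shows the setup timing budget for paths that enter or leave the chip through I/O ports. All SDC constraint values map directly to regions on the waveform. For a purely combinational input-to-output path, the available logic window = Period − input_delay − output_delay − clock uncertainty. A path that starts or ends at an internal register pays only one of the two I/O delays, plus the register's clk-to-Q or setup time.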

📐 Input Delay Explained

set_input_delay -max 1.5 -clock CLK [get_ports data_in*]

This tells the tool: upstream logic takes 1.5 ns of the clock period before data is valid at our input port. This is NOT a constraint we impose; it is a description of the external world. The tool uses it to compute the remaining time budget for our internal combinational logic. A larger input delay = less time left for your combo path.

📐 Output Delay Explained

set_output_delay -max 1.2 -clock CLK [get_ports data_out*]

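This says: the downstream chip needs our output to be valid 1.2 ns before its capture clock edge. The tool reserves this time from the end of the period. Each constraint applies to its own path type: an input-to-register path gets 5.0 − 1.5 = 3.5 ns, and a register-to-output path gets 5.0 − 1.2 = 3.8 ns (each before FF setup or clk-to-Q time and uncertainty). Only a purely combinational input-to-output path pays both: Period − input_delay − output_delay = 5.0 − 1.5 − 1.2 = 2.3 ns (before uncertainty).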
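A short sketch of how these numbers combine for each kind of I/O path. The period, I/O delays, and 0.15 ns setup uncertainty come from the sample SDC; the flip-flop setup and clk-to-Q values are assumptions added for illustration, since the text does not give them.

```python
def io_budgets(period, input_delay, output_delay, setup_uncertainty,
               ff_setup, ff_clk_to_q):
    """Time left for combinational logic on each I/O path type (all values in ns)."""
    return {
        # input port -> internal register: pays input delay and the FF setup time
        "in_to_reg": period - input_delay - setup_uncertainty - ff_setup,
        # internal register -> output port: pays clk-to-Q and the output delay
        "reg_to_out": period - output_delay - setup_uncertainty - ff_clk_to_q,
        # input port -> output port with no register in between: pays both I/O delays
        "in_to_out": period - input_delay - output_delay - setup_uncertainty,
    }


# ff_setup and ff_clk_to_q are hypothetical values, not taken from the document.
budgets = io_budgets(period=5.0, input_delay=1.5, output_delay=1.2,
                     setup_uncertainty=0.15, ff_setup=0.05, ff_clk_to_q=0.10)

if __name__ == "__main__":
    for path, ns in budgets.items():
        print(f"{path}: {ns:.2f} ns")
```

Keeping the three path types separate shows which constraint to question first when an I/O path fails: the external delay, the uncertainty, or the internal logic depth.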

Complete SDC Command Reference

SDC Command	Analysis Type	What It Models	Example
create_clock	Both	Defines clock signal: period, waveform shape, source pin. Foundation of all timing analysis.	create_clock -period 5 -waveform {0 2.5} [get_ports clk]
create_generated_clock	Both	Clock derived from a master clock (PLL output, divider). Must be declared for STA to analyze crossing paths.	create_generated_clock -divide_by 2 -source [get_ports clk] [get_pins div_reg/Q]
set_clock_uncertainty	Both	Models jitter + skew + margin. The setup value reduces the time available for setup checks; the hold value increases the required hold margin. Each value needs its own command.	set_clock_uncertainty -setup 0.15 [get_clocks CLK]; set_clock_uncertainty -hold 0.05 [get_clocks CLK]
set_clock_transition	Both	Models the clock slew seen at register clock pins while the clock is still ideal (pre-CTS). Affects the delay and setup/hold values looked up for clocked cells.	set_clock_transition 0.08 [get_clocks CLK]
set_input_delay -max	Setup	Latest external data arrival relative to clock. Reduces time budget for internal combo logic.	set_input_delay -max 1.5 -clock CLK [get_ports din*]
set_input_delay -min	Hold	Earliest external data arrival. Used for hold analysis. Without -min, hold on input paths is unconstrained.	set_input_delay -min 0.2 -clock CLK [get_ports din*]
set_output_delay -max	Setup	Time before next clock edge downstream chip needs our output stable. Eats into our combo budget.	set_output_delay -max 1.2 -clock CLK [get_ports dout*]
set_output_delay -min	Hold	Minimum time downstream chip needs output stable after our clock. Constrains minimum combo path.	set_output_delay -min 0.1 -clock CLK [get_ports dout*]
set_false_path	Disable	Removes path from STA entirely. For async resets, test ports, clock MUX select pins — paths never race functionally.	set_false_path -from [get_ports rst_n]
set_multicycle_path	Both	Allows N-cycle propagation. ALWAYS pair setup with hold correction (N-1). Missing hold fix → hold violations.	set_multicycle_path 2 -setup -from [get_cells mul]; set_multicycle_path 1 -hold -from [get_cells mul]
set_clock_groups	Async	Tells STA not to analyze paths between unrelated clocks. Essential for correct CDC handling in STA.	set_clock_groups -asynchronous -group {CLK_A} -group {CLK_B}
set_driving_cell	Setup	Models external driver strength at input ports. Without this, input transitions are ideal (zero-resistance). Affects input timing accuracy.	set_driving_cell -lib_cell BUFX4 [get_ports din*]
set_load	Setup	Models output port capacitive load (downstream PCB trace, other chip input). Affects output transition and delay.	set_load 0.05 [get_ports dout*]
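The multicycle rule in the table (setup multiplier N paired with hold multiplier N−1) is easier to see with the check edges written out. This is a minimal sketch for a single clock with the launch edge at t = 0 and the default SDC reference edges; the function name is hypothetical.

```python
def multicycle_check_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_check_time, hold_check_time) in ns, measured from the launch edge.

    Assumes launch and capture on the same clock, setup multiplier counted at the
    capture clock, and the hold check following the setup check by default.
    """
    setup_time = setup_mult * period
    # By default the hold check sits one cycle before the setup check;
    # each unit of hold multiplier pulls it back one more cycle.
    hold_time = (setup_mult - 1 - hold_mult) * period
    return setup_time, hold_time


if __name__ == "__main__":
    period = 5.0
    print("default        :", multicycle_check_edges(period))
    print("setup 2 only   :", multicycle_check_edges(period, setup_mult=2))
    print("setup 2, hold 1:", multicycle_check_edges(period, setup_mult=2, hold_mult=1))
```

With only the setup multiplier applied, the hold check lands one full period after launch, so the tool would demand that the path be slower than a whole cycle. The hold multiplier of 1 returns the check to the launch edge.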

💡 Pro Tip — Pre-CTS vs Post-CTS Uncertainty

Pre-CTS: Use set_clock_uncertainty -setup 0.15 to model total skew + jitter. This is pessimistic because skew is unknown.
Post-CTS: Switch to set_propagated_clock [all_clocks] in PrimeTime. The tool computes actual clock latencies through the synthesized clock tree. Only jitter uncertainty remains (typically 0.05–0.08 ns). This recovers significant timing margin — often 100–200 ps — that was previously modeled as skew pessimism.

1.6 Optimization Techniques

📐

Area Optimization

Logic sharing, constant folding, dead code elimination, cell downsizing. Minimize cell count and wire length. Use compile -map_effort high and set_max_area 0.

⏱️

Timing Optimization

Critical path restructuring, cell upsizing, buffer insertion, logic duplication for fanout reduction. Fix negative slack paths. Use compile_ultra -timing_high_effort_script.

⚡

Power Optimization

Clock gating insertion, operand isolation, multi-threshold voltage assignment (HVT/SVT/LVT), data activity propagation via switching activity files (SAIF).

Advanced Optimization Techniques

Technique	Description	Benefit
Retiming	Move registers across combinational logic to balance pipeline stages	Timing
Constant Propagation	Replace signals that are always 0/1 with constants; simplify downstream logic	Area
Logic Restructuring	Rearrange tree structures (AND/OR) to reduce critical path depth	Timing
Ungroup Hierarchy	Flatten sub-modules to enable cross-boundary optimization	Timing
Path Grouping	Group critical paths for prioritized optimization effort	Timing
Multi-Vt Assignment	Use HVT cells in non-critical paths, LVT on critical paths	Power+Timing

1.7 Quality of Results (QoR)

QoR is the overall measure of synthesis success across all objectives: timing, area, power, and design rule compliance. A good synthesis engineer tracks all four simultaneously — improving one often hurts another.

📊 Reading QoR Numbers — What Do They Mean?

After synthesis, you run report_qor and see numbers like WNS = −0.28ns, TNS = −15.4ns. Here is what that means:

WNS (Worst Negative Slack) = the single worst timing path in the design. −0.28ns means the most critical path is 0.28 nanoseconds too slow — data arrives 0.28ns after it needs to. This is the path you fix first.

TNS (Total Negative Slack) = the sum of all negative slacks across all violating endpoints. −15.4ns means the violations add up to a total shortfall of 15.4ns. Compare the two magnitudes: a TNS much larger than WNS means many paths are violating (broad problem). A TNS close to WNS means only one or a few bad paths (focused problem).

Goal: WNS ≥ 0 AND TNS = 0 — every single timing endpoint must pass. Even one path at −0.001ns is a failure at sign-off.
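As a rough illustration of how these two numbers are derived, the sketch below computes WNS and TNS from a list of endpoint slacks. The slack values are invented for the example, and real tools may report WNS as 0 rather than a positive number when nothing violates.

```python
def summarize_slack(endpoint_slacks_ns):
    """Return (wns, tns, violating_count) for a non-empty list of endpoint slacks."""
    violating = [s for s in endpoint_slacks_ns if s < 0]
    wns = min(endpoint_slacks_ns)   # worst slack over all endpoints
    tns = sum(violating)            # only negative slacks contribute
    return wns, tns, len(violating)


# Hypothetical endpoint slacks in ns.
slacks = [0.12, -0.28, -0.05, 0.40, -0.10]
wns, tns, count = summarize_slack(slacks)

if __name__ == "__main__":
    print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns, violating endpoints = {count}")
```

Here the TNS magnitude is only about 1.5× the WNS magnitude, which points at a handful of bad paths rather than a design-wide problem.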

WNS

Worst Negative Slack

The single most critical path. Must be ≥ 0 for timing sign-off. This is what you fix first — it sets your maximum achievable frequency.

TNS

Total Negative Slack

Sum of ALL negative slacks across all endpoints. Indicates total work remaining. Must be exactly 0 at sign-off. Large TNS = many violations to fix.

WHS

Worst Hold Slack

Most critical hold violation. Hold failures happen at ALL frequencies — they are structural problems fixed with delay buffers, not by lowering frequency.

QoR Improvement Checklist

Issue	Symptom	Fix	Priority
Setup violations	WNS < 0	Upsize cells, remove logic levels, add pipeline stage	P0
Hold violations	WHS < 0	Insert delay buffers on short paths	P0
High leakage power	report_power shows high static	Replace LVT with HVT on non-critical paths	P1
High dynamic power	Switching activity high	Enable clock gating, operand isolation	P1
DRC violations	Max cap/trans violations	Buffer high-fanout nets, fix transitions	P0
Large area	Area > target	set_max_area 0, downsize cells on non-critical paths	P2

SECTION 02

Physical Design

2.1 Introduction to Physical Design

🔧 What Is Physical Design and Why Does It Exist?

After synthesis you have a gate-level netlist — a text file listing cells and connections. But the foundry cannot manufacture a text file. They need a GDS II file — a precise geometric description of every shape of every metal layer at exact X,Y coordinates, measured in nanometers. Physical Design is the entire process of going from that netlist to GDS: figuring out where every cell physically sits on the silicon, how to distribute power to all of them, how to route every wire connecting them, and verifying it all meets manufacturing rules. PD is what transforms your design from "a list of logic" to "a physical chip that can be manufactured."

Physical Design converts a synthesized gate-level netlist into a manufacturing-ready GDS layout, determining the physical placement, power, clock, and routing of all cells.

PD Flow

STEP 1 Floorplanning

Define chip boundary (die area), place macros, I/O pins, and establish power domains. Sets utilization and aspect ratio constraints that guide all subsequent steps.

STEP 2 Power Planning

Create VDD/VSS rings around the core and stripes across the die. Ensure low IR drop and EM-safe current densities throughout the power network.

STEP 3 Placement

Place standard cells within the core area. Global placement minimizes wirelength. Detailed placement legalizes to rows. Timing-driven placement optimizes critical paths.

STEP 4 Clock Tree Synthesis (CTS)

Build a balanced clock distribution network minimizing skew (difference in clock arrival times at flip-flops) and insertion delay. Uses buffers and inverters to drive all clocked elements.

STEP 5 Routing

Connect all cell pins using metal interconnects. Global routing assigns regions. Detail routing assigns actual wires. Must satisfy all DRC rules (spacing, width, via enclosure).

STEP 6 Sign-Off Verification

Run STA with parasitic extraction, DRC (design rule check), LVS (layout vs schematic), and IR drop analysis. All checks must pass before tape-out.

2.2 Floorplanning

🧱 Start Here — What Is a Chip Physically Made Of?

Before floorplanning makes sense, you need to understand what actually sits on a piece of silicon. A chip is a layered sandwich: a silicon substrate at the bottom, transistors built on top of it, then alternating layers of metal wires and insulating oxide on top. All of these layers together form the die. Floorplanning is the step where you decide where on that silicon each piece of logic goes.

Anatomy of a Chip Die — Every Term Explained

Every Floorplan Term — Explained From Scratch

① Die

The die (also called a chip) is a single rectangular piece of silicon cut from a wafer. Hundreds of identical dies are fabricated simultaneously on one 300mm wafer, then sawn apart. The die includes everything: pads, power rings, seal ring, and the core logic. Die area is measured in mm². Larger die = more expensive: fewer dies fit on each wafer and yield falls as area grows, so cost per die rises faster than linearly with area.

② I/O Pads

I/O pads are the connection points between your chip and the outside world (PCB board, other chips). They live around the perimeter of the die. Each pad has:

• A signal pad — for data inputs/outputs (bidirectional, input-only, or output-only)
• A power pad — for VDD (positive supply) and VSS (ground). Multiple VDD/VSS pads are used because each pad can only carry limited current.

In your SDC file, get_ports refers to these pad signals. The set_input_delay and set_output_delay constraints model the timing from/to these pads.

③ Core Area vs Die Area

The core area is the inner rectangle where all your synthesized logic (standard cells + macros) lives. The die area is the full silicon including the pad ring around it.

Think of it like a room (core) inside a building (die). The walls of the building hold the doors (I/O pads). The room is where all the furniture (logic) goes. The gap between the room walls and the building walls is used for corridors (power rings, routing channels) — this gap is called the core-to-IO margin (typically 20–50 µm).

④ Standard Cell

A standard cell is a pre-designed, pre-characterized logic gate (AND, OR, NAND, flip-flop, buffer, mux, etc.) from the technology library. Every standard cell has:

• A fixed height, counted in routing tracks (e.g., 7.5-track cells in a 7nm library); all cells in the same library share this height
• A variable width depending on complexity (a 4-input NAND is wider than a 2-input AND)
• VDD and VSS rails running along the top and bottom edges

Because all standard cells have the same height, they can be placed in rows like books on a shelf. The synthesis tool converts your Verilog RTL into thousands of standard cell instances. The PD tool then physically places them in the rows.

⑤ Hard Macro

A hard macro is a large, pre-designed block with a fixed physical layout — you cannot synthesize it or change its internals. Common examples:

• SRAM — On-chip memory. Your CPU's cache or register file. Designed by memory compilers with optimized bit-cell layout.
• PLL (Phase-Locked Loop) — Clock generator circuit. Analog design, cannot be synthesized.
• ROM — Read-only memory for boot code or lookup tables.
• Analog IP — ADC, DAC, SerDes PHY — all analog, all hard macros.

Hard macros are placed first during floorplanning, before any standard cells. Their position determines how efficiently the remaining logic can be placed and routed.

⑥ Macro Halo (Keepout Zone)

A macro halo is an exclusion zone around each hard macro where standard cells cannot be placed. Typical size: 2–5 µm on all sides.

Why? Because the macro's internal structure needs routing access around its edges (for signal and power connections). If standard cells are placed right up against the macro wall, the router has no room to route those connections — creating a routing deadlock.

It's like leaving a sidewalk around a building so people can walk to the entrance — if you park cars right up to the walls, nobody can get in.

⑦ Utilization

Utilization = how full is your core area with actual logic?

Utilization = (Total std cell area) / (Core area) × 100%

Why not 100%? Because you need space for:
• Routing channels (wires between cells)
• Clock buffers and power supply cells inserted during PD
• Filler cells and decap cells
• Spare cells for post-silicon ECO

Rule of thumb: 60–75% utilization is the sweet spot. Below 60% = die is wastefully large (costs more money per chip). Above 80% = routing becomes extremely congested and timing closure becomes very difficult or impossible.

⑧ Aspect Ratio

Aspect ratio = Core height / Core width. An aspect ratio of 1.0 means a perfect square core.

Most designs target 1:1 (square) because it minimizes average wire length (which minimizes delay and power). Non-square shapes are used when:
• I/O pad constraints require a specific shape (e.g., a chip with many memory interfaces on one side)
• Large hard macros naturally push the aspect ratio
• The package dictates the die shape

Extreme aspect ratios (e.g., 3:1 — very tall and thin) cause problems: clock distribution becomes unbalanced, wire lengths increase, and some areas become routability bottlenecks.

Utilization — Visualized

Formulas with Worked Example

Core Utilization

Utilization = (Total Standard Cell Area) / (Core Area) × 100%

Core Area from Target Utilization

Core Area = Total Cell Area / Target Utilization

Aspect Ratio

AR = Core Height / Core Width (1.0 = square)

📐 Worked Example — How to Size a Floorplan

Given: Synthesis reports total cell area = 4.8 mm². Target utilization = 70%. Preferred square core.

Step 1: Core Area = 4.8 / 0.70 = 6.86 mm²
Step 2: For AR = 1.0 (square): Width = Height = √6.86 = 2.62 mm × 2.62 mm
Step 3: Add core-to-IO margin (say 40 µm each side): (2.62 + 0.08) × (2.62 + 0.08) = 2.70 mm × 2.70 mm. This is the inner edge of the pad ring; the final die is larger by the pad height on each side.
Step 4: Verify: Do the hard macros (SRAMs) fit? If an SRAM is 1.2 mm × 0.8 mm, a 5 µm halo on every side makes its footprint about 1.21 mm × 0.81 mm, which fits in the 2.62 mm core.
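The sizing steps above can be sketched as a small calculation. This is only an illustration of the arithmetic in the worked example; the function name and the returned field names are made up, and pad height is deliberately left out because the text does not give one.

```python
import math


def size_core(cell_area_mm2, utilization, aspect_ratio=1.0, margin_um=40.0):
    """Core dimensions from cell area and target utilization.

    aspect_ratio is core height / core width, as defined in the text.
    """
    core_area = cell_area_mm2 / utilization
    width = math.sqrt(core_area / aspect_ratio)   # area = AR * width^2
    height = aspect_ratio * width
    margin_mm = margin_um / 1000.0
    return {
        "core_area_mm2": core_area,
        "core_w_mm": width,
        "core_h_mm": height,
        # Core plus core-to-IO margin on both sides (pad ring not included).
        "outer_w_mm": width + 2 * margin_mm,
        "outer_h_mm": height + 2 * margin_mm,
    }


plan = size_core(cell_area_mm2=4.8, utilization=0.70)

if __name__ == "__main__":
    print(f"core area : {plan['core_area_mm2']:.2f} mm^2")
    print(f"core size : {plan['core_w_mm']:.2f} x {plan['core_h_mm']:.2f} mm")
    print(f"with margin: {plan['outer_w_mm']:.2f} x {plan['outer_h_mm']:.2f} mm")
```

Changing `utilization` to 0.60 or 0.80 shows quickly how much silicon the choice costs or saves before any tool is run.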

Floorplan Parameters — With Full Explanation

Parameter	Typical Value	What It Means	If You Get It Wrong
Core Utilization	60–75%	Percentage of core area filled with standard cell logic. The rest is routing space + buffers.	>80%: router can't fit all wires → routing overflow → unroutable design. <60%: die is larger than needed → higher cost per chip.
Aspect Ratio	1:1 (square)	Core height divided by core width. 1.0 = perfect square. Controls the shape of the chip.	Extreme ratios (3:1) make clock distribution and power delivery much harder. I/O pad count may also force non-square shapes.
Core-to-IO margin	20–50 µm	The gap between the outer edge of the core and the inner edge of the I/O pad ring. Used for power rings (VDD/VSS) and routing channels to connect pads to core logic.	Too narrow: power rings don't fit, I/O connections cannot be routed. Too wide: wastes die area.
Macro halo	2–5 µm	The empty forbidden zone around each hard macro where NO standard cells are placed. Required to leave room for the macro's own routing connections.	Without halo: standard cells crowd the macro edges → router cannot access macro pins → open circuits in the layout (LVS failures).

2.3 Power Planning

🔌 Start Here — Why Does Every Cell Need VDD and VSS?

Every single logic gate needs two power connections: VDD (positive supply, e.g. 1.0V) and VSS (ground, 0V). Without these, no transistor can switch. A chip with 10 million cells all simultaneously drawing current needs a power delivery highway network. Power planning builds this network. Get it wrong → cells starve for power → they slow down → timing failures.

🔴

VDD — Power Supply Rail

The positive voltage rail. At 28nm ≈ 0.9–1.05V, at 5nm ≈ 0.65–0.8V. Every cell's PMOS transistors connect here. When VDD wire has resistance, current causes a voltage drop — cells at the far end see less than VDD → slower switching → potential timing violations.

🔵

VSS — Ground Rail

The 0V reference. Every cell's NMOS transistors return current to VSS. Must also be low-resistance — a "bouncing" VSS from high return currents can cause ground bounce noise that flips logic states erroneously (functional failure!).

🏗️

The Power Delivery Hierarchy

Current path: PCB → Package pins → Solder bumps/Bond wires → I/O pads on die edge → Power rings (thick metal rings around core) → Power stripes (wide wires criss-crossing the core on upper metals) → Power rails inside std cell rows → Individual cell VDD/VSS pins. Each step adds resistance.

⚡

Decap Cells

Decoupling capacitor cells placed between VDD and VSS in empty spaces. They act as local charge reservoirs — when many cells switch simultaneously and demand a sudden surge of current, the decaps supply it instantly without waiting for current to travel from far-away pads. Reduces dynamic IR drop peaks.

IR Drop — Voltage Lost Along the Wire

V_drop = I (current drawn by cells) × R_metal (wire resistance) → Cell sees VDD − V_drop

Wire Resistance — Why Wider = Better

R = ρ × L / (W × T) where L=length, W=width, T=thickness, ρ=metal resistivity. Double W → half R → half IR drop
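A quick numeric sketch of the two formulas above. The resistivity is the approximate bulk value for copper (thin on-chip wires are more resistive in practice), and the wire dimensions, current, and supply voltage are hypothetical values chosen only to make the arithmetic visible.

```python
RHO_COPPER_OHM_M = 1.7e-8   # approximate bulk copper resistivity


def wire_resistance(rho_ohm_m, length_m, width_m, thickness_m):
    """R = rho * L / (W * T)."""
    return rho_ohm_m * length_m / (width_m * thickness_m)


def voltage_at_cell(vdd, current_a, resistance_ohm):
    """Supply voltage seen by the cell after the IR drop along the wire."""
    return vdd - current_a * resistance_ohm


# Hypothetical power stripe: 1 mm long, 2 um wide, 1 um thick, carrying 10 mA.
r_narrow = wire_resistance(RHO_COPPER_OHM_M, 1e-3, 2e-6, 1e-6)
r_wide = wire_resistance(RHO_COPPER_OHM_M, 1e-3, 4e-6, 1e-6)   # width doubled

if __name__ == "__main__":
    for label, r in (("2 um wide", r_narrow), ("4 um wide", r_wide)):
        v = voltage_at_cell(1.0, 10e-3, r)
        print(f"{label}: R = {r:.2f} ohm, cell sees {v:.4f} V")
```

Doubling the width halves both the resistance and the drop, which is the reason power stripes are drawn wide and on thick upper metal layers.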

Electromigration — Maximum Safe Current Density

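J = I / A_wire must stay below J_max, where A_wire is the wire cross-section. The limit follows from Black's equation for median time to failure: MTTF = A × J^(−n) × e^(Ea/kT), where A is an empirical constant (not the area), n is typically between 1 and 2, Ea is the activation energy, k is Boltzmann's constant, and T is absolute temperature (not the wire thickness). J_max is the current density at which MTTF still meets the product lifetime. Exceed this → wire fails within the product lifetime.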
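The current-density check itself is a one-line calculation. In the sketch below the J_max value is a placeholder picked for illustration; actual limits depend on the metal layer and temperature and come from the foundry's design rules. The wire dimensions and current are hypothetical as well.

```python
J_MAX_A_PER_M2 = 1e10   # placeholder limit (1 MA/cm^2), not a foundry number


def current_density(current_a, width_m, thickness_m):
    """J = I / (W * T) for a rectangular wire cross-section."""
    return current_a / (width_m * thickness_m)


def min_width_for_current(current_a, thickness_m, j_max=J_MAX_A_PER_M2):
    """Narrowest wire that keeps J at or below j_max."""
    return current_a / (j_max * thickness_m)


# Hypothetical stripe: 10 mA through a wire 2 um wide and 1 um thick.
j = current_density(10e-3, 2e-6, 1e-6)
w_min = min_width_for_current(10e-3, 1e-6)

if __name__ == "__main__":
    status = "OK" if j < J_MAX_A_PER_M2 else "EM violation"
    print(f"J = {j:.2e} A/m^2 -> {status}")
    print(f"minimum width for 10 mA: {w_min * 1e6:.2f} um")
```

The same relation explains the usual fixes: widen the wire, add parallel stripes, or use via arrays so that each segment carries less current.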

⚠️ Consequence of IR Drop — A Real Example

If VDD drops 10% (1.0V → 0.9V) in a hot corner of the chip, transistors in that region become ~20% slower. Paths that barely meet 5ns timing now take 6ns → new setup violations appear at sign-off that weren't visible in pre-IR-drop STA. This is why IR drop is a mandatory sign-off check — STA without IR drop is not sign-off quality.

Power Grid Topology

⚡ IR Drop

Voltage reduction along the power rail due to resistive metal. Static IR drop: DC current × metal resistance. Dynamic IR drop: transient switching currents cause instantaneous dip. Must keep < 5–10% of VDD.

🌊 Electromigration (EM)

Gradual movement of metal atoms due to electron flow (current density). Causes open circuits over time. Limit: J < Jmax for each metal segment. Wider wires or via arrays reduce EM risk.

2.4 Placement

📍 Start Here — What Exactly Gets Placed?

After synthesis, you have a gate-level netlist — a list of cells (AND2, DFF, BUF...) and wires connecting them. But they have no physical location yet — it's like having all the components of a city but no map showing where each building goes. Placement decides the X,Y coordinates of every standard cell inside the core area. This decision is critical: cells that are logically connected should be physically close → shorter wires → less resistance and capacitance → faster timing → less power. Bad placement = long wires everywhere = timing closure becomes nearly impossible.

What Is a Standard Cell Row? (The Grid Cells Sit In)

📏 Standard Cell Rows — The Shelf System

The core area is divided into horizontal strips called rows, all of the same height (set by the cell library, e.g. 0.27µm tall in a 7nm-class library; 28nm libraries are closer to 1µm). Every standard cell has exactly this height, so all cells snap perfectly into rows, like books on a shelf. Each row has VDD and VSS power rails running horizontally through it. Cells in adjacent rows are flipped upside down so they share the power rails between rows, which halves the number of rails needed. Cells must be placed aligned to the row grid AND to a horizontal placement site (e.g. 0.09µm pitch). Any deviation = legalization violation.
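A minimal sketch of the snapping rule described above, using the example row height (0.27 µm) and site pitch (0.09 µm) from the text, expressed in nanometers to keep the arithmetic in integers. The orientation labels N and FS follow the common DEF naming for upright and vertically flipped cells; the function name is hypothetical, and a real legalizer also has to remove overlaps, which this ignores.

```python
ROW_HEIGHT_NM = 270   # example row height from the text
SITE_PITCH_NM = 90    # example placement site pitch from the text


def snap_to_grid(x_nm, y_nm):
    """Snap a cell origin to the nearest placement site and row.

    Returns (x, y, orientation). Odd rows are flipped so that neighbouring
    rows share a power rail.
    """
    site = round(x_nm / SITE_PITCH_NM)
    row = round(y_nm / ROW_HEIGHT_NM)
    orientation = "N" if row % 2 == 0 else "FS"
    return site * SITE_PITCH_NM, row * ROW_HEIGHT_NM, orientation


if __name__ == "__main__":
    for origin in [(1000, 700), (180, 540), (44, 130)]:
        print(origin, "->", snap_to_grid(*origin))
```

The distance between the original and snapped coordinates is the cell displacement that legalization tries to keep small.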

[Figure: cell layout before placement and after legal placement]

Stage	Description	Key Metric
Global Placement	Distributes cells across core to minimize total wirelength. Cells may overlap temporarily.	HPWL (half-perimeter wire length)
Legalization	Moves cells to legal rows, removes overlaps, snaps to row grid.	Cell displacement from global
Detailed Placement	Local cell swaps and moves to improve timing and routing.	WNS improvement
Congestion Reduction	Spread cells in congested areas, use placement blockages.	Routing overflow %
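HPWL, the global placement metric in the table above, is the half-perimeter of the bounding box around a net's pins. A minimal sketch, assuming pin positions are given as (x, y) tuples in arbitrary units; the net names and coordinates are invented for the example.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: bounding-box width plus height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical nets: each maps to the placed coordinates of its pins.
nets = {
    "n1": [(0, 0), (4, 3)],
    "n2": [(1, 1), (2, 5), (6, 2)],
}

total_hpwl = sum(hpwl(pins) for pins in nets.values())
for name, pins in nets.items():
    print(name, hpwl(pins))
print("total HPWL:", total_hpwl)
```

A placer tries to minimize the total because it is cheap to evaluate and tracks routed wirelength reasonably well.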

2.5 Clock Tree Synthesis (CTS)

⏰ Start Here — The Clock Distribution Problem

Your chip has one clock source (e.g. a PLL output) but potentially millions of flip-flops that all need that clock signal. You cannot connect one wire from the source to every FF — a single wire driving 1,000,000 FFs would have astronomical capacitance → extremely slow transitions → the clock would barely toggle. Also, if the wire is very long, the signal takes different amounts of time to reach FFs at different corners of the die → clock skew — some FFs see the clock edge nanoseconds after others, which breaks timing. CTS solves this by building a tree of buffers that fans out the clock progressively, like a branching river delta, ensuring every FF gets a clean, fast clock edge at approximately the same time.

💡 What Is a Clock Buffer and Why Is It Needed?

A clock buffer is a standard cell with one input (the incoming clock) and one output (a buffered copy of the clock). Its job: take a weak, slightly degraded clock signal and reproduce it as a clean, strong signal that can drive many downstream loads. Without buffers: one wire from the PLL driving 100,000 FFs would have total capacitance of ~50pF → the clock signal would have a 5ns rise time → the "edge" would be a slow ramp instead of a sharp transition → setup and hold times cannot be met → chip fails. Clock buffers are inserted every few cells in the tree to keep the clock transition times under ~100ps at every FF.

Key CTS Concepts — From Scratch

🌳

What Is a Clock Tree?

A tree of clock buffers starting at the clock source and branching out to all flip-flop clock pins. Each level buffers the signal and drives the next level. A typical design might have 4–8 levels of buffering. The root drives 2–4 branches, each branch drives more sub-branches, eventually reaching individual FFs. The tree ensures signal integrity (clean transitions) and balances arrival times.

📐

Clock Skew

The difference in clock arrival time between any two flip-flops. If FF_A receives the clock edge at 1.00ns and FF_B at 1.25ns, the skew is 0.25ns. Skew matters because: positive skew (the capture clock arrives later than the launch clock) relaxes setup but tightens hold; negative skew tightens setup but relaxes hold. CTS target: local skew < 50ps, global skew < 200ps.

📏

Clock Insertion Delay (Latency)

The time from the clock source to a flip-flop's clock pin, through all the buffers and wires of the clock tree. Typical values: 0.3–1.5ns depending on design size and node. Latency itself doesn't cause problems — it's the difference in latency between FFs (skew) that causes timing issues.
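The relationship between latency and skew can be sketched in a few lines: latency is a per-flip-flop number, skew is a difference between two of them. FF_A and FF_B use the arrival times from the skew definition above; FF_C is a made-up extra sink.

```python
# Clock arrival time (insertion delay) at each flip-flop clock pin, in ns.
arrival_ns = {"FF_A": 1.00, "FF_B": 1.25, "FF_C": 1.10}


def pair_skew(arrivals, launch_ff, capture_ff):
    """Skew seen by a path: capture arrival minus launch arrival."""
    return arrivals[capture_ff] - arrivals[launch_ff]


def global_skew(arrivals):
    """Largest arrival difference across all sinks of the clock."""
    return max(arrivals.values()) - min(arrivals.values())


print(f"skew FF_A -> FF_B: {pair_skew(arrival_ns, 'FF_A', 'FF_B'):.2f} ns")
print(f"global skew:       {global_skew(arrival_ns):.2f} ns")
```

Adding the same delay to every entry changes the latency of each flip-flop but leaves both skew numbers untouched, which is why latency alone is harmless.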

🎯

Clock Uncertainty

A timing margin added to clock edges to account for: Jitter (cycle-to-cycle variation from PLL noise, typically 50–100ps), Skew (modeled as uncertainty pre-CTS), and extra guardband. Applied in SDC as set_clock_uncertainty. Post-CTS, skew is captured by propagated clock latencies, so only jitter+guardband remain.

Skew

Δ in clock arrival between FFs

Target: <50ps (local), <200ps (global)

Latency

Clock insertion delay

Source → FF clock pin delay

Uncertainty

Jitter + skew margin

Applied as timing margin in STA

Key CTS Commands

Command	Tool	Purpose
ccopt_design	Innovus	Run CTS with concurrent optimization
set_ccopt_property	Innovus	Set CTS target skew, latency targets
clock_opt	ICC2	Run clock tree optimization
set_clock_tree_options	ICC2	Configure CTS parameters
report_clock_tree	Both	Report skew, latency, buffer count

2.6 Routing

🔗 Start Here — What Is Routing?

After placement, you know where every cell sits, but the cells are still disconnected — like buildings in a city with no roads between them. Routing draws the actual metal wires that connect every cell pin to every other cell pin according to the netlist. A modern chip might have 50–200 million net connections to route. The wires must: (1) actually connect what the netlist says, (2) satisfy all foundry DRC rules, (3) minimize wire length (affects timing and power), and (4) not cause crosstalk noise. This is done in two phases: global routing (plan the routes) and detailed routing (draw the actual shapes).

Metal Layers — Why Multiple Layers Exist

🏗️ Why Do Chips Have Multiple Metal Layers?

If you had only one metal layer, wires couldn't cross without shorting — like a city with only one road that can never have intersections. Multiple metal layers (M1, M2, M3… up to M15+ at advanced nodes) solve this: each layer's wires run in one direction (alternating horizontal/vertical), and vias connect between layers wherever a wire needs to change layers or connect to another wire. Lower metals (M1, M2) have thin, tight-pitch wires for local connections. Upper metals (M8+) have wide, coarse wires for global signals, power, and clock distribution — they carry more current and span longer distances.

What Is a Via? How Do Layers Connect?

🔩

Via — The Vertical Connection

A via is a small metal pillar that connects two adjacent metal layers vertically. Via between M1 and M2 is called V1 (Via 1). Between M2 and M3 is V2, etc. Vias have limited current capacity — high-current nets need via arrays (many parallel vias) to prevent electromigration. A missing or broken via = open circuit = LVS failure.

↔️

Preferred Routing Direction

Each metal layer has a preferred routing direction: M1 vertical, M2 horizontal, M3 vertical, M4 horizontal... (alternating). This orthogonal arrangement minimizes parallel-running wires on adjacent layers (reduces coupling capacitance / crosstalk). Routing against preferred direction is allowed but penalized by the router.

📏

Routing Track

The routing grid on each metal layer is divided into tracks — parallel lines at the minimum wire pitch. Each track can hold one wire. The number of tracks in a routing channel = available routing resources. When more wires need to cross a region than there are tracks → routing overflow / congestion → DRC violations or unroutable design.
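Track capacity and overflow reduce to simple counting. A minimal sketch, assuming a hypothetical region 2000 nm wide with a 100 nm wire pitch and 23 nets that want to cross it; the helper names are invented for this example.

```python
def track_count(region_nm: int, pitch_nm: int) -> int:
    """Number of whole routing tracks that fit across a region."""
    return region_nm // pitch_nm


def overflow(demand: int, capacity: int) -> int:
    """Wires that cannot get a track; zero when demand fits."""
    return max(0, demand - capacity)


# Hypothetical numbers for one routing region on one layer.
region_nm = 2000
pitch_nm = 100
demand = 23

capacity = track_count(region_nm, pitch_nm)
over = overflow(demand, capacity)
print(f"capacity = {capacity} tracks, demand = {demand}, overflow = {over}")
print(f"overflow = {100 * over / capacity:.1f}% of capacity")
```

Global routers report this kind of figure per region; the fix is to spread cells or detour nets so demand drops below capacity.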

[Figure: metal layer stack]

DRC Rule	Definition	Violation Impact
Spacing	Minimum distance between same-metal parallel wires	Short circuit risk, manufacturing defects
Width	Minimum wire width per metal layer	Higher resistance → IR drop, EM failure
Via enclosure	Metal must extend beyond via by minimum amount	Broken via connection on manufacturing variation
Antenna	Limits ratio of metal area to gate oxide area	Gate oxide damage during plasma etch
Density	Min/max metal fill requirements per layer	CMP non-uniformity → dishing/erosion

2.7 Physical Verification

🔍

DRC — Design Rule Check

Verifies the layout satisfies all foundry manufacturing rules (spacing, width, enclosure, density). Zero DRC violations required for tape-out. Tool: Calibre DRC (Siemens EDA, formerly Mentor Graphics).

⚖️

LVS — Layout vs Schematic

Compares extracted netlist from layout with the reference schematic. Verifies all connections are correct and no opens/shorts were introduced during PD. Tool: Calibre LVS.

🛡️

ERC — Electrical Rule Check

Checks for floating nodes, unconnected power/ground, improper biasing, ESD violations, latchup risk areas. Ensures circuit will function correctly electrically.

[Figure: common DRC violation examples, showing a spacing violation and a width violation]

2.8 PD Tool Knowledge

Key Innovus Commands

Command	Purpose
read_db	Import design from Genus (unified data model)
init_design	Initialize design with LEF/DEF/SDC
floorPlan	Define die/core size and utilization
addRing / addStripe	Create power rings and stripes
place_design	Run global and detailed placement
ccopt_design	Concurrent CTS and optimization
routeDesign	Global + detailed routing
extractRC	Parasitic extraction (RC)
timeDesign	In-tool timing analysis
streamOut	Generate GDS II for tape-out

Key ICC2 Commands

Command	Purpose
open_lib / open_block	Open design library and block
initialize_floorplan	Set die, core area, utilization
create_net_shape	Create power network shapes
connect_pg_net	Connect power/ground nets
place_opt	Placement with optimization
clock_opt	CTS with timing optimization
route_auto	Automatic global + detail routing
route_opt	Post-route optimization
write_gds	Output GDS stream
report_design	Design statistics and QoR

Feature	Cadence Innovus	Synopsys ICC2
Synthesis Handoff	Genus (write_db → read_db)	DC (write_icc2_files)
CTS Command	ccopt_design	clock_opt
Script Format	TCL / Encounter-style	TCL / IC Compiler style
STA Integration	Tempus (native)	PrimeTime (GoldRoute)
EM/IR Analysis	Voltus	StarRC + RedHawk
DRC/LVS	Calibre in-design	Calibre / IC Validator
Market Position	Strong	Strong

SECTION 03

Static Timing Analysis

3.1 Introduction to STA

🕐 Start Here — Why Does Timing Matter At All?

Every digital circuit operates on a clock — a signal that ticks millions or billions of times per second. On each tick, every flip-flop captures whatever data is at its input at that exact moment. The fundamental question STA answers is: "Does the data have enough time to travel from one flip-flop to the next between two consecutive clock ticks?" If yes → the chip works. If no → the chip captures wrong data → functional failure. STA checks this for every single one of the potentially millions of paths in your design.

Key Terms — Defined Before We Go Further

🔲

Flip-Flop (Register)

A memory element that captures (stores) its input D on the rising edge of the clock and presents it at output Q. Every register in your design is a flip-flop. Synthesis maps Verilog always @(posedge clk) blocks to flip-flop cells from the library.

🔀

Combinational Logic

Logic gates (AND, OR, NAND, MUX, adders…) between flip-flops. No memory — output depends only on current inputs. Data has to propagate through these gates within one clock period. The more gates in the path, the longer it takes, the lower the maximum frequency.

⏰

Clock Period

The time between two rising clock edges. A 1 GHz clock has period = 1ns. A 500 MHz clock has period = 2ns. All combinational logic between two FFs must finish within one period (minus setup time and clock uncertainty). The period sets your timing budget.

📍

Timing Path

The route data travels from a starting point (a FF output or input port) through combinational gates to an ending point (a FF input or output port). STA measures the propagation delay of every timing path and checks it against the constraint.

📊

Slack

Timing margin = Required Time − Actual Arrival Time. Positive slack (+) = data arrives early enough — timing is MET. Negative slack (−) = data arrives too late — timing VIOLATED. Goal: all slacks ≥ 0 at sign-off.

🏁

Critical Path

The timing path with the worst (most negative or least positive) slack. This is the bottleneck that limits your maximum operating frequency. Fixing the critical path is the primary goal of timing closure. WNS (Worst Negative Slack) = the slack of the critical path.

✅

Why STA over Dynamic Simulation?

Dynamic simulation requires test vectors, is slow, and may miss rare corner-case paths. STA analyzes ALL paths statically in minutes, covering 100% of the design space including paths with near-zero functional probability. A design with 1M flip-flops has trillions of possible paths — only STA can check them all.

⚠️

STA Limitations — What It Cannot Catch

STA cannot catch functional logic bugs (wrong RTL behavior), doesn't simulate dynamic power behavior, and requires correctly specified constraints (garbage-in garbage-out). False paths and multicycle paths must be explicitly declared by the engineer — STA trusts what you tell it.

[Figure: setup time and hold time waveforms]

3.2 Timing Path Types

🗺️ What Is a Timing Path?

A timing path is the route a signal takes through combinational logic from where it starts (a startpoint) to where it's captured (an endpoint). Startpoint = either a flip-flop's clock pin (Q output launches data) or an input port (external data enters the chip). Endpoint = either a flip-flop's data pin (D input captures data) or an output port (data leaves the chip). Everything in between is combinational logic: AND gates, OR gates, adders, muxes, inverters — all the logic that computes the result. The sum of all gate delays + wire delays along this path is the path delay that STA measures.

STA analyzes 4 fundamental path types in digital circuits. Every timing path has a startpoint (port or FF clock pin) and endpoint (FF data pin or output port).

PATH TYPE 1: INPUT → REGISTER

PATH TYPE 2: REGISTER → REGISTER (Most Common)

PATH TYPE 3: REGISTER → OUTPUT

PATH TYPE 4: INPUT → OUTPUT (Combinational)

3.3 Setup & Hold Slack Analysis

Setup Slack

Slack_setup = T_required – T_arrival, where T_arrival = T_clk_latency_launch + T_cq_launch + T_combo

Setup Required Time

T_required = T_clock_edge + T_clk_latency_capture – T_clock_uncertainty_setup – T_setup

Hold Slack

Slack_hold = T_arrival_min – (T_clk_latency_capture + T_clock_uncertainty_hold + T_hold), where T_arrival_min is the earliest data arrival at the endpoint
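A minimal sketch of both checks as "required versus arrival", assuming a single-cycle path where launch and capture use the same clock. The function names and all numeric values are hypothetical, and the hold check includes a hold uncertainty term as an assumption.

```python
def setup_slack(period, launch_latency, capture_latency,
                t_cq, t_combo, t_setup, uncertainty):
    """Setup slack: latest allowed arrival minus actual (late) arrival."""
    arrival = launch_latency + t_cq + t_combo
    required = period + capture_latency - uncertainty - t_setup
    return required - arrival


def hold_slack(launch_latency, capture_latency,
               t_cq_min, t_combo_min, t_hold, uncertainty):
    """Hold slack: earliest arrival minus earliest allowed change time."""
    arrival = launch_latency + t_cq_min + t_combo_min
    required = capture_latency + uncertainty + t_hold
    return arrival - required


# Hypothetical path, all times in ns.
s = setup_slack(period=2.0, launch_latency=0.50, capture_latency=0.55,
                t_cq=0.12, t_combo=1.40, t_setup=0.08, uncertainty=0.10)
h = hold_slack(launch_latency=0.50, capture_latency=0.55,
               t_cq_min=0.10, t_combo_min=0.05, t_hold=0.04, uncertainty=0.05)
print(f"setup slack: {s:+.3f} ns")
print(f"hold slack:  {h:+.3f} ns")
```

Note that the clock period appears only in the setup check; the hold check compares edges of the same cycle, so changing the frequency does not move it.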

✅ Positive Slack (MET)

Data arrives before required time. Extra margin available. Setup: Slack = +0.3ns means 300ps of timing margin. Design passes. No action needed.

slack: +0.350 ns (MET)

❌ Negative Slack (VIOLATED)

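Data arrives AFTER required time. Setup violation = data might not be captured correctly. Must fix before tape-out. Hold violation = data changes too soon after the capturing clock edge and can overwrite the value being captured; unlike a setup violation, it cannot be fixed by lowering the clock frequency.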

slack: -0.120 ns (VIOLATED)

Sample Timing Report (PrimeTime Format)

TIMING REPORT — Setup Analysis

===========================================================
Path Type       : max (Setup)
Point                              Incr       Path
===========================================================
--- Launch Clock ---
clock CLK (rise edge)              0.000      0.000
clock network delay (propagated)    0.500      0.500
FF1/CK                              0.000      0.500 r
--- Data Path (Launch) ---
FF1/Q       (DFF_X2/Q)               0.120      0.620 r
U101/Y      (AND2_X4/Y)              0.085      0.705 r
U102/Y      (OAI21_X2/Y)             0.110      0.815 f
U103/Y      (INV_X4/Y)               0.062      0.877 r
U104/Y      (BUF_X8/Y)               0.075      0.952 r
FF2/D                                0.000      0.952 r
data arrival time                                0.952
--- Capture Edge ---
clock CLK (rise edge)              5.000      5.000
clock network delay (propagated)    0.510      5.510
FF2/CK                              0.000      5.510 r
library setup time                 -0.085     5.425
data required time                               5.425
-----------------------------------------------------------
data required time                               5.425
data arrival time                               -0.952
-----------------------------------------------------------
slack (MET)                                       4.473
===========================================================
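The report's arithmetic can be re-derived by summing the Incr column. The sketch below copies the increments from the sample report above; the variable names are made up for this example.

```python
# Increments (ns) copied from the sample report.
launch_clock_incr = [0.000, 0.500]                       # clock edge, clock network delay
data_path_incr = [0.120, 0.085, 0.110, 0.062, 0.075, 0.000]  # FF1/Q ... FF2/D

capture_edge = 5.000      # next rising edge of CLK
capture_network = 0.510   # clock network delay to FF2/CK
library_setup = 0.085     # setup time of FF2

arrival = sum(launch_clock_incr) + sum(data_path_incr)
required = capture_edge + capture_network - library_setup
slack = required - arrival

print(f"data arrival time  {arrival:.3f}")
print(f"data required time {required:.3f}")
print(f"slack              {slack:.3f}")
```

The three printed values should agree with the data arrival time, data required time, and slack lines of the report.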

3.4 Clock Domain Crossing (CDC)

CDC occurs when a signal crosses from one clock domain to another. This creates a risk of metastability — the output of a flip-flop remains at an indeterminate voltage level for an unpredictable time if setup/hold requirements are violated during the crossing.

[Figure: 2-FF synchronizer (most common fix)]

[Figure: CDC signal crossing waveform]

CDC Violation Type	Description	Fix
Single-bit crossing (no sync)	Flip-flop driven by different clock without synchronizer	Add 2-FF synchronizer
Multi-bit bus crossing	Multiple bits cross independently — may sample incoherent values	Use gray code, handshake, async FIFO
Fast-to-slow domain	Source clock faster; receiving domain may miss pulses	Pulse stretcher + synchronizer
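Reconvergence	Separately synchronized signals recombine in the destination domain; each may resolve a cycle apart, giving transient incoherent values	Combine in the source domain and synchronize once, or use gray code / handshake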
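The gray code fix for multi-bit crossings works because consecutive gray values differ in exactly one bit, so a receiver sampling mid-transition sees either the old or the new value, never a mix. A small sketch of the encoding; the helper names are invented for this example.

```python
def binary_to_gray(n: int) -> int:
    """Convert a binary count to its reflected gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert the gray encoding by XOR-folding the shifted value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Compare how many bits toggle on the 7 -> 8 increment of a 4-bit pointer.
print("binary 7->8 toggles", bits_changed(7, 8), "bits")
print("gray   7->8 toggles", bits_changed(binary_to_gray(7), binary_to_gray(8)), "bit")
```

This is why async FIFO read and write pointers are typically gray coded before being passed through 2-FF synchronizers.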

3.5 On-Chip Variation (OCV) & AOCV

Real silicon has spatial and temporal variation in process, voltage, and temperature (PVT). OCV models capture that cells on the same die can behave differently from each other.

⚙️

Process Corners

FF: Fast NMOS, Fast PMOS. Cells are fast. Best case for setup, worst case for hold.

TT: Typical NMOS, Typical PMOS. Nominal design point.

SS: Slow NMOS, Slow PMOS. Cells are slow. Worst case for setup timing.

🌡️

Voltage & Temperature

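Low voltage + high temp = slow cells (worst setup). High voltage + low temp = fast cells (worst hold). Temperature inversion: at advanced nodes (<65nm) running at low supply voltage, cells can be slower at low temperature than at high temperature, so the worst setup corner may be the cold one and both temperature extremes must be checked.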

📊

Derating

Apply derating factors to account for OCV. Late (slow) path: multiply by 1.05 (5% slower). Early (fast) path: multiply by 0.95 (5% faster). Creates pessimistic timing margin.
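To see how the derates create margin in a setup check, the sketch below slows the launch clock and data path by the late factor and speeds up the capture clock by the early factor. It assumes flat derating on whole path delays, treats the launch and capture clock paths as fully separate, and uses hypothetical delays.

```python
LATE_DERATE = 1.05   # late (slow) paths: 5% slower
EARLY_DERATE = 0.95  # early (fast) paths: 5% faster


def setup_slack_ocv(period, launch_clk, data, capture_clk, t_setup,
                    late=1.0, early=1.0):
    """Setup slack with launch+data derated late and capture derated early."""
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


# Hypothetical path delays in ns.
args = dict(period=2.0, launch_clk=0.5, data=1.0, capture_clk=0.5, t_setup=0.08)

nominal = setup_slack_ocv(**args)
derated = setup_slack_ocv(**args, late=LATE_DERATE, early=EARLY_DERATE)
print(f"nominal slack: {nominal:.3f} ns")
print(f"derated slack: {derated:.3f} ns")
print(f"margin consumed by OCV: {nominal - derated:.3f} ns")
```

For a hold check the roles swap: the launch and data path take the early factor and the capture clock takes the late factor.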

Method	Description	Accuracy	Pessimism
OCV (flat derating)	Apply fixed derate to all paths equally	Medium	High
AOCV (Advanced)	Derate based on depth (number of cells in path). Longer paths have more statistical averaging → less pessimism	High	Medium
POCV (Parametric)	Full statistical model using σ distributions for each cell. Most accurate	Highest	Low

3.6 Multi-Mode Multi-Corner (MMMC)

Modern designs must meet timing across multiple operating modes (functional, scan, standby) AND multiple PVT corners simultaneously. MMMC analysis runs all combinations in one pass.

Corner Name	Process	Voltage	Temp	Analysis Type	Purpose
func_slow	SS	0.9V	125°C	Setup	Worst-case functional timing (setup closure)
func_fast	FF	1.1V	-40°C	Hold	Worst-case hold (fast paths cause hold violations)
func_typical	TT	1.0V	25°C	Both	Nominal analysis for power estimation
scan_slow	SS	0.9V	25°C	Setup	Scan shift timing at slow corner
hold_fast	FF	1.2V	-40°C	Hold	Extreme hold analysis for ECO coverage

3.7 Synopsys PrimeTime

PrimeTime (PT) is the industry-standard sign-off STA tool. It uses accurate parasitic data (SPEF) from the extracted layout for final timing certification.

Key PrimeTime Commands

Command	Purpose
read_verilog	Read gate-level netlist from PD tool
read_sdc	Apply timing constraints (SDC)
read_parasitics	Load extracted parasitics (SPEF file)
set_operating_conditions	Set PVT corner for analysis
update_timing	Propagate timing through all paths
report_timing	Print timing paths (worst paths)
report_constraint	Report all violated constraints
check_timing	Validate constraint coverage (unconstrained paths)
report_global_timing	Summary: WNS, TNS, WHS, THS
pt_shell -file	Run PrimeTime in batch mode

Sample PrimeTime Script

TCL — PrimeTime Sign-off Script

## PrimeTime Sign-off Script
## PrimeTime resolves cells through link_path (target_library/link_library are Design Compiler variables)
set_app_var search_path ". /tech/saed32nm/db"
set_app_var link_path   "* saed32nm_ss0p9v125c.db"

## Read design
read_verilog    "./out/chip_final.v"
link_design     chip_top

## Constraints and parasitics
read_sdc        "./out/chip_final.sdc"
read_parasitics -format spef "./out/chip.spef"

## PVT corner
set_operating_conditions "ss0p9v125c"

## Enable OCV derating
set_timing_derate -late  1.05 -cell_delay
set_timing_derate -early 0.95 -cell_delay

## Update timing
update_timing -full

## Reports
report_timing        -max_paths 20 -slack_lesser_than 0   > rpt/vio_setup.rpt
report_timing  -delay_type min -max_paths 20 -slack_lesser_than 0   > rpt/vio_hold.rpt
report_constraint    -all_violators                           > rpt/all_vio.rpt
report_global_timing -significant_digits 3                    > rpt/global.rpt
check_timing                                                   > rpt/check.rpt

3.8 Cadence Tempus

Feature	Synopsys PrimeTime	Cadence Tempus
Industry Status	Gold Standard Sign-off	Challenger / Growing
MMMC	Via scenario manager	Native MMMC (view definitions)
ECO Flow	PT-ECO + write_changes	Native ECO (eco_opt_design)
Innovus Integration	Via StarRC/Signoff	Seamless (same data model)
POCV Support	Yes (POCV derating)	Yes (SOCV)
Primary Use	Sign-off timing	In-design + sign-off

3.9 Timing Closure Techniques

Fixing Setup Violations

Technique	Method
Cell Upsizing	Replace slow cell with larger drive strength version (X4 → X8)
Buffer Insertion	Split long wire into shorter segments with buffers
Logic Restructuring	Reduce logic depth on critical path by rearranging gate tree
Floorplan Change	Move source/sink cells closer to reduce wire delay
Retiming	Move registers to balance pipeline stages
Frequency Reduction	Last resort: lower clock frequency (increase period)

Fixing Hold Violations

Technique	Method
Buffer Insertion	Insert delay buffers (delay cells) on short paths to add delay
Cell Downsizing / Vt Swap	Replace a cell with a lower drive strength version, or swap a fast (LVT) cell for a slower (HVT) one
Wire Stretching	Make path wire longer to add RC delay
Clock Skewing	Intentionally skew clock to give more hold margin
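Buffer insertion for hold is a budgeting exercise: add enough delay to clear the hold violation without eating all of the setup slack on the same path. A minimal sketch in integer picoseconds, assuming each delay buffer adds a fixed 50 ps and that the added delay lands on both the min and max path; all values are hypothetical.

```python
import math


def buffers_needed(hold_slack_ps: int, buffer_delay_ps: int) -> int:
    """Smallest number of delay buffers that brings hold slack to >= 0."""
    if hold_slack_ps >= 0:
        return 0
    return math.ceil(-hold_slack_ps / buffer_delay_ps)


# Hypothetical path: 120 ps hold violation, 350 ps of setup slack to spend.
hold_slack_ps = -120
setup_slack_ps = 350
buffer_delay_ps = 50

n = buffers_needed(hold_slack_ps, buffer_delay_ps)
added = n * buffer_delay_ps
print(f"insert {n} buffers (+{added} ps)")
print(f"hold slack after fix:  {hold_slack_ps + added} ps")
print(f"setup slack after fix: {setup_slack_ps - added} ps")
```

If the remaining setup slack would go negative, the fix has to move to a different point on the path that does not sit on a setup-critical route.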

ECO (Engineering Change Order) Flow

📌 What is ECO?

ECO is a controlled method to make targeted netlist changes late in the flow (after synthesis or place-and-route, or even after tape-out as a metal-only change) to fix timing, functional bugs, or sign-off issues. It modifies only the affected cells/nets, preserving the rest of the design.

SECTION 04

Interview Prep & Quick Reference

Synthesis Interview Questions (Top 30)

1. What is logic synthesis and what are its inputs and outputs?

Logic synthesis is the process of converting an RTL (Register Transfer Level) hardware description into a gate-level netlist optimized for a target technology.

Inputs: RTL code (Verilog/VHDL/SystemVerilog), technology library (.lib/.db), timing constraints (.sdc), design rules.
Outputs: Gate-level netlist (.v), mapped SDC, timing/area/power reports, DDC database.

2. What is the difference between compile and compile_ultra in Design Compiler?

compile: Standard compile with basic optimization. Limited effort. Options: -map_effort [low/medium/high], -incremental_mapping for re-optimization of an existing netlist.

compile_ultra: Advanced optimization with automatic ungrouping, boundary optimization, and datapath optimization enabled by default; register retiming is available with -retime. Options include -no_autoungroup (disables automatic ungrouping so hierarchy is kept) and -timing_high_effort_script. Significantly better QoR at the cost of longer runtime. Used in production flows.

3. What is a technology library (.lib file)? What does it contain?

A .lib (Liberty) file characterizes every standard cell in the technology at specific PVT conditions. It contains:

Cell delay tables (input transition vs output load)
Setup/hold times for sequential cells
Leakage and dynamic power values
Area in technology units
Pin capacitances, max fanout, max transition limits
Function description (Boolean)

Multiple .lib files cover different PVT corners (ss125c, tt25c, ff-40c).

4. What is a false path? Give an example of when you would use set_false_path.

False path: A timing path that exists in the netlist but is not functionally active — it will never carry real data during operation, so timing should not be analyzed on it.

Examples:

Asynchronous reset/set ports: set_false_path -from [get_ports rst_n]
Scan test mode paths (active only during test, not functional operation)
Paths between mutually exclusive clocks that never switch simultaneously
Configuration pins written once at startup

Note: Incorrectly setting false paths can hide real timing problems. Use with care.

5. What is a multicycle path? How is set_multicycle_path used?

A multicycle path is one that is intentionally designed to take more than one clock cycle to propagate. This relaxes the timing constraint on that path.

Example: A multiplier that takes 2 clock cycles:
set_multicycle_path 2 -setup -from [get_cells mult_inst/reg*]
set_multicycle_path 1 -hold -from [get_cells mult_inst/reg*]

The -hold must be explicitly set to (N-1) to avoid hold violations introduced by the relaxed setup. Failure to set hold correction is a very common bug.
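The effect of the two commands can be sketched by computing which capture-clock edge each check uses. This assumes launch and capture share one clock, the launch edge is at time 0, and the default SDC edge conventions apply; the function name is made up for this example.

```python
def check_edges(period, setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) times relative to the launch edge at 0.

    The setup check moves to the Nth capture edge. The hold check sits one
    cycle before the setup edge by default and moves back by hold_mult cycles.
    """
    setup_edge = setup_mult * period
    hold_edge = (setup_mult - 1 - hold_mult) * period
    return setup_edge, hold_edge


period = 5.0  # ns
print("default:            ", check_edges(period))
print("-setup 2 only:      ", check_edges(period, setup_mult=2))
print("-setup 2 and -hold 1:", check_edges(period, setup_mult=2, hold_mult=1))
```

With only the setup multiplier applied, the hold check lands one full period after launch, so the path would need more than a clock period of minimum delay; the -hold 1 line moves it back to the launch edge.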

6. What is clock gating and why is it used?

Clock gating reduces dynamic power by stopping the clock to a flip-flop or a group of flip-flops when their output is not needed. Instead of clocking an FF every cycle (wasting power toggling), a gating condition (enable signal) controls whether the clock reaches the FF.

Implementation: Synthesis tools insert ICG (Integrated Clock Gating) cells which are AND/OR-latch combinations that suppress the clock edge cleanly without glitches. Reduces dynamic power by 20–40% in typical designs.

7. What is the difference between WNS, TNS, and WHS?

WNS (Worst Negative Slack): The most negative setup slack in the design. Represents the single worst timing path. Must be ≥ 0 at sign-off.

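TNS (Total Negative Slack): Sum of all negative slacks across all endpoints. Indicates the total amount of timing work needed. A small WNS with a large TNS means many endpoints fail by a little; a large WNS with a similar TNS means a few endpoints fail badly. If WNS ≥ 0, TNS is 0.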

WHS (Worst Hold Slack): The most negative hold slack. Indicates the worst hold violation. Must also be ≥ 0. Fixed by inserting delay buffers on short paths.
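All three metrics are simple reductions over per-endpoint slacks. A minimal sketch with invented slack values (ns); real tools read these from the timing database.

```python
def worst_slack(slacks):
    """Worst (smallest) slack across endpoints: WNS for setup, WHS for hold."""
    return min(slacks)


def total_negative_slack(slacks):
    """Sum of the negative slacks only; zero when nothing violates."""
    return sum(s for s in slacks if s < 0)


# Hypothetical per-endpoint slacks in ns.
setup_slacks = [0.35, -0.12, -0.03, 0.10]
hold_slacks = [0.02, -0.01, 0.05]

print(f"WNS = {worst_slack(setup_slacks):.3f} ns")
print(f"TNS = {total_negative_slack(setup_slacks):.3f} ns")
print(f"WHS = {worst_slack(hold_slacks):.3f} ns")
```

Positive slacks never offset negative ones in TNS, which is why a design with all endpoints met reports a TNS of exactly zero.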

8. What is retiming in synthesis?

Retiming moves registers (flip-flops) across combinational logic boundaries without changing the circuit's functional behavior. It balances pipeline stages to improve frequency.

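Example: If Stage 1 has 3ns of logic and Stage 2 has 1ns, retiming moves a register to equalize the stages at ~2ns each, so the minimum period drops from 3ns to ~2ns and achievable frequency rises by about 1.5×. The tool handles the mathematical transformation automatically. In DC this is enabled with compile_ultra -retime (or the optimize_registers command).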
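The frequency gain from balancing stages follows from the slowest stage setting the clock period. The sketch below works through the 3ns/1ns example, ignoring flip-flop setup time, clock-to-Q delay, and uncertainty for simplicity.

```python
def max_frequency_mhz(stage_delays_ns):
    """Max clock frequency when the slowest stage sets the period."""
    return 1000.0 / max(stage_delays_ns)


before = [3.0, 1.0]  # unbalanced pipeline from the example
after = [2.0, 2.0]   # same total logic, register moved to balance stages

f_before = max_frequency_mhz(before)
f_after = max_frequency_mhz(after)
print(f"before retiming: {f_before:.1f} MHz")
print(f"after retiming:  {f_after:.1f} MHz")
print(f"speedup: {f_after / f_before:.2f}x")
```

The total logic delay is unchanged at 4ns; only its distribution between the two stages moves.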

9. What is operand isolation in power optimization?

Operand isolation prevents switching activity on functional units (like adders, multipliers) when their outputs are not being used. An AND gate or mux is inserted at the inputs of the datapath block, driven by the enable signal. When disabled, all inputs are forced to 0, preventing glitches from propagating through the combinational logic and reducing switching power significantly.

10. What happens during elaboration in synthesis?

Elaboration parses the HDL source files and builds an internal design representation (GTECH netlist — technology-independent generic gates). During elaboration:

Module hierarchy is constructed
Parameters/generics are resolved to constants
FSMs are identified and optionally encoded
Registers, memories, operators (+, *, >>) are mapped to GTECH primitives
Design rule checks (unconnected ports, latches vs FFs) are performed

The check_design command after elaboration reports any issues.

11. What is set_dont_touch and when do you use it?

set_dont_touch prevents DC from optimizing, resizing, or removing a specific cell or net. Use cases:

Protect manually sized critical cells from being downsized
Preserve specific clock buffers needed for DFT
Protect cells needed for post-silicon debug/observation points
Guard hand-placed analog boundary interface cells

Over-use of set_dont_touch can degrade QoR by blocking legitimate optimizations.

12. What is the difference between target_library and link_library?

target_library: The technology library whose cells DC will USE when mapping the design. These are the cells that appear in the output netlist.

link_library: Libraries used to RESOLVE module references during linking. Includes "*" (current design) + all .db files. Needed so DC can find instantiated sub-modules and external IPs. A cell can be in link_library but not target_library — it gets resolved but DC won't use it for new cells.

13. What is ungroup and when should you use it during synthesis?

ungroup flattens a sub-module into its parent, removing the hierarchical boundary. This allows DC to optimize logic across that boundary (e.g., constant propagation from parent into child, logic sharing between siblings).

Use when: Sub-module boundaries prevent critical optimization. In compile_ultra, the -no_autoungroup flag disables DC's automatic ungrouping. Manual ungrouping is done before compile: ungroup -all -flatten. Tradeoff: loses hierarchy for debug and incremental compile benefits.

14. What is scan insertion and how does synthesis handle it?

Scan insertion (Design for Test, DFT) replaces regular flip-flops with scan flip-flops (SFF) that have an additional scan data input (SI) and scan enable (SE). During test mode, all SFFs form a chain allowing external test patterns to be shifted in and captured results shifted out.

In synthesis: After compile, insert_dft and preview_dft commands handle scan. The SDC must set false paths on scan paths (set_false_path -from [get_ports scan_en]). Scan adds ~5–10% area overhead.

15. What is the significance of set_max_area 0 in DC?

set_max_area 0 tells Design Compiler to minimize area as much as possible (target = 0 means "minimize"). DC will aggressively use smaller cells, share logic, and apply area recovery techniques after meeting timing. Setting this to 0 doesn't mean area will be 0 — it's a directive to minimize. Without this command, DC may leave unused area if timing is met. Always set after timing constraints are applied so timing takes priority.

16. What are HVT, SVT, and LVT cells? How are they used in synthesis?

Multi-threshold voltage cells on the same process node:

LVT (Low Vt): Fast switching, but high leakage. Used on critical timing paths.
SVT (Standard Vt): Balanced. General use cells.
HVT (High Vt): Slow, but very low leakage. Used on non-critical paths to reduce standby power.

Strategy: Use LVT to fix WNS on critical paths; replace non-critical LVT cells with HVT to recover power. DC can perform multi-Vt optimization automatically when libraries for multiple Vt flavors are provided as target libraries.

17. What is GTECH in Design Compiler?

GTECH (Generic Technology) is Synopsys's internal, technology-independent logic library used as an intermediate representation during synthesis. After elaboration, the design is mapped to GTECH primitives (GTECH_AND2, GTECH_FD1, etc.) before technology mapping to the target library. GTECH allows Boolean optimization without technology-specific constraints. The check_design on a GTECH netlist catches structural issues before committing to technology mapping.

18. What is the purpose of set_clock_uncertainty?

set_clock_uncertainty adds a timing margin to account for:

Jitter: Cycle-to-cycle variation in clock edge arrival (PLL jitter)
Skew: Spatial variation in clock arrival (before CTS; post-CTS uses propagated clocks)
Margin: Extra guardband for post-silicon variation

Pre-CTS: set_clock_uncertainty models all uncertainty.
Post-CTS: Usually only jitter+margin, as skew is captured in propagated clock latencies. Setup and hold have separate uncertainty values.

19. What is path grouping in synthesis optimization?

Path grouping organizes timing paths into groups so DC can apply targeted optimization effort. Each group can receive different weights and effort. By default DC creates one group per clock plus a default group for unclocked paths; flows commonly add user-defined groups such as REGOUT (reg-to-output), REGIN (input-to-reg), and COMBO (combinational input-to-output).

group_path -name critical_paths -critical_range 0.5 -weight 5

Higher weight = more optimization effort. Useful to tell DC to focus on specific paths without spending runtime on already-met paths.

20. What is the difference between read_verilog and analyze + elaborate?

read_verilog: Reads, analyzes, and elaborates the design in one step. Simpler for single-design flows.

analyze + elaborate (two-step):
analyze -format verilog -library WORK [file list]
elaborate top_module

The two-step approach is preferred for large hierarchical designs because analyze compiles each file to an intermediate form, and elaborate builds the hierarchy. This allows reuse of analyzed modules and better error isolation. Also enables explicit parameter override during elaborate.

21. What causes latch inference vs flip-flop inference in synthesis?

In Verilog RTL:
Flip-flop is inferred when: output is assigned only on a clock edge (always @(posedge clk)).
Latch is inferred when: output is assigned inside a level-sensitive always block AND not all conditions assign the output (incomplete if/case).

Example latch inference: always @(en or d) if (en) q = d; // q holds when en=0 → LATCH

Latches are generally undesirable in synthesis (timing hard to analyze). Fix: Use flip-flops with explicit reset, or make if/case statements complete with else/default.

22. What is incremental compile and when do you use it?

Incremental compile (compile -incremental_mapping, or compile_ultra -incremental) re-optimizes only the portions of the design that violate constraints, leaving already-met portions unchanged. It is faster than a full compile and is used:

After making small ECO changes to the netlist
After constraint changes affecting only a subset of paths
In a second-pass optimization after an initial compile

Not as thorough as a full compile_ultra — use only when runtime is critical or changes are known to be local.

23. What does check_timing report and why is it important?

check_timing validates that all paths in the design are covered by timing constraints. It reports:

Unconstrained paths: Flip-flops or ports with no clock or timing constraint → timing not analyzed → potential sign-off risk
Loops: Combinational loops (no register) which cause infinite path delays
No-clock endpoints: FFs without an associated clock

Always run check_timing before reporting timing. "Clean" means 0 warnings — every path is constrained.

24. What is propagated clock vs ideal clock in synthesis?

Ideal clock: Clock arrives at all FFs simultaneously with zero skew and zero network delay. Used pre-CTS. The set_clock_uncertainty command models expected skew/jitter as a guardband.

Propagated clock: After CTS, the actual clock network delay is computed from the clock source through every buffer/inverter to each FF's clock pin. The tool uses real propagated delays — more accurate, removes pessimism of ideal clock uncertainty. set_propagated_clock [all_clocks] switches to propagated mode in PrimeTime post-CTS.

25. What is set_driving_cell and set_load?

set_driving_cell: Specifies the cell driving each input port, allowing DC to accurately compute input transition times. Without this, DC assumes an ideal (zero-resistance) driver. Example: set_driving_cell -lib_cell BUFX4 [get_ports data_in*]

set_load: Specifies the capacitive load on output ports (models the off-chip load). Example: set_load 0.05 [get_ports data_out*]

Both are necessary for accurate I/O timing analysis. Without them, input/output timing will be optimistic.

26. What is the SAIF file and how is it used in power analysis?

SAIF (Switching Activity Interchange Format) captures the toggle rate and static probability of every net in the design from simulation. It is used by synthesis and power analysis tools to compute accurate dynamic (switching) power rather than relying on default activity assumptions (typically 20% toggle rate).

Flow: Run RTL or gate-level simulation → dump SAIF → read in DC/PT for power analysis: read_saif -input sim.saif -instance_name top. More accurate switching data = more accurate power optimization decisions.
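
To see why measured activity matters, here is a minimal sketch of the usual first-order switching-power estimate for a single net. It assumes the toggle rate is expressed as transitions per clock cycle (both directions counted), so each transition costs half of C·V²; the capacitance, voltage, frequency, and activity numbers are hypothetical, not taken from any library.

```python
def switching_power_w(cap_f, vdd_v, freq_hz, toggles_per_cycle):
    """Switching power of one net: P = 0.5 * C * V^2 * f * toggles_per_cycle."""
    return 0.5 * cap_f * vdd_v ** 2 * freq_hz * toggles_per_cycle


# Hypothetical net: 10 fF load, 0.9 V supply, 1 GHz clock.
assumed = switching_power_w(10e-15, 0.9, 1e9, toggles_per_cycle=0.20)   # default guess
measured = switching_power_w(10e-15, 0.9, 1e9, toggles_per_cycle=0.02)  # e.g. from SAIF

print(f"assumed activity : {assumed * 1e6:.3f} uW")
print(f"measured activity: {measured * 1e6:.3f} uW")
```

Power scales linearly with activity, so a net that is assumed busy but is actually quiet is over-estimated by the same factor, which can mislead power optimization.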

27. What is the difference between a latch and a flip-flop from a timing perspective?

Flip-flop (edge-triggered): Captures data only at the clock edge. Setup/hold times apply at that edge. STA treats it as a fixed timing endpoint — straightforward.

Latch (level-sensitive): Transparent when clock is high (or low). Data can "time-borrow" through the latch during the transparent phase, borrowing time from the next cycle. This makes STA significantly more complex because the tool must perform time-borrowing analysis. Latches in pipelines can improve throughput but require careful constraint handling, for example capping the borrowed time with set_max_time_borrow.

28. What is a generated clock? Give an example.

A generated clock is a clock derived from a master clock by division, multiplication, or phase shift — typically from a PLL output or a clock divider register.

create_generated_clock -name CLK_DIV2 -source [get_ports clk_in] -divide_by 2 [get_pins clkdiv_reg/Q]

Generated clocks are essential for STA to correctly analyze paths crossing from the master to generated domain. Without declaring them, those paths are unconstrained. Generated clocks also inherit uncertainty from their master unless explicitly overridden.

29. What is a combinational loop and how does it affect synthesis?

A combinational loop is a circuit path where the output feeds back to its own input without any register (flip-flop/latch) in between. This creates infinite path delay in STA (the propagation loops forever), and in real hardware causes oscillation or lock-up states.

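Synthesis tools flag loops in check_design and check_timing. DC typically keeps going by disabling a timing arc to break the loop and issuing a warning, which leaves part of the loop untimed, so loops should be fixed in RTL rather than left to the tool. Common causes: feedback mux without enable register, asynchronous handshake signals coded incorrectly in RTL.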
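
Conceptually, loop detection is cycle detection on the cell connectivity graph with registers treated as path breakers. The sketch below is an illustration of that idea only, not how any particular tool implements it; the netlist dictionary and cell names are hypothetical.

```python
def find_combinational_loop(fanout, sequential):
    """Return one loop as a list of cell names, or None if there is none.

    fanout: dict mapping each cell to the list of cells it drives.
    sequential: set of cells (flip-flops/latches) that break timing paths.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {}
    stack = []

    def visit(cell):
        color[cell] = GRAY
        stack.append(cell)
        for nxt in fanout.get(cell, []):
            if nxt in sequential:
                continue  # a register cuts the path here
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        stack.pop()
        color[cell] = BLACK
        return None

    for cell in fanout:
        if cell not in sequential and color.get(cell, WHITE) == WHITE:
            loop = visit(cell)
            if loop:
                return loop
    return None


looped = {"U1": ["U2"], "U2": ["U3"], "U3": ["U1", "FF1"], "FF1": ["U1"]}
registered = {"U1": ["U2"], "U2": ["FF1"], "FF1": ["U1"]}
print(find_combinational_loop(looped, sequential={"FF1"}))
print(find_combinational_loop(registered, sequential={"FF1"}))
```

The second netlist has the same feedback structure, but the feedback passes through FF1, so it is an ordinary sequential path rather than a combinational loop.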

30. What is register balancing vs pipeline optimization?

Register balancing (retiming): Moves existing registers within the current pipeline structure to equalize logic depth between stages. No new registers are added. The functional latency (number of cycles) stays the same.

Pipeline optimization: Adds NEW pipeline stages (registers) to reduce combinational depth at the cost of increased latency. This is an architectural decision made at RTL level, not done automatically by synthesis.

Key difference: Retiming is synthesis-level; pipelining is architectural. Both improve timing but retiming is transparent to function while pipelining increases output latency.

Physical Design Interview Questions (Top 30)

1. What is utilization in floorplanning and what is a good target value?

Utilization = (Total Standard Cell Area) / (Core Area) × 100%. It represents how densely cells are packed in the core.

Target: 60–75% for most designs. Lower (<50%) wastes die area and increases cost. Higher (>80%) causes routing congestion, difficulty placing buffers, and degraded routability. Memory-heavy designs may use 40–60% because large SRAMs occupy significant area.
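
The utilization formula can be turned around to size the core for a target. A minimal sketch, with hypothetical area numbers and helper names:

```python
def utilization_pct(std_cell_area_um2, core_area_um2):
    """Utilization = standard cell area / core area, as a percentage."""
    if core_area_um2 <= 0:
        raise ValueError("core area must be positive")
    return 100.0 * std_cell_area_um2 / core_area_um2


def core_area_for_target(std_cell_area_um2, target_pct):
    """Core area needed so the given cell area lands at the target utilization."""
    if not 0 < target_pct <= 100:
        raise ValueError("target must be in (0, 100]")
    return std_cell_area_um2 * 100.0 / target_pct


cell_area = 700_000.0  # um^2, hypothetical post-synthesis cell area
print(utilization_pct(cell_area, 1_000_000.0))   # utilization in a 1 mm^2 core
print(core_area_for_target(cell_area, 70.0))     # core area for a 70% target
```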

2. What is the difference between die area and core area?

Die area: The total silicon area of the chip, including the I/O ring, pads, and all structures to the edge of the die.

Core area: The interior region where standard cells and macros are placed. It is surrounded by the I/O ring. Core area = Die area − I/O ring area − margins.

The core-to-die margin accommodates power rings, I/O pad connections, and design rule keepouts. Utilization is measured relative to the core area, not die area.

3. What is IR drop and how does it affect the design?

IR drop is the voltage reduction along the power delivery network due to resistive metal wires. V_drop = I × R.

Effects:

Cells receiving lower VDD switch slower → increased cell delay → potential setup violations
Severe IR drop can prevent cells from switching at all → functional failure
Dynamic (transient) IR drop: simultaneous switching of many cells causes a momentary local supply droop, adding delay on the affected paths

Fix: Wider power stripes, more vias, adding decap cells near the IR hotspot, reducing switching current by spreading cells.
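
A rough sketch of why wider stripes help, using V_drop = I × R together with the standard strap resistance relation R = Rs × L / W. The sheet resistance, dimensions, and current are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def strap_resistance_ohm(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a straight metal strap: R = Rs * L / W."""
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop_v(current_a, resistance_ohm):
    """Static IR drop: V = I * R."""
    return current_a * resistance_ohm


# Hypothetical strap: 0.02 ohm/sq, 500 um long, carrying 10 mA.
narrow = ir_drop_v(0.010, strap_resistance_ohm(0.02, 500.0, 2.0))
wide = ir_drop_v(0.010, strap_resistance_ohm(0.02, 500.0, 4.0))
print(f"2 um strap: {narrow * 1e3:.1f} mV, 4 um strap: {wide * 1e3:.1f} mV")
```

Doubling the width halves the resistance and therefore the drop for the same current.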

4. What is electromigration (EM) and how do you fix it?

Electromigration (EM) is the gradual displacement of metal atoms in a wire due to electron momentum transfer at high current densities. Over time it creates voids (opens) or hillocks (shorts), causing chip failure.

Fix:

Widen the wire to reduce current density (J = I/A)
Add parallel wires (increase cross-section)
Add more vias (reduce via current density)
Reduce switching frequency or activity

EM analysis is part of sign-off using tools like Voltus or RedHawk.
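
The widening fix follows directly from J = I/A. A minimal sketch, assuming a rectangular wire cross-section and a hypothetical current density limit (real limits come from the foundry's EM rules and depend on layer, width, and temperature):

```python
def current_density_ma_per_um2(current_ma, width_um, thickness_um):
    """J = I / A, with A the wire cross-section (width * thickness)."""
    return current_ma / (width_um * thickness_um)


def min_width_um(current_ma, thickness_um, j_max_ma_per_um2):
    """Smallest width that keeps J at or below the limit."""
    return current_ma / (j_max_ma_per_um2 * thickness_um)


J_MAX = 20.0  # mA/um^2, hypothetical limit
j = current_density_ma_per_um2(2.0, 0.25, 0.2)
required_width = min_width_um(2.0, 0.2, J_MAX)
print(f"J = {j:.1f} mA/um^2, violates: {j > J_MAX}")
print(f"minimum width: {required_width:.2f} um")
```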

5. What is clock skew? What is acceptable skew?

Clock skew = difference in clock arrival time between any two flip-flops in the design (or between launch FF and capture FF on a specific path).

Skew = T_clk_capture − T_clk_launch

Acceptable values:

Local skew (adjacent FFs): < 30–50 ps
Global skew (across chip): < 100–200 ps

Positive skew (capture FF's clock arrives later): relaxes setup, tightens hold. Negative skew: tightens setup, relaxes hold. CTS targets balanced (near-zero) skew between all FFs in a domain.

6. What is the difference between global routing and detailed routing?

Global Routing: Divides the chip into a coarse grid (GCells) and assigns each net to a sequence of GCells. Determines which metal layers and routing regions each net passes through. Fast, approximate — does not produce actual wire geometries. Identifies congested areas.

Detailed Routing: Works within the global routing assignment to produce exact wire coordinates, widths, vias, and layer assignments. Must satisfy all DRC rules. The actual GDSII-ready metal geometries are the output.

7. What is a DRC violation? Give three examples.

DRC (Design Rule Check) violations are layout patterns that violate foundry manufacturing rules:

Spacing violation: Two wires on the same metal layer are closer than the minimum spacing rule
Width violation: A wire is narrower than the minimum width for that metal layer
Via enclosure violation: Metal doesn't extend enough beyond the via in all directions
Antenna violation: Metal attached to gate has too high area ratio (damages oxide during fab)
Density violation: Metal fill percentage outside foundry-specified min/max range

8. What is LVS and what errors does it catch?

LVS (Layout vs. Schematic) extracts a netlist from the physical layout (by identifying connected metal regions as nets and transistors from poly-over-active patterns) and compares it to the reference schematic/netlist.

Errors caught:

Open circuits: A connection exists in schematic but is missing/broken in layout
Short circuits: Two nets that should be separate are connected in layout
Extra devices: Layout has transistors not in schematic
Missing devices: Schematic has cells not present in layout

LVS must be clean before tape-out.

9. What is a macro in physical design? How is it placed?

A macro is a large pre-designed block (hard macro) with a fixed layout: SRAM, ROM, PLL, analog IP, large memories. Unlike standard cells which are placed in rows, macros have fixed dimensions and internal structure.

Macro placement guidelines:

Place at die edges or corners to minimize routing blockage in the center
Align to row boundaries if possible
Add a "halo" or keepout around each macro (no std cells within 2–5µm)
Consider macro pin accessibility — pins should face the routing channels
Group related macros (e.g., all SRAMs near their controllers)

10. What are filler cells and their purpose?

Filler cells are placed in empty spaces between standard cells in each row to:

Maintain N-well continuity across the row (required for correct transistor operation)
Connect power rails (VDD/VSS straps run through standard cell rows)
Provide decoupling capacitance (some filler cells include capacitors)
Ensure minimum density requirements for metal layers

Different sizes exist (FILL1, FILL2, FILL4, FILL8, FILL16, FILL32) and the placer fills every gap. Must be removed before ECO changes and re-inserted after.

11. What is an antenna violation in routing?

During plasma etching in semiconductor fabrication, metal wires connected to gate terminals accumulate charge. If the metal-to-gate area ratio exceeds a threshold, the charge can damage the thin gate oxide.

Antenna ratio = Metal area connected to gate / Gate oxide area

Fixes:

Jump up to higher metal layer (top layer is added last, less exposure)
Insert antenna diodes at gate inputs (discharge the accumulated charge)
Use antenna-aware routing (route to higher layer early)

Antenna violations are found by DRC and must be fixed before tape-out.
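
The antenna check itself is a simple ratio comparison. In this sketch the limit of 400 and the areas are hypothetical; real rules are per layer and often cumulative.

```python
def antenna_ratio(metal_area_um2, gate_area_um2):
    """Antenna ratio = metal area connected to the gate / gate oxide area."""
    return metal_area_um2 / gate_area_um2


def violates_antenna(metal_area_um2, gate_area_um2, max_ratio):
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


MAX_RATIO = 400.0  # hypothetical foundry limit
long_route = violates_antenna(500.0, 1.0, MAX_RATIO)   # long wire on one layer
after_jump = violates_antenna(100.0, 1.0, MAX_RATIO)   # most of the wire moved to a higher layer
print(long_route, after_jump)
```

Jumping to a higher layer works because only the metal already attached to the gate when a given layer is etched counts toward that layer's ratio.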

12. What is crosstalk and how does it affect timing?

Crosstalk occurs when a switching wire (aggressor) capacitively couples noise onto an adjacent wire (victim).

Timing impact:

Crosstalk delta delay: Aggressor switching in same direction as victim → speeds up victim (improves setup, worsens hold). Opposite direction → slows down victim (worsens setup).
Crosstalk noise/glitch: On a quiet net, coupling from aggressor creates a voltage spike that may cause a logic error if the net is near a switching threshold.

Fixes: Shield critical nets with VDD/VSS, widen wire spacing, use lower metal layers (smaller coupling cap).

13. What is the difference between legalization and detailed placement?

Legalization: After global placement places cells at approximate (possibly overlapping) locations, legalization moves each cell to the nearest legal position — aligned to a placement row, on the power rail grid, with no overlaps. Cells may move significantly from their global placement position.

Detailed Placement: After legalization, cells are in legal positions but timing may be degraded. Detailed placement does local cell swaps, single-row and multi-row moves to improve timing and reduce wirelength while maintaining legality.

14. What is a placement blockage? Name three types.

A placement blockage prevents the placer from placing standard cells in a specified area.

Types:

Hard blockage: No cells placed at all. Used around macros, analog circuits, special structures.
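Soft blockage: Blocks cells during initial placement but lets optimization, CTS, and legalization use the area (typically for buffers and inverters). Commonly used in narrow channels between macros.
Partial blockage: Caps the placement density in the region at a specified percentage instead of blocking it outright. Used to relieve congestion.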
Route blockage: Blocks routing (not placement) on specific metal layers in a region.

15. What is SPEF? Why is it needed for sign-off timing?

SPEF (Standard Parasitic Exchange Format) is a file that describes the extracted RC parasitics (resistance and capacitance) of every wire in the physical layout. After routing, an extraction tool (like StarRC or RCX) reads the physical layout and produces SPEF.

SPEF is needed because wire delays depend heavily on actual metal resistance and capacitance, which are only known after physical layout. Pre-route timing uses estimated wire loads (WLM) which can be 20–30% off. Sign-off STA uses SPEF for accurate, real timing. Without SPEF, timing sign-off is unreliable.

16. What is timing-driven placement?

Timing-driven placement considers timing criticality when placing cells. Critical path cells are placed close together to minimize wire length and thus wire delay. Non-critical paths can tolerate longer wires.

The placer uses early wire length estimation and constraint data to prioritize cell proximity for critical nets. Without timing-driven placement, a pure wirelength minimizer might spread critical cells apart, degrading timing after routing when actual wire RC is seen. Most modern placers (Innovus, ICC2) do timing-driven placement by default.

17. What is a power domain and what is level shifting?

A power domain is a region of the chip that operates at a specific supply voltage, potentially different from the rest of the chip. Used in low-power design to run non-critical blocks at lower voltage (lower power).

Level shifters are required when a signal crosses between two power domains at different voltages. They translate signal levels: for example, a signal swinging between 0V and 1.2V in domain A must be converted to the 0V to 1.8V swing of domain B. Without level shifters, the receiver sees incorrect logic levels, causing functional failure. Level shifters must be inserted in the netlist during synthesis/PD with proper UPF (Unified Power Format) flow.

18. What is the purpose of tap cells in physical design?

Tap cells (also called well taps) connect the N-well to VDD and substrate to VSS at regular intervals to prevent latchup. They have no active function but provide necessary bias connections.

Without tap cells, parasitic PNP/NPN transistors in the CMOS structure can turn on, creating a low-resistance path from VDD to VSS (latchup), permanently damaging the chip. Foundry rules specify maximum tap cell pitch (typically 20–50µm). They are placed in every standard cell row at regular intervals.

19. What is congestion in routing? How do you resolve it?

Congestion occurs when the number of wires that need to pass through a routing region exceeds the available routing tracks (routing overflow).

Fixes:

Reduce placement density (lower utilization) in congested areas
Add routing blockages on congested layers to force rerouting
Move macros to open routing channels
Use a metal stack with more routing layers (a costly choice that must be made early)
Use high-fanout net synthesis to break up congested drivers
Adjust floorplan to redistribute logic

Congestion map analysis in Innovus/ICC2 shows hotspots before detailed routing.

20. What is double patterning and why is it needed?

At advanced nodes (<20nm), the minimum wire pitch required is smaller than what a single photolithography exposure can resolve. Double patterning splits the layout into two separate masks, each printed in a separate exposure, whose combined result achieves the fine pitch.

This requires the layout to be "colorable" — adjacent wires must be assigned to different masks (colors). DRC checks for double patterning conflicts (two adjacent same-color wires that should be different colors). Routing tools must be double-patterning-aware and ensure no conflicts.
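
Checking colorability is a graph two-coloring problem: shapes are nodes, and an edge joins any two shapes that are too close to share a mask. The sketch below uses a breadth-first search; the conflict graph is hypothetical and assumed symmetric (each conflict listed on both shapes). It only illustrates the idea, since production decomposition also considers stitching and layout fixes.

```python
from collections import deque


def assign_masks(conflicts):
    """Two-color a conflict graph.

    conflicts: dict mapping each shape to the shapes it cannot share a mask with.
    Returns {shape: 0 or 1}, or None if an odd cycle makes it uncolorable.
    """
    color = {}
    for start in conflicts:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            shape = queue.popleft()
            for other in conflicts.get(shape, ()):
                if other not in color:
                    color[other] = 1 - color[shape]  # neighbour goes on the other mask
                    queue.append(other)
                elif color[other] == color[shape]:
                    return None  # two conflicting shapes forced onto one mask
    return color


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(assign_masks(chain))
print(assign_masks(triangle))
```

Three mutually conflicting shapes form an odd cycle, which is the classic double-patterning conflict that must be resolved by changing the layout.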

21. What is a flyline (ratsnest) and how is it used?

A flyline (ratsnest) is a straight-line visual connection between unconnected pins that are logically connected in the netlist. It shows the router "intent" — which pins must be connected — before actual routing is done.

Uses:

Visual guide during floorplanning to estimate wire congestion and length
Identify poor floorplan choices (macros creating long flylines across the chip)
Estimate wirelength for timing budgeting

High-density flyline areas after floorplanning predict routing congestion hot spots. Move macros/cells to reduce crossing flylines.

22. What is a high-fanout net and how is it handled in PD?

A high-fanout net is a signal connected to a very large number of sink pins (e.g., enable, scan_en, reset driving hundreds or thousands of FFs). High fanout causes:

Excessive wire capacitance → slow transition → timing violation
Single wire spanning entire chip → routing congestion

Handling: Use buffer tree synthesis — insert buffers to split the net into sub-nets. The synthesis and PD tools do this automatically for nets exceeding max_fanout. Scan enable and test signals often need 4–6 levels of buffering.
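
The number of buffer levels grows logarithmically with the sink count. A back-of-the-envelope sketch, assuming an idealized balanced tree in which every driver (including the root) may drive at most max_fanout loads; real tools also weigh capacitance, placement, and timing.

```python
def buffer_tree_levels(num_sinks, max_fanout):
    """Buffer levels needed between the root driver and the sinks."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = 0
    loads = num_sinks
    while loads > max_fanout:
        loads = -(-loads // max_fanout)  # ceiling division: buffers needed for this level
        levels += 1
    return levels


print(buffer_tree_levels(10, 16))        # small net, root can drive it directly
print(buffer_tree_levels(5000, 16))      # hypothetical reset net
print(buffer_tree_levels(100_000, 8))    # hypothetical scan enable with a tight fanout limit
```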

23. What is the difference between ICC2 and Innovus?

Both are industry-leading place-and-route tools from different vendors:

Synopsys ICC2: Tightly integrated with DC (write_icc2_files), PrimeTime for timing sign-off, and IC Validator for DRC. Uses a hierarchical design library (.dlib).

Cadence Innovus: Tight integration with Genus (write_db/read_db), Tempus for in-design STA, and Calibre in-design. Known for Clock Concurrent Optimization (CCOpt) for CTS.

Both support advanced features (multi-patterning, advanced node DRC, power analysis). Choice depends on existing tool stack and foundry PDK support.

24. What is a power intent file (UPF/CPF)?

UPF (Unified Power Format) and CPF (Common Power Format) describe the power architecture of a multi-voltage design:

Power domain definitions (which cells are in each domain)
Supply voltages for each domain
Power state definitions (ON, OFF, low-power)
Level shifter and isolation cell requirements
Power switching cell locations

UPF is now IEEE 1801 standard. The PD tool reads UPF to automatically insert level shifters, isolation cells, and power switches at domain boundaries. Without UPF, multi-voltage designs cannot be correctly implemented.

25. What is the difference between CTS and CTO?

CTS (Clock Tree Synthesis): Builds the clock distribution network from scratch — inserting buffers, inverters, and routing wires to distribute the clock to all FFs with controlled skew and latency.

CTO (Clock Tree Optimization): A post-CTS step that fine-tunes the existing clock tree — adjusting buffer sizes, changing net routes, and tweaking the tree topology to improve skew, latency, and clock power without fully rebuilding the tree. Used after post-route optimization when incremental clock improvement is needed. In Innovus: ccopt_design covers both CTS and CTO.

26. What is metal fill and why is it required?

Metal fill is dummy metal patterns inserted into empty areas of each metal layer to satisfy foundry density rules. CMP (Chemical Mechanical Polishing) during fabrication requires uniform metal density across the wafer to achieve planar surface topography.

Without adequate fill: CMP polishes sparse and dense regions at different rates, causing dishing (over-polished wide metal) and erosion (thinned dense regions) → non-uniform heights → via formation failures → reliability problems.

Fill is inserted after routing using fill tools. It must not electrically connect to any signal but must meet min/max density rules on each layer within specified check windows.

27. What is the difference between pre-route and post-route optimization?

Pre-route optimization occurs before detailed routing. Wire delays are estimated (using virtual wire models). Cell placement, sizing, and buffering changes are fast because no DRC checking is needed. Timing closure is attempted here first for efficiency.

Post-route optimization uses actual extracted parasitics (RC from real wires). It is slower and must maintain DRC cleanliness with every change. Changes are limited (ECO-mode: only add/resize buffers/inverters, minimal perturbation to avoid DRC). Sign-off timing happens post-route.

28. What is a standard cell row? How does orientation affect placement?

A standard cell row is a horizontal strip in the core area with fixed height (matching the standard cell height for the technology node). Adjacent rows alternate between the normal orientation (N) and a vertically flipped one (FS), with power rails (VDD and VSS) running horizontally along the row edges. All standard cells must snap to row boundaries.

Rows are flipped so that adjacent rows abut on the same rail (VDD to VDD, VSS to VSS) and can share it, halving the number of rails. Some cells can only be placed in certain orientations (e.g., cells with specific Nwell connections). Placement tools handle orientation automatically per-row.

29. What happens during sign-off? What must pass before tape-out?

Sign-off is the final verification phase before releasing the design to the foundry. All checks must pass with zero failures:

STA: Zero setup AND hold violations across all MMMC corners
DRC: Zero design rule violations (Calibre DRC clean)
LVS: Layout vs. Schematic clean (zero shorts/opens)
IR Drop: All cells receive sufficient voltage (static + dynamic)
EM: All metal/via segments below current density limits
Antenna: All gates meet antenna ratio rules
ESD: ESD protection structures verified

30. What is the purpose of decap cells?

Decap cells (decoupling capacitor cells) are standard-cell-height structures that contain a large capacitor between VDD and VSS. They serve as local charge reservoirs:

Supply instantaneous current to switching cells without waiting for current from the power pads (which have long RC path)
Reduce dynamic IR drop by providing local charge
Filter high-frequency noise on the power supply

Placed in empty spaces near high-switching-activity areas. Some filler cells also contain small decap capacitances. Excessive decap can cause excessive inrush current at power-on.

STA Interview Questions (Top 30)

1. What is static timing analysis and how does it differ from dynamic simulation?

STA exhaustively analyzes all timing paths in a design mathematically without requiring simulation vectors. It checks whether data can propagate from every startpoint to every endpoint within timing constraints.

Differences from dynamic simulation:

STA covers 100% of paths; simulation covers only exercised paths
STA is fast (minutes); full simulation can take days
STA cannot find functional bugs; simulation can
STA is deterministic given constraints; simulation depends on input vectors
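STA uses library delay models; gate-level simulation uses delays annotated onto the netlist (e.g., SDF), and only SPICE-level simulation models detailed transistor behavior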

2. What is setup time and hold time?

Setup time (T_su): The minimum time BEFORE the active clock edge that data must be stable at the FF input (D pin). If data changes within this window, the FF may fail to capture correctly → metastability.

Hold time (T_h): The minimum time AFTER the active clock edge that data must remain stable. If data changes within this window, the FF may capture the new value instead of the intended value.

Both are characteristics of the flip-flop cell from the technology library, measured at specific operating conditions. They represent fundamental timing requirements of the storage element.

3. How is setup slack calculated?

Setup slack = Required Arrival Time − Actual Arrival Time

Required Arrival Time:
= Clock period + Capture clock latency − Clock uncertainty (setup) − Setup time of FF

Actual Arrival Time:
= Launch clock edge time + Launch clock latency + CK→Q delay + Combinational delay

Slack ≥ 0: Setup MET (timing passes)
Slack < 0: Setup VIOLATED (must fix)

Example: Required = 4.8ns, Arrival = 3.5ns → Slack = +1.3ns (MET with 1.3ns margin)
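
The same calculation as a small sketch. The individual component values are hypothetical, chosen so that they reproduce the 4.8ns required time and 3.5ns arrival time of the example above.

```python
def setup_required_ns(period, capture_latency, setup_uncertainty, setup_time):
    """Required arrival = period + capture latency - uncertainty - setup time."""
    return period + capture_latency - setup_uncertainty - setup_time


def arrival_ns(launch_edge, launch_latency, ck_to_q, combo_delay):
    """Actual arrival = launch edge + launch latency + CK->Q + combinational delay."""
    return launch_edge + launch_latency + ck_to_q + combo_delay


required = setup_required_ns(period=5.0, capture_latency=0.1,
                             setup_uncertainty=0.15, setup_time=0.15)
arrival = arrival_ns(launch_edge=0.0, launch_latency=0.1,
                     ck_to_q=0.2, combo_delay=3.2)
slack = required - arrival
status = "MET" if slack >= 0 else "VIOLATED"
print(f"required={required:.2f}ns arrival={arrival:.2f}ns slack={slack:+.2f}ns {status}")
```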

4. Why do we need to fix both setup AND hold violations?

Both represent different failure modes that cause the flip-flop to capture the wrong data:

Setup violation: Data arrives too late → FF is asked to capture before data is stable → metastability → wrong Q output (random). Functional failure at speed.

Hold violation: Data changes too soon after clock → FF captures new value when old value was expected → wrong Q output. A hold violation is particularly dangerous because it causes failure at ALL frequencies — it's not a speed problem, it's a structural problem that causes failure even at low frequency.

Both must be zero violations at sign-off in every MMMC corner.

5. What is clock-to-Q delay (CK-to-Q)?

CK-to-Q (propagation delay) is the time from when the clock edge arrives at the FF's clock pin to when the output Q settles to its new logic value. It is a library cell characteristic and contributes to the data path delay in timing analysis:

Data arrival time = T_clk_source + T_launch_clk_latency + T_CKtoQ + T_combo_logic

Typical values: 50–200ps depending on cell drive strength and load. Larger, faster cells have smaller CK-to-Q. It also depends on output load capacitance (higher load → longer CK-to-Q).

6. What is metastability? How do synchronizers help?

Metastability occurs when a flip-flop's input violates setup or hold time. The FF output neither fully resolves to 0 nor 1 — it remains at an intermediate voltage. Given enough time (mean time to resolve), it will eventually resolve to a valid logic level, but the resolution time is unpredictable — it can be arbitrarily long, causing downstream logic to see incorrect values.

Synchronizers (2-FF chains) help by providing extra time for the FF output to resolve before being used. The probability of metastability causing failure decreases exponentially with the resolution time given. Mean Time Between Failures (MTBF) increases exponentially with the number of synchronizer stages.
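
The exponential dependence is visible in the commonly used first-order model MTBF = e^(t_r/τ) / (T_w × f_clk × f_data), where t_r is the resolution time allowed, τ the flip-flop's resolution time constant, and T_w its metastability window. The sketch below assumes that model; all parameter values are hypothetical, and the second stage is approximated as adding one full 1ns clock period of resolution time.

```python
import math


def synchronizer_mtbf_s(resolve_time_s, tau_s, window_s, f_clk_hz, f_data_hz):
    """MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), in seconds."""
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)


TAU, T_W = 20e-12, 50e-12      # hypothetical flip-flop parameters
F_CLK, F_DATA = 1e9, 100e6     # 1 GHz sampling clock, 100 MHz data change rate

one_stage = synchronizer_mtbf_s(0.5e-9, TAU, T_W, F_CLK, F_DATA)
two_stage = synchronizer_mtbf_s(1.5e-9, TAU, T_W, F_CLK, F_DATA)
print(f"one stage: {one_stage:.3e} s, two stages: {two_stage:.3e} s")
```

Each extra nanosecond of resolution time multiplies the MTBF by e^(1ns/τ), which is why a second flip-flop is usually enough.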

7. What is OCV (On-Chip Variation) and why does it matter?

OCV is the spatial variation in process, voltage, and temperature across a single die. Two identical cells at different locations on the same chip may have different delays due to process gradient, local power supply variations, and thermal gradients.

OCV matters because STA corner analysis (SS, TT, FF) assumes the whole chip is at one corner. In reality, the launch path might be slow while the capture path is fast (or vice versa), creating additional timing margin loss. OCV derating adds guardband by making launch paths pessimistically slow and capture paths pessimistically fast (or vice versa for hold).

8. What is AOCV? How does it differ from flat OCV derating?

Flat OCV: Apply the same derate factor (e.g., 5%) to every cell, regardless of path length. This is very pessimistic for long paths — a path with 50 cells has much more statistical averaging than one with 3 cells.

AOCV (Advanced OCV): Applies a smaller derating factor to longer paths (more cells) because statistical averaging reduces the probability of all cells simultaneously being at the worst case. Shorter paths get higher derating. This reduces over-pessimism in long paths, recovering timing margin and avoiding unnecessary ECO effort. The derate table is a function of path depth (cell count).
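
A depth-indexed derate table can be pictured as a simple lookup. The table values here are hypothetical, and the sketch conservatively uses the entry for the largest tabulated depth that does not exceed the path depth; real tools interpolate and usually index by distance as well.

```python
from bisect import bisect_right

# Hypothetical late-derate table: deeper paths get less derating.
DEPTHS = [1, 2, 4, 8, 16, 32]
LATE_DERATE = [1.12, 1.10, 1.08, 1.06, 1.04, 1.03]


def aocv_late_derate(path_depth):
    """Late derate for a path with the given number of cells."""
    if path_depth < 1:
        raise ValueError("path depth must be at least 1")
    idx = bisect_right(DEPTHS, path_depth) - 1  # largest tabulated depth <= path_depth
    return LATE_DERATE[idx]


for depth in (1, 3, 50):
    print(depth, aocv_late_derate(depth))
```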

9. What is MMMC analysis? Name four typical corners.

MMMC (Multi-Mode Multi-Corner) runs STA simultaneously across all operating modes and PVT corners to ensure the design meets timing in every scenario:

func_slow: SS, 0.9V, 125°C — Setup check for functional mode
func_fast: FF, 1.1V, -40°C — Hold check for functional mode
scan_slow: SS, 0.9V, 25°C — Scan shift timing
hold_extreme: FF, 1.2V, -55°C — Worst-case hold

All must pass simultaneously. One tool run covers all corners efficiently.

10. What is clock uncertainty and what components does it model?

Clock uncertainty is a timing margin applied to clock edges to account for:

Jitter (period jitter): Cycle-to-cycle variation in clock period from the PLL/crystal
Skew: Spatial variation in clock arrival times (pre-CTS only; post-CTS uses actual propagated latencies)
Uncertainty margin: Additional guardband for modeling limitations

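Setup uncertainty reduces the required arrival time (more pessimistic for setup). Hold uncertainty increases the hold requirement, so data must stay stable longer after the clock edge (more pessimistic for hold). The -setup and -hold options are flags and the value is a separate argument, so the two are applied with separate commands: set_clock_uncertainty -setup 0.15 [get_clocks CLK] and set_clock_uncertainty -hold 0.05 [get_clocks CLK] (CLK is a placeholder clock name).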

11. What is clock reconvergence pessimism removal (CRPR)?

When the launch and capture flip-flops share a portion of their clock path (common clock path), the STA tool would otherwise apply OCV derating to the shared segment twice — once making it slow (for launch) and once making it fast (for capture). This is physically impossible: the shared wire has one actual delay.

CRPR removes this double pessimism by identifying the common portion of the clock path and applying derating only to the diverging portions. This can recover significant timing margin (50–200ps) especially in designs with long shared clock networks.
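
The recovered margin is the common-path delay multiplied by the difference between the late and early derates. A minimal sketch with hypothetical numbers (1ns of shared clock path, ±5% flat OCV):

```python
def crpr_credit_ns(common_path_delay_ns, late_derate, early_derate):
    """Pessimism removed = common clock path delay * (late derate - early derate)."""
    return common_path_delay_ns * (late_derate - early_derate)


credit = crpr_credit_ns(1.0, late_derate=1.05, early_derate=0.95)
slack_before = -0.04                  # hypothetical setup slack without CRPR
slack_after = slack_before + credit
print(f"credit={credit * 1e3:.0f}ps slack: {slack_before:+.2f}ns -> {slack_after:+.2f}ns")
```

Here a path that appears to fail by 40ps is actually met once the shared segment is no longer counted as both slow and fast.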

12. What is a timing arc?

A timing arc is a delay specification between an input pin and an output pin of a cell in the library. It describes how long it takes for a transition at the input to propagate to the output. Types:

Cell arc: Input→Output delay within a cell (e.g., A→Y in AND2)
Net arc: Wire delay from cell output to next cell input (RC delay)
Setup/hold arc: Constraint arcs on FF data vs clock pins
Clock arc: CK→Q propagation arc of a flip-flop

STA tools traverse all arcs to compute path delays.

13. What is the purpose of read_parasitics / read_spef in PrimeTime?

read_parasitics -format spef filename.spef loads the extracted wire RC parasitics from the post-layout extraction tool (StarRC, QRC). Without this, PrimeTime uses ideal wires or estimated loading (from SDC set_load), which is inaccurate.

After loading SPEF, wire delays are computed from actual metal resistance and capacitance (R×C delay), giving accurate net delays. Sign-off timing MUST use SPEF parasitics. The SPEF file must match the design netlist exactly (same net names). Mismatches cause warnings and incorrect timing.

14. What is hold analysis and why is it corner-reversed from setup?

Hold analysis checks whether data changes too quickly after a clock edge. Hold slack = Data arrival time − (Capture clock latency + Hold time).

Hold violations occur when the DATA PATH is too fast (short logic path) and the CLOCK arrives late at the capture FF.

Therefore, hold analysis uses the FAST corner (FF process, high voltage, low temperature) which makes data paths fast and can make hold more critical. This is the reverse of setup analysis which uses the SLOW corner. That's why MMMC must check setup at slow corner AND hold at fast corner simultaneously.
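
A small sketch of the hold slack formula above, with hypothetical fast-corner delays. It shows a short data path failing hold because the capture clock arrives late, and the usual fix of adding delay on the data path.

```python
def hold_slack_ns(launch_latency, ck_to_q, combo_delay, capture_latency, hold_time):
    """Hold slack = data arrival - (capture clock latency + hold time)."""
    arrival = launch_latency + ck_to_q + combo_delay
    required = capture_latency + hold_time
    return arrival - required


# Hypothetical path: 20ps of logic, capture clock 100ps later than launch clock.
before = hold_slack_ns(0.10, 0.05, 0.02, capture_latency=0.20, hold_time=0.03)
# Same path after inserting 70ps of delay buffers on the data path.
after = hold_slack_ns(0.10, 0.05, 0.02 + 0.07, capture_latency=0.20, hold_time=0.03)
print(f"before: {before:+.2f}ns, after: {after:+.2f}ns")
```

The clock period does not appear anywhere in the calculation, which is why hold failures cannot be fixed by slowing the clock.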

15. What does check_timing check and what warnings indicate?

check_timing validates constraint completeness and reports:

Unconstrained endpoints: FF data/output ports with no timing path from a clock — path not analyzed by STA
No-clock FFs: Registers with no associated clock definition
Partial path constraints: Input_delay covers only -max but not -min (or vice versa)
Loop detection: Combinational loops
Multiple clocks: Endpoints with multiple clock paths (may need set_false_path or set_clock_groups)

All warnings should be investigated — unconstrained paths are a sign-off risk.

16. What is path-based analysis (PBA) vs graph-based analysis (GBA)?

GBA (Graph-Based Analysis): Standard STA mode. Each cell's arrival time is calculated once using the worst-case input transition and output load from all converging paths. Very fast but pessimistic — assumes the worst condition at every cell simultaneously, even if physically impossible.

PBA (Path-Based Analysis): Re-analyzes specific critical paths using the actual input transition experienced by each cell on that specific path. More accurate, less pessimistic — removes false worst-case combinations. Much slower (only applied to a subset of near-critical paths). Used to "rescue" paths that look violated in GBA but actually pass when analyzed properly.

17. What is input transition and output load in cell timing models?

Cell delay is characterized as a 2D lookup table indexed by:

Input transition time (slew): How fast the input signal switches (rise/fall). A slower input → longer cell propagation delay.

Output load capacitance: Total capacitance the cell drives (input caps of fanout cells + wire cap). Higher load → longer output transition and higher cell delay.

The 2D table is NLDM (Non-Linear Delay Model). STA tools interpolate within the table to compute accurate delays for the specific transition and load seen at each cell in the design.
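The table lookup can be sketched as a bilinear interpolation. This is a minimal illustration, assuming a hypothetical 3×3 table; the function name `nldm_delay` and every value are invented for the example, and real tools may use other interpolation or extrapolation schemes.

```python
from bisect import bisect_right

def nldm_delay(slews, loads, table, slew, load):
    """Bilinearly interpolate table[i][j], the delay at slews[i] and loads[j]."""
    def bracket(axis, x):
        # Index of the lower grid point, clamped so that points outside the
        # table are extrapolated from the nearest segment.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = bracket(slews, slew), bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    d_lo = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    d_hi = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return d_lo * (1 - ts) + d_hi * ts

# Hypothetical table: rows = input slew (ns), columns = load (pF), values = delay (ns)
SLEWS = [0.01, 0.05, 0.20]
LOADS = [0.001, 0.010, 0.050]
DELAYS = [
    [0.020, 0.045, 0.150],
    [0.030, 0.055, 0.160],
    [0.060, 0.085, 0.190],
]

delay = nldm_delay(SLEWS, LOADS, DELAYS, slew=0.03, load=0.0055)
print(f"interpolated delay = {delay:.4f} ns")
```

The query point sits halfway between grid points on both axes, so the result is the average of the four surrounding table entries.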

18. What causes a max-capacitance violation and how is it fixed?

A max-capacitance violation occurs when the capacitive load on a cell's output exceeds the maximum capacitance limit specified in the technology library for that cell. This causes:

Output transition (slew) becoming too slow
Downstream cell delays increasing
Possible functional failure if slew is extremely slow

Fixes:

Insert buffers to split the high-fanout net
Upsize the driving cell to a higher drive strength
Reduce wire length (physical proximity of sinks)

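Max-cap violations are reported by STA as electrical design-rule violations (often abbreviated DRVs, alongside max-transition and max-fanout). They are distinct from the physical DRC run on the layout.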

19. What is the difference between setup uncertainty and hold uncertainty?

Setup uncertainty: Applied to reduce the timing window available for data to meet setup. It tightens setup (makes it harder to meet). Typically 100–150ps for pre-CTS, reduced post-CTS.

Hold uncertainty: Applied to increase the minimum data arrival time required to meet hold. It tightens hold (makes hold harder to meet). Typically 50ps.

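The asymmetry arises because a setup check spans two different clock edges (launch at one edge, capture at the next), so cycle-to-cycle jitter adds directly to the uncertainty. A hold check compares launch and capture at the same clock edge, so source jitter largely cancels and mainly skew and margin remain. Pre-CTS uses larger uncertainty because it must also cover the estimated skew of the ideal clock; post-CTS switches to propagated clocks, leaving mostly jitter and margin.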

20. What is back-annotated timing? When is it used?

Back-annotated timing (post-layout STA) uses actual extracted RC parasitics (SPEF) from the physical layout to compute wire delays. Contrast with pre-layout timing which uses estimated loads.

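Used: After routing is complete for final sign-off. The parasitics capture the resistance and capacitance of every metal wire and via, giving the most accurate timing available before silicon (correlation within roughly 5% of silicon measurement is often quoted, but the figure depends on the process and on extraction quality).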

Back-annotation reveals new violations not seen pre-route (because estimated wires underestimated actual wire capacitance). These violations require post-route ECO fixes with minimal netlist perturbation.

21. What is a timing exception and why must it be carefully applied?

A timing exception modifies how STA analyzes a specific path: set_false_path, set_multicycle_path, set_max_delay, set_min_delay.

They must be carefully applied because:

Over-generous false paths hide real timing violations
Wrong multicycle path settings (missing hold correction) create hold violations
Incorrectly specified endpoints leave real functional violations unchecked
Timing exceptions survive synthesis to PD to sign-off — errors propagate through the entire flow

All exceptions must be documented and reviewed. Functional paths must never be marked false.

22. What is the difference between max-delay and false path?

set_false_path: Completely removes the path from timing analysis. The tool ignores it entirely — no timing report, no optimization. For paths that are genuinely never timing-critical in any operating scenario.

set_max_delay: Still analyzes the path for timing, but uses the specified delay as the timing constraint instead of the default (clock period). For paths that need to meet a specific delay that's different from the clock period (e.g., async paths that must complete within 10ns regardless of clock).

Key difference: set_false_path means "never check this." set_max_delay means "check this, but use this constraint."

23. What is a violation cascade and how do you prioritize fixes?

A violation cascade occurs when fixing one timing violation makes another one worse. For example, upsizing a cell to fix setup on path A may load a net and degrade setup on path B.

Prioritization strategy:

Fix the WNS (worst) path first — largest magnitude violation
Use ECO minimize-impact mode (minimize cell moves)
Iterate in small batches (fix 20 paths, re-analyze, fix next 20)
Monitor TNS trend — decreasing TNS = making progress
Separate setup and hold fixes (hold buffer insertion can slow setup)

24. How does temperature inversion affect timing at advanced nodes?

At mature nodes (130nm+): Higher temperature → slower transistors (mobility decreases). Standard worst-case timing = high temp.

At advanced nodes (<65nm): Temperature inversion appears when the supply voltage is low enough to sit close to the threshold voltage. Cooling the die raises Vt, which shrinks the gate overdrive (Vdd − Vt), and at low Vdd that loss outweighs the mobility gain, so transistors can be SLOWER at low temperature than at high temperature. This means the traditional slow corner (SS, 125°C) may no longer be worst-case timing; SS at -40°C may be worse.

Impact: Need to check timing at multiple temperature points. Some foundries provide separate library corners for this. Ignoring temperature inversion at advanced nodes can lead to post-silicon timing failures.

25. What is signal integrity (SI) in STA context?

In STA, Signal Integrity (SI) analysis accounts for crosstalk-induced delay changes:

SI delta delay: Coupling from aggressor wires causes victim wire delay to increase or decrease. STA includes SI analysis in sign-off by computing the worst-case delay considering all possible aggressor switching combinations.

SI noise analysis: Checks if crosstalk-induced voltage glitches on quiet nets can cause logic errors. The noise immunity of the receiving cell must exceed the peak noise voltage.

SI analysis requires layout parasitics including coupling capacitance (SPEF with coupling) — simple ground capacitance models are insufficient for SI-accurate timing.

26. What are recovery and removal times for asynchronous pins?

For asynchronous control pins (async reset, preset, clear) of flip-flops:

Recovery time: Minimum time the async signal must be deasserted BEFORE the active clock edge. Analogous to setup time: if async reset is released too close to the clock, the FF may not properly respond to the clock. STA checks it automatically from the library recovery arc, provided the reset deassertion path is constrained rather than blanket false-pathed.

Removal time: Minimum time the async signal must remain asserted AFTER the active clock edge. Analogous to hold time. These are library-characterized values that must be checked if async resets are used in a synchronous design.

27. What is the difference between input/output delay -max and -min?

set_input_delay -max: Latest time data can arrive at the port relative to clock. Used for setup analysis of the first internal register that captures this input.

set_input_delay -min: Earliest time data arrives. Used for hold analysis (ensures data doesn't arrive so early that it violates hold at the capturing FF).

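set_output_delay -max: Largest external delay after the port (external wire/logic delay plus the downstream receiver's setup time). Used for setup analysis of the path driving the output.

set_output_delay -min: Smallest external delay after the port (minimum external path delay minus the downstream receiver's hold time). Used for hold analysis of the path driving the output.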

All four values (-max/-min for input/output) must be specified for complete I/O timing coverage.

28. What is POCV (Parametric OCV)?

POCV (Parametric/Statistical OCV) replaces flat or AOCV derating with a statistical model. Each cell delay is modeled as a Gaussian distribution with a mean and standard deviation (from silicon characterization).

STA computes the statistical distribution of path delay: assuming independent Gaussian cell delays, the path delay is also Gaussian, with the means adding linearly and the sigmas adding in root-sum-square fashion. Slack is then expressed as a sigma value, e.g., "path meets timing at 3σ".

Benefits: Most accurate OCV model, removes pessimism from flat/AOCV derating. Used in advanced (<7nm) nodes where OCV is very significant. Requires POCV characterization data from the foundry library.
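A minimal sketch of why the statistical sum is less pessimistic than derating every cell to its own 3σ value. It assumes independent Gaussian stage delays; the function name `pocv_path` and the 16-stage path with 100 ps mean and 5 ps sigma per stage are hypothetical.

```python
import math

def pocv_path(stage_means, stage_sigmas, n_sigma=3.0):
    """Return (mean, sigma, late delay) of a path of independent Gaussian stages."""
    mean = sum(stage_means)                                # means add linearly
    sigma = math.sqrt(sum(s * s for s in stage_sigmas))    # sigmas add in RSS
    return mean, sigma, mean + n_sigma * sigma

means = [0.100] * 16    # ns per stage (hypothetical)
sigmas = [0.005] * 16   # ns per stage (hypothetical)

path_mean, path_sigma, late_pocv = pocv_path(means, sigmas)

# Flat-style bound: every stage simultaneously at its own +3 sigma.
late_flat = sum(m + 3.0 * s for m, s in zip(means, sigmas))

print(f"POCV late delay {late_pocv:.3f} ns vs per-cell 3-sigma sum {late_flat:.3f} ns")
```

The path sigma grows with the square root of the stage count, while the per-cell bound grows linearly, so the gap widens on deeper paths.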

29. How do you handle paths between asynchronous clock domains in STA?

Paths between asynchronous clock domains (clocks with no fixed phase relationship) cannot be meaningfully analyzed by standard STA — the arrival time of data relative to the capture clock is unbounded.

Proper handling:

set_clock_groups -asynchronous: Tells the STA tool to not analyze paths between these clock domains. The crossing is handled by synchronizers in the design.
CDC analysis (separate tool, e.g. Siemens Questa CDC, Synopsys SpyGlass CDC, or the Cadence JasperGold CDC app): Verifies correct synchronization structures are present
The synchronizer itself is analyzed with appropriate timing constraints

Failure to apply set_clock_groups to async clocks produces spurious setup and hold violations whose slack values are meaningless.

30. What is the ECO flow in PrimeTime and how is it used?

PT-ECO is PrimeTime's automated ECO (Engineering Change Order) capability for fixing timing violations post-route:

Flow:

PT analyzes sign-off netlist with SPEF, finds violations
fix_eco_timing -type setup and fix_eco_timing -type hold generate cell changes (upsize/insert buffers)
Changes written out with write_changes to a script such as eco_changes.tcl
Innovus/ICC2 reads changes, places/routes ECO cells
RC re-extracted, PT re-runs analysis
Iterate until clean

PT-ECO minimizes cell perturbation to preserve DRC cleanliness of the post-route database.

Formula Cheatsheet

⏱ Setup Timing

Slack_setup = T_req − T_arr

T_req = Period + T_clk_capture − T_uncertainty − T_setup

T_arr = T_clk_launch + T_cq + T_combo

Positive slack = PASS, Negative = FAIL
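A worked instance of the setup formulas above, as a small sketch. The function name and all delay values (in ns) are hypothetical.

```python
def setup_slack(period, t_clk_launch, t_clk_capture, t_cq, t_combo, t_setup, t_uncertainty):
    """Setup slack in the same time unit as the inputs (positive = pass)."""
    t_req = period + t_clk_capture - t_uncertainty - t_setup
    t_arr = t_clk_launch + t_cq + t_combo
    return t_req - t_arr

s_slack = setup_slack(period=2.00, t_clk_launch=0.50, t_clk_capture=0.55,
                      t_cq=0.10, t_combo=1.60, t_setup=0.05, t_uncertainty=0.10)
print(f"setup slack = {s_slack:+.2f} ns")
```

Here T_req = 2.40 ns and T_arr = 2.20 ns, so the path passes with 0.20 ns of margin; adding 0.30 ns of logic delay would turn it into a 0.10 ns violation.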

⏳ Hold Timing

Slack_hold = T_arr − (T_clk_cap + T_hold)

T_arr must be GREATER than capture clock + hold time

Fix: Add delay buffers to data path
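The hold formula and the buffer fix can be sketched the same way. The names, the delay values (ns), and the 40 ps delay-buffer figure are hypothetical, and the estimate ignores the load and slew changes a real buffer insertion causes.

```python
import math

def hold_slack(t_clk_launch, t_clk_capture, t_cq, t_combo, t_hold):
    """Hold slack: data arrival minus (capture clock arrival + hold time)."""
    t_arr = t_clk_launch + t_cq + t_combo
    return t_arr - (t_clk_capture + t_hold)

def buffers_needed(slack, buffer_delay):
    """Rough count of delay buffers needed to bring a negative hold slack to >= 0."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay)

h_slack = hold_slack(t_clk_launch=0.50, t_clk_capture=0.58, t_cq=0.08, t_combo=0.02, t_hold=0.03)
n_buf = buffers_needed(h_slack, buffer_delay=0.04)
h_slack_fixed = hold_slack(0.50, 0.58, 0.08, 0.02 + n_buf * 0.04, 0.03)
print(f"hold slack {h_slack:+.3f} ns, buffers {n_buf}, after fix {h_slack_fixed:+.3f} ns")
```

Every buffer added here also lengthens the same path for setup, which is why hold fixes are normally re-checked against setup slack.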

📐 Floorplan

Util = CellArea / CoreArea × 100%

AR = CoreHeight / CoreWidth

CoreArea = CellArea / Util_target

Target utilization: 60–75%
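As a quick sizing sketch built on these three formulas: given a total cell area, a utilization target, and an aspect ratio, the core dimensions follow directly. The function name and the 700,000 µm² example are hypothetical.

```python
import math

def core_dimensions(cell_area_um2, util_target, aspect_ratio=1.0):
    """Return (core_area, width, height) in µm², µm, µm; AR = height / width."""
    core_area = cell_area_um2 / util_target
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return core_area, width, height

core_area, core_w, core_h = core_dimensions(700_000.0, util_target=0.70, aspect_ratio=1.0)
print(f"core area {core_area:.0f} um^2, {core_w:.0f} um x {core_h:.0f} um")
```

A 70% target on 0.7 mm² of cells gives a 1 mm² core, i.e. a 1000 µm × 1000 µm square before macros and halos are accounted for.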

⚡ IR Drop

V_drop = I × R_metal

R = ρ × L / (W × T)

Max allowed: typically 5–10% of VDD

Fix: wider stripes, more stripes, decaps
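A back-of-the-envelope sketch of the two IR-drop formulas for a single power stripe. It assumes bulk copper resistivity (thin on-chip wires are noticeably more resistive) and a hypothetical stripe geometry, current, and 0.8 V supply.

```python
RHO_CU = 1.7e-8  # ohm*m, bulk copper (assumption; effective on-chip value is higher)

def stripe_resistance(length_m, width_m, thickness_m, rho=RHO_CU):
    """R = rho * L / (W * T)."""
    return rho * length_m / (width_m * thickness_m)

def ir_drop(current_a, resistance_ohm):
    """V_drop = I * R."""
    return current_a * resistance_ohm

# 1000 µm long, 2 µm wide, 0.5 µm thick stripe carrying 2 mA
r_stripe = stripe_resistance(1000e-6, 2e-6, 0.5e-6)
v_drop = ir_drop(2e-3, r_stripe)
pct_of_vdd = 100.0 * v_drop / 0.8

# Doubling the stripe width halves R and therefore the drop
v_drop_wide = ir_drop(2e-3, stripe_resistance(1000e-6, 4e-6, 0.5e-6))
print(f"R = {r_stripe:.1f} ohm, drop = {v_drop * 1e3:.1f} mV ({pct_of_vdd:.2f}% of VDD)")
```

This treats the stripe as one lumped resistor fed from one end; a real PDN analysis solves a resistive mesh with distributed current sinks.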

🌊 Electromigration

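J = I / A_wire (current density; A_wire = wire cross-section)

MTTF = A × J^(−n) × e^(Ea/kT)

Black's equation for MTTF (A is a fitting constant, n is typically 1–2). Wider wires → lower J → longer lifetime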
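Black's equation in its standard form, MTTF = A × J^(−n) × e^(Ea/kT), is most useful for relative comparisons. The sketch below assumes n = 2 and Ea = 0.9 eV purely for illustration; real values come from the foundry's reliability data.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K

def mttf_relative(j, temp_k, n=2.0, ea_ev=0.9, a=1.0):
    """Black's equation with an arbitrary prefactor; only ratios are meaningful."""
    return a * j ** (-n) * math.exp(ea_ev / (K_B_EV * temp_k))

# Halving current density (e.g. by doubling wire width) at a fixed temperature
gain_from_width = mttf_relative(1.0, 398.15) / mttf_relative(2.0, 398.15)

# Running 20 K cooler (105 C instead of 125 C) at a fixed current density
gain_from_cooling = mttf_relative(2.0, 378.15) / mttf_relative(2.0, 398.15)

print(f"half J: {gain_from_width:.1f}x lifetime, 20 K cooler: {gain_from_cooling:.1f}x lifetime")
```

With n = 2, halving J quadruples the projected lifetime, which is why widening a wire is such an effective EM fix.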

🕐 Clock Skew

Skew = T_clk_cap − T_clk_launch

Positive: relaxes setup, tightens hold

Negative: tightens setup, relaxes hold

🔋 Dynamic Power

P_dyn = α × C × V² × f

α = activity factor, C = load cap, V = supply, f = frequency
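A short numeric sketch of the dynamic power formula, showing the quadratic payoff of voltage scaling. The activity factor, switched capacitance, and frequency are hypothetical block-level numbers.

```python
def dynamic_power(alpha, c_farad, v_volt, f_hz):
    """P_dyn = alpha * C * V^2 * f, in watts."""
    return alpha * c_farad * v_volt ** 2 * f_hz

p_nominal = dynamic_power(alpha=0.1, c_farad=1e-9, v_volt=0.80, f_hz=1e9)
p_scaled = dynamic_power(alpha=0.1, c_farad=1e-9, v_volt=0.72, f_hz=1e9)

print(f"{p_nominal * 1e3:.1f} mW at 0.80 V, {p_scaled * 1e3:.2f} mW at 0.72 V "
      f"({100 * (1 - p_scaled / p_nominal):.0f}% saving)")
```

A 10% supply reduction yields a 19% dynamic power saving at the same frequency, since power scales with V².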

💤 Leakage Power

P_leak ∝ (W/L) × e^(−Vt / (n × v_T)), where v_T = kT/q ≈ 26 mV at room temperature

Exponential sensitivity to Vt. HVT cells reduce leakage
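To put a number on that exponential sensitivity, the sketch below evaluates the subthreshold term for a 100 mV threshold increase. The slope factor n = 1.5, the 300 K temperature, and the 100 mV step are illustrative assumptions, not values for any particular library.

```python
import math

K_B_OVER_Q = 8.617333e-5  # V/K, so thermal voltage v_T = K_B_OVER_Q * T

def leakage_ratio(delta_vt, n=1.5, temp_k=300.0):
    """Subthreshold leakage after raising Vt by delta_vt volts, relative to before."""
    v_thermal = K_B_OVER_Q * temp_k
    return math.exp(-delta_vt / (n * v_thermal))

ratio = leakage_ratio(0.100)
print(f"+100 mV Vt -> leakage x{ratio:.3f} (about {1 / ratio:.0f}x lower)")
```

Under these assumptions a 100 mV higher Vt cuts subthreshold leakage by roughly an order of magnitude, which is the motivation for swapping non-critical cells to HVT.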

📊 WNS / TNS

WNS = min(all slacks)

TNS = Σ(negative slacks)

WNS → worst path. TNS → total work needed.
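Both metrics are one-liners over the list of endpoint slacks, as this sketch with hypothetical slack values (in ns) shows.

```python
def wns_tns(slacks):
    """Return (WNS, TNS): the minimum slack and the sum of the negative slacks."""
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    return wns, tns

endpoint_slacks = [0.12, -0.05, -0.20, 0.00, -0.01]
wns, tns = wns_tns(endpoint_slacks)
violating = sum(1 for s in endpoint_slacks if s < 0)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns, violating endpoints = {violating}")
```

Note that with this definition WNS is positive when every endpoint meets timing, while TNS is then exactly zero.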

🌡 PVT Corners

Setup: SS, low V, high T

Hold: FF, high V, low T

Temperature inversion at <65nm nodes!

📡 Wire Delay (Elmore)

T_d ≈ 0.69 × R × C (lumped RC, 50% point; ≈ 0.38 × R × C for a distributed wire)

T_d ∝ L² (wire delay scales as L²)

Long wires need repeaters/buffers every Lopt
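The quadratic scaling and the benefit of repeaters can be sketched numerically. This uses the lumped 0.69 × R × C form from the card; the per-micron resistance and capacitance and the 20 ps repeater delay are hypothetical, and repeater input capacitance is ignored.

```python
def wire_delay(r_per_um, c_per_um, length_um, k=0.69):
    """Delay of one wire segment in seconds: k * (r*L) * (c*L)."""
    return k * (r_per_um * length_um) * (c_per_um * length_um)

def repeated_wire_delay(r_per_um, c_per_um, length_um, segments, t_buf):
    """Wire split into equal segments with a repeater between each pair."""
    seg = wire_delay(r_per_um, c_per_um, length_um / segments)
    return segments * seg + (segments - 1) * t_buf

R_PER_UM = 1.0        # ohm per µm (hypothetical)
C_PER_UM = 0.2e-15    # farad per µm (hypothetical)

unbuffered = wire_delay(R_PER_UM, C_PER_UM, 2000.0)
buffered = repeated_wire_delay(R_PER_UM, C_PER_UM, 2000.0, segments=4, t_buf=20e-12)
print(f"2 mm wire: {unbuffered * 1e12:.0f} ps unbuffered, {buffered * 1e12:.0f} ps with 3 repeaters")
```

Splitting the wire into N segments divides the wire term by N at the cost of N − 1 repeater delays, so there is an optimum segment length beyond which extra repeaters stop helping.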

🔄 Fanout & Buffering

Logical Effort: g = C_in / C_inv

Optimal stage effort = e ≈ 2.7 ignoring parasitics (≈ 4 in practice, hence FO4)

Buffer chain for high-fanout: H = C_load/C_in, stages N ≈ log_e(H)
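A small sketch of sizing a buffer chain from the ratio of load to input capacitance. The function name and the 1000:1 capacitance ratio are hypothetical; the second call uses a per-stage fanout target of 4, a common practical choice once parasitic delay is considered.

```python
import math

def buffer_chain(c_load, c_in, target_fanout=math.e):
    """Return (stage count, actual per-stage fanout) for a tapered buffer chain."""
    h = c_load / c_in                                    # total electrical effort
    stages = max(1, round(math.log(h) / math.log(target_fanout)))
    per_stage = h ** (1.0 / stages)                      # equal effort per stage
    return stages, per_stage

n_e, f_e = buffer_chain(c_load=1000.0, c_in=1.0)                 # target fanout e
n_4, f_4 = buffer_chain(c_load=1000.0, c_in=1.0, target_fanout=4.0)
print(f"fanout e: {n_e} stages at {f_e:.2f} each; fanout 4: {n_4} stages at {f_4:.2f} each")
```

The stage count is rounded to an integer and the per-stage fanout is then recomputed so that every stage carries the same effort.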

VLSI Glossary

AOCV

Advanced On-Chip Variation. OCV derating method that applies smaller derating to longer (higher cell count) paths due to statistical averaging.

Antenna Violation

Layout violation where cumulative metal area connected to a gate exceeds the foundry's antenna ratio limit, risking gate oxide damage during plasma etching.

Aspect Ratio

Core height divided by core width. Typically 1:1 (square) but can vary based on I/O and macro constraints.

Back-Annotation

Loading post-layout extracted parasitics (SPEF) into STA for accurate timing analysis using real wire RC values.

Blackbox

A module whose internal implementation is hidden from synthesis/STA. Only the timing model (liberty file) is used for analysis.

Buffer Tree

A hierarchy of buffers used to drive high-fanout nets, reducing wire capacitance per driver and improving transition times.

CCD

Concurrent Clock and Data optimization: simultaneously optimizes clock tree and data paths by exploiting useful skew. CCD is the Synopsys (ICC2/Fusion Compiler) name; the Cadence Innovus equivalent is CCOpt.

CDC

Clock Domain Crossing. Transfer of signals between flip-flops clocked by different, potentially asynchronous clocks. Requires synchronizer circuits.

CMP

Chemical Mechanical Polishing. Fab step that planarizes the wafer surface after each metal layer deposition. Requires uniform metal density.

CRPR

Clock Reconvergence Pessimism Removal. Removes double-counting of OCV derating on the shared clock path between launch and capture FFs.

CTS

Clock Tree Synthesis. Process of building the clock distribution network to minimize skew and control latency from clock source to all FF clock pins.

CK-to-Q

Clock-to-Q propagation delay of a flip-flop — time from clock edge to Q output settling to new value. Part of data path delay.

Compile Ultra

Design Compiler's advanced compile command (compile_ultra) enabling adaptive retiming (with -retime), automatic ungrouping, boundary optimization, and high-effort optimization for best QoR.

Core Area

The interior region of the chip die where standard cells and macros are placed. Excludes I/O ring and pad frame.

CPF

Common Power Format. Cadence's format (now merged into UPF/IEEE 1801) for specifying multi-voltage power intent.

Crosstalk

Capacitive coupling between adjacent wires. Causes delta delays (aggressor switching affects victim timing) and noise glitches (functional risk on quiet nets).

Decap Cell

Standard-cell-height structure containing a VDD-to-VSS capacitor. Placed in empty areas to reduce dynamic IR drop and supply noise.

Derating

Multiplicative factor applied to cell or wire delays in STA to model OCV. Early path derated by <1.0 (faster), late path by >1.0 (slower).

DEF

Design Exchange Format. Contains physical placement coordinates, routing geometry, and other physical design information for a chip.

Die Area

Total silicon area of the chip including I/O ring, pads, and all structures. Larger than core area.

DRC

Design Rule Check. Verifies layout geometry against foundry manufacturing rules (spacing, width, enclosure, density). Must be clean for tape-out.

DFT

Design for Test. Techniques (scan insertion, BIST, boundary scan) that make the design testable after manufacturing.

ECO

Engineering Change Order. Targeted, minimal netlist or layout change to fix a specific timing, functional, or sign-off issue after implementation.

Elaboration

Synthesis step that parses HDL, resolves hierarchy/parameters, and maps to GTECH (generic technology-independent) primitives.

Electromigration (EM)

Gradual metal atom displacement due to high electron current density. Causes voids (opens) or hillocks (shorts) over time. Characterized by Black's equation.

ERC

Electrical Rule Check. Verifies electrical correctness: floating nodes, improper biasing, ESD violations, latchup risk.

Filler Cells

Cells placed in empty row spaces to maintain N-well continuity, connect power rails, and satisfy metal density rules.

False Path

A timing path that exists in the netlist but is never functionally active. Excluded from STA via set_false_path.

Flyline

Straight-line visual connection between logically connected but physically unrouted pins. Used to assess routing congestion in floorplanning.

Footprint

Physical area occupied by a cell or block on the die, including any keepout regions.

GBA

Graph-Based Analysis. Standard STA mode computing arrival times on the timing graph once. Fast but pessimistic vs Path-Based Analysis (PBA).

GDS II

Graphic Design System II. Binary file format that contains all layout geometry for the chip. Final output sent to the foundry for mask making.

GTECH

Generic Technology. Synopsys internal technology-independent gate library used as intermediate representation during synthesis elaboration.

HVT

High Threshold Voltage cell. Slower than SVT/LVT but has very low leakage power. Used on non-critical paths to minimize standby power.

ICG

Integrated Clock Gating cell. Latch-based AND gate that cleanly gates the clock for power reduction without glitches. Inserted by synthesis tools.

IR Drop

Resistive voltage drop along power distribution network wires (V=IR). Reduces effective VDD at cells, increasing delay.

Jitter

Cycle-to-cycle variation in clock period caused by PLL noise, supply variation, and other sources. Modeled in clock uncertainty.

Latency (Clock)

Total delay from clock source to a flip-flop's clock pin, through all buffers/wires of the clock distribution network.

Legalization

Placement step that moves cells from global placement positions to the nearest legal row-aligned positions with no overlaps.

LEF

Library Exchange Format. Describes the physical abstract views of cells (pin locations, blockages, dimensions) for use by PD tools.

Level Shifter

Cell that converts a signal between two different voltage levels at a power domain boundary. Required for multi-voltage designs.

Liberty (.lib)

Industry-standard format for cell characterization data: timing arcs, power, area, and function at specific PVT conditions.

LVS

Layout vs Schematic. Extracts netlist from layout and compares to reference schematic. Catches opens, shorts, and missing/extra devices.

LVT

Low Threshold Voltage cell. Fastest switching speed, but highest leakage power. Used on critical timing paths to meet WNS.

Macro

Pre-designed hard block (SRAM, ROM, PLL, analog IP) with fixed layout dimensions. Placed early in floorplanning, not synthesized.

Metastability

Condition where a flip-flop output lingers at an intermediate voltage for an unpredictable time after a setup/hold violation before resolving to 0 or 1. Mitigated (made acceptably improbable, not eliminated) by 2-FF synchronizers.

MMMC

Multi-Mode Multi-Corner. STA analysis across all operating modes and PVT corners simultaneously in one tool run.

Multicycle Path

A timing path intentionally designed to take N clock cycles. Declared via set_multicycle_path to relax the timing constraint.

NLDM

Non-Linear Delay Model. 2D cell delay table indexed by input transition time and output load capacitance. Standard model in Liberty files.

OCV

On-Chip Variation. Spatial PVT variation across a single die causing identical cells at different locations to have different delays.

PBA

Path-Based Analysis. Accurate STA mode that re-analyzes specific paths with actual input transitions, removing pessimism vs GBA.

PDN

Power Delivery Network. The complete network of metal rails, rings, stripes, and vias that distributes VDD/VSS to all cells.

POCV

Parametric/Statistical OCV. Most accurate OCV model — each cell delay modeled as a statistical distribution, path slack expressed as sigma confidence.

PrimeTime

Synopsys's industry-standard sign-off STA tool. Uses SPEF parasitics for post-layout timing analysis across MMMC corners.

PVT

Process, Voltage, Temperature. The three main variation sources characterized by IC timing libraries at multiple corners.

QoR

Quality of Results. Overall measure of synthesis/PD success: WNS, TNS, area, power, DRC count, and routability.

Retiming

Synthesis technique that moves registers across combinational logic to balance pipeline stage delays without changing function.

SAIF

Switching Activity Interchange Format. Simulation output capturing signal toggle rates for accurate dynamic power analysis.

SDC

Synopsys Design Constraints. Industry-standard, Tcl-based format for timing, area, and power constraints, supported by virtually all EDA synthesis, place-and-route, and STA tools.

Setup Time

Minimum time data must be stable at FF input BEFORE the active clock edge. Violation causes metastability.

Skew

Difference in clock arrival times between any two flip-flops. Target: <50ps local, <200ps global after CTS.

Slack

Timing margin at a path endpoint: Required Time − Arrival Time (setup) or Arrival Time − Required Time (hold). Negative = violation.

Slew Rate

Speed of signal transition (rise/fall time), measured between library-defined thresholds such as 20%–80% (or 10%–90%, 30%–70%) of supply voltage. Slow slew → more delay and power.

SPEF

Standard Parasitic Exchange Format. Contains extracted wire RC values from post-layout extraction for accurate back-annotated STA.

SVT

Standard Threshold Voltage cell. Balanced speed/leakage. General-purpose cell for non-critical paths.

Tap Cell

N-well to VDD and substrate to VSS connection cell. Prevents latchup. Placed at regular intervals (≤50µm) in every standard cell row.

Tempus

Cadence's sign-off STA tool, tightly integrated with Innovus. Supports native MMMC and in-design ECO optimization.

Timing Arc

Delay specification between a cell input and output pin. Includes cell arcs (logic delay), net arcs (wire RC), and constraint arcs (setup/hold).

TNS

Total Negative Slack. Sum of all negative slack values across all timing endpoints. Indicates total timing work remaining.

Uncertainty (Clock)

Timing margin accounting for jitter, skew, and modeling uncertainty. Applied via set_clock_uncertainty in SDC.

UPF

Unified Power Format (IEEE 1801). Standard format for describing multi-voltage power intent: domains, voltages, level shifters, isolation cells.

Utilization

Ratio of placed cell area to total core area, expressed as percentage. Target 60–75% for most designs.

Via

Metal connection between two adjacent metal layers in the layout. Vias have resistance and current capacity limits (EM rules).

WHS

Worst Hold Slack. Most negative hold slack across all endpoints. Must be ≥ 0 at sign-off. Fixed with delay buffer insertion.

WNS

Worst Negative Slack. Most negative setup slack — represents the single worst timing path in the design. Must be ≥ 0 at sign-off.

SECTION 05

Interactive Waveform Lab

An interactive digital timing waveform viewer. Toggle signals, animate the waveform, and inject setup/hold violations to see how they appear in practice.

🧪 Lab Controls

■ CLK — System Clock ■ D — Data Input ■ Q — FF Output ■ RESET — Async Reset ■ Violation Window

📖 How to Use

Click Play to animate the waveform timeline
Toggle individual signals using the colored buttons
Click Introduce Violation to inject a setup or hold violation
The violation window appears highlighted in red on the waveform
Click Reset to restore clean waveforms

🔬 Timing Parameters Displayed

CLK — 5ns period, 50% duty cycle (200MHz)
D — Data changes asynchronously relative to CLK
Q — Captured on rising edge of CLK (after CK-to-Q delay)
RESET — Active-low async reset; clears Q immediately
Setup window — Red zone before capture edge where D must be stable

SECTION 06

Physical Verification (PV)

🧱 Start Here — What Is Physical Verification?

After Physical Design completes, you have a GDS II file — the full geometric layout of your chip. But how do you know it's actually manufacturable? That it matches your circuit? That it won't destroy itself electrically? Physical Verification is the set of automated checks that answer all these questions before sending the GDS to the foundry.

Think of it as the final quality inspection before manufacturing. A single unresolved DRC violation → foundry rejects your file. A single LVS open → chip has a broken wire → dead chip. PV sign-off is non-negotiable.

🔍

DRC — Will the foundry be able to manufacture it?

Checks that every wire, via, and shape meets the foundry's minimum size and spacing rules. Violations mean the photolithography process cannot print your shapes correctly → defective chip.

⚖️

LVS — Does the layout match the netlist?

Extracts the connectivity from your physical layout and compares it to the reference netlist. A mismatch means a wire is missing, shorted, or connected to the wrong place → wrong circuit manufactured.

⚡

ERC / Antenna — Will it work electrically?

Checks for floating gates, ESD risks, latchup, and plasma damage to gate oxide. DRC-clean + LVS-clean is not enough — ERC violations can destroy the chip during manufacturing or use.

🔑 Why PV Is Critical — The Stakes

Unlike simulation (checks function) or STA (checks timing), Physical Verification checks manufacturability and physical correctness. A chip that simulates perfectly and passes STA but has a spacing DRC violation will either fail in the fab or be rejected at tape-out review. A mask set for a modern chip costs roughly $500K–$5M, and considerably more at leading-edge nodes. One missed LVS open = wasted silicon run = millions of dollars lost. PV is the last line of defense.

6.1 PV Flow Overview

Physical Verification consists of several distinct checks, each targeting a different failure mode. They must all be run at sign-off, typically in this order:

📐

What gets checked?

Every geometric shape in the GDS file is checked against the foundry's Process Design Kit (PDK) rules — spacing, width, overlap, enclosure, density, connectivity, and electrical properties.

🛠️

Primary Tools

Calibre (Siemens EDA, formerly Mentor Graphics): Industry-dominant sign-off PV tool. Also: Synopsys IC Validator (ICV, the successor to Hercules) and Cadence Pegasus. Foundry PDKs are certified for Calibre, which makes it the de facto reference.

📋

PDK Rule Deck

Foundry provides a Calibre rule deck (.svrf file) — typically 50,000–200,000 lines of rules. Engineers run this deck against their GDS. Do NOT modify rule decks without foundry approval.

6.2 DRC — Design Rule Check

DRC verifies that your layout geometry satisfies every manufacturing rule in the foundry's PDK. These rules exist because the lithography and etching processes have physical limits — too-small features simply cannot be manufactured reliably.

Understanding DRC Rule Categories

Rule Category	What It Checks	Why It Exists	Typical Fix
Minimum Width	Wire width ≥ Wmin per metal layer	Too-thin wires break during CMP or have excessive resistance / EM risk	Widen the wire; router usually handles this automatically
Minimum Spacing	Gap between same-layer shapes ≥ Smin	Lithography cannot resolve too-small gaps → shorts between wires	Increase routing track separation; re-route in congested area
Via Enclosure	Metal must extend beyond via edge by min amount on all sides	Overlay (misalignment) in fab could expose via without metal contact	Use larger via enclosure design rule; ensure auto-router uses correct rules
Via Coverage	Minimum number of vias on high-current nets	Single via has limited current capacity; EM requires multiple vias	Replace single-cut vias with via arrays; use via doubling ECO
Notch Rule	Internal notch (concave corner) ≥ Nmin	Narrow notches print incorrectly — corners round off → shape deformation	Fill small notches; ensure polygon merging after fill insertion
Area Rule	Minimum enclosed polygon area	Tiny isolated shapes may not print or etch completely	Remove floating metal shapes; merge small disconnected polygons
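Extension Rule	Poly gate must extend beyond the active (diffusion) edge by a minimum endcap	Transistor channel is defined by the poly/active overlap; with insufficient extension, misalignment can leave source and drain shorted around the gate	Standard cell library handles this; flagged in custom analog layout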
Density Rules	Min/max metal fill % per window per layer	CMP planarization requires uniform metal density across the wafer	Run fill insertion tool (Calibre Fill); remove excess fill if over-dense
Double-Patterning	Same-layer shapes spaced below the single-mask limit must be assignable to two different masks (colors)	At <20nm, a single lithography exposure cannot print minimum pitch → 2 exposures needed	Assign colors using DP-aware router; fix coloring conflicts (odd cycles)
Poly Spacing to Diff	Minimum distance between poly gate and nearby diffusion	Gate coupling to adjacent diffusion can cause leakage or latchup	Handled by standard cell design; appears in custom layout

DRC Violation Examples — Before & After (Annotated)

Running DRC with Calibre

SHELL — Calibre DRC Invocation

# Calibre DRC batch run: the rule file is the last (positional) argument.
# ./drc_runset.svrf is the DRC rule deck from the foundry PDK (or a wrapper that includes it).
# The layout path and top cell are declared inside the rule file, not on the command line.
#   -hier      hierarchical mode (faster, reuses results for repeated cells)
#   -turbo 16  run on up to 16 CPU threads
#   -64        64-bit mode for large designs
calibre -drc -hier -turbo 16 -64 ./drc_runset.svrf

# Key statements in the Calibre runset (.svrf); SVRF comments start with //
LAYOUT SYSTEM        GDSII
LAYOUT PATH          "./out/chip_final.gds"   // input layout
LAYOUT PRIMARY       "chip_top"               // top-level cell name
DRC RESULTS DATABASE "drc.results"            // output DB
DRC SUMMARY REPORT   "drc_summary.rpt"        // human-readable summary
DRC MAXIMUM RESULTS  1000                     // stop at 1000 per rule (debug mode)

# Check results
grep "RULE" drc_summary.rpt | sort -k3 -rn | head -20
# Lists the rules with the most violations (adjust the sort key to the count column of your report); fix these first

💡 Pro DRC Workflow

Don't try to fix DRC violations randomly. Sort by rule then by count — the top 5 rules usually account for 90% of all violations. Fix one rule type at a time using batch ECO scripts. Always re-run DRC after each fix iteration to catch cascading effects (fixing a spacing violation can sometimes introduce a width violation nearby).

6.3 LVS — Layout vs. Schematic

LVS extracts a netlist from your physical layout (by tracing metal connectivity and identifying transistors) and compares it against your reference schematic/netlist. Any mismatch is a critical bug that would cause chip failure.

How LVS Works — Step by Step

Common LVS Error Types and Root Causes

LVS Error	What It Means	Root Cause	How to Debug
Open Net	A connection present in the schematic is missing in the layout — net is broken	Missing wire segment, broken via, net not routed, missing metal fill connection	Highlight the net in layout viewer. Find the discontinuity. Add missing wire/via segment.
Short Circuit	Two nets that should be separate are electrically connected in layout	Routing DRC waiver created a short, accidentally connected polygons, missing wire cut	Identify which two nets are shorted. Find where they touch. Remove the connection or add a cut.
Device Mismatch	Device exists in schematic but not in layout (or vice versa)	Cell not placed, wrong cell reference, flatten/unflatten issue, macro not properly instantiated	Compare instance counts. Find missing instance in layout. Check hierarchy mapping.
Port Mismatch	Port name or type doesn't match between layout and schematic	Wrong pin label on layout port, renaming in synthesis not propagated to layout, case mismatch	Check label text on layout pins vs netlist port names. Calibre is case-sensitive.
Unconnected Port	A port declared in the netlist has no connection in the layout	I/O pad not connected to core, power domain port not properly tied, spare gate left floating	Find the port in the layout. Verify it has a metal label and is connected to the correct net.
Parameter Mismatch	Device dimensions differ between layout and schematic (W/L, capacitor value)	Standard cell used wrong size, analog cell manually edited without updating schematic	Check transistor W/L in layout vs SPICE netlist. Typically only affects analog blocks.

SHELL — Calibre LVS Run + Key Report Fields

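# Calibre LVS run: as with DRC, the rule file is the last (positional) argument
# and the layout and source files are declared inside it.
calibre -lvs -hier -turbo 16 -64 ./lvs_runset.svrf

# Key statements in the LVS runset (.svrf); SVRF comments start with //
LAYOUT SYSTEM  GDSII
LAYOUT PATH    "./out/chip_final.gds"
LAYOUT PRIMARY "chip_top"
SOURCE SYSTEM  SPICE
SOURCE PATH    "./out/chip_netlist.spi"   // reference from synthesis: ./out/chip_netlist.v converted to SPICE (e.g. with v2lvs); file name is illustrative
SOURCE PRIMARY "chip_top"
LVS REPORT     "lvs_summary.rep"

# LVS report sections to check:
# 1. CIRCUIT COMPARISON RESULTS: overall PASS/FAIL
# 2. SHORTS: nets merged that shouldn't be
# 3. OPENS: nets split that should be connected
# 4. UNMATCHED NETS: present in one side only
# 5. UNMATCHED INSTANCES: devices missing

# LVS clean confirmation in report:
# "CORRECT" → clean
# "INCORRECT" → failures exist

# Quick grep for errors:
grep -E "INCORRECT|SHORTS|OPENS|Unmatched" lvs_summary.rep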

6.4 ERC — Electrical Rule Check

ERC catches electrical issues that DRC and LVS miss. A layout can be DRC-clean and LVS-clean but still have electrical errors that cause chip malfunction.

ERC Check	What It Detects	Consequence if Missed
Floating Gate	MOSFET gate connected to nothing (floating net)	Gate floats to indeterminate voltage → random switching behavior. Very common ERC error in early PD.
Floating Well	N-well or P-well not connected to VDD/VSS	Well floats → transistors biased incorrectly → latchup risk, parametric failures
VDD/VSS Short	Power and ground nets connected together	Direct short circuit → chip draws excessive current → burns out immediately on power-up
Input Not Driven	Logic input pin with no driver	Input floats → oscillation, metastability, excessive power consumption
Output Contention	Two outputs driving the same net simultaneously	Short circuit between drivers → device damage, incorrect logic level
ESD Violation	I/O pad has insufficient ESD protection structure	ESD event during handling destroys input gate oxide → dead chip before it even runs
Latchup Violation	Tap cells too far from active region (beyond the PDK's maximum tap distance, e.g. >50µm)	Parasitic SCR triggers → VDD-to-VSS latchup → chip permanently damaged

6.5 Antenna Check

During plasma etching in fabrication, metal connected to gate terminals accumulates charge. The antenna ratio is the cumulative metal area divided by the gate area. Exceeding the foundry limit damages the thin gate oxide — permanently degrading or destroying the transistor.

Antenna Ratio

AR = Σ(Connected Metal Area on Layer L) / Gate Oxide Area

⚠️ When Violation Occurs

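Foundries specify a maximum AR per metal layer. The limits are PDK-specific; figures such as AR < 400 for M1 and AR < 800 for M2+ are only illustrative. A violation occurs when a long wire on a lower layer is tied to a gate at a point in fabrication where nothing else on that net (a driver diffusion or a diode) can bleed the charge away.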
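As a rough worked example of the ratio above, the sketch below sums the metal area tied to one gate on a single layer and compares it with a limit. The segment areas, the gate area, and the 400 limit are illustrative assumptions, not values from any PDK; real decks also define cumulative and per-layer variants.

```python
# Illustrative antenna-ratio check. All numbers are made up for the example.

def antenna_ratio(metal_areas_um2, gate_area_um2):
    """AR = sum of metal area connected to the gate on one layer / gate oxide area."""
    if gate_area_um2 <= 0:
        raise ValueError("gate area must be positive")
    return sum(metal_areas_um2) / gate_area_um2


def violates(ar, max_ar):
    """True when the ratio exceeds the (assumed) foundry limit."""
    return ar > max_ar


GATE_AREA = 0.05   # um^2, hypothetical gate oxide area
M1_LIMIT = 400     # hypothetical M1 limit

# Case 1: three M1 segments (um^2) all tied to the gate while M1 is etched
long_m1 = [10.0, 8.0, 7.0]
ar_before = antenna_ratio(long_m1, GATE_AREA)

# Case 2: after a layer jump, only a short M1 stub is tied to the gate
short_stub = [2.0]
ar_after = antenna_ratio(short_stub, GATE_AREA)

print(f"before jump: AR = {ar_before:.0f}, violation = {violates(ar_before, M1_LIMIT)}")
print(f"after jump:  AR = {ar_after:.0f}, violation = {violates(ar_after, M1_LIMIT)}")
```

With these numbers the long M1 run gives a ratio of about 500, which is over the assumed limit, while the short stub left after a layer jump gives about 40.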

💡 Two Fix Strategies

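1. Jump to higher layer (preferred): Break the long lower-layer wire close to the gate and carry the signal up to a higher metal layer. While the lower layer is being etched, only the short stub is attached to the gate, so little charge reaches the oxide. By the time the upper-layer bridge joins the long wire to the gate, the net is also tied to the driver's diffusion, which gives the charge a discharge path.

2. Insert antenna diode: Place a reverse-biased diode (cathode on the net, anode to VSS) near the gate. In normal operation the diode stays off; during fab it gives the plasma-induced charge a path to the substrate before oxide damage occurs.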

6.6 Metal Fill & Density Rules

CMP (Chemical Mechanical Polishing) planarizes each metal layer. Non-uniform metal density causes uneven polishing: sparse areas "dish" (metal removed excessively) and dense areas retain more. Both cause via formation failures and reliability issues.

Parameter	Typical Range	Effect of Violation
Minimum metal density	20–30% per check window	Dishing: metal recedes below ILD surface → via misses metal → open circuit
Maximum metal density	70–80% per check window	Erosion: ILD polished away → shorts between layers, increased leakage
Check window size	50×50 µm – 200×200 µm	Foundry-defined. Smaller windows = tighter local control
Fill shape min size	≥ Wmin per layer	Too-small fill shapes violate width rules themselves
Fill to signal spacing	≥ 2× normal spacing	Fill too close to signal → coupling capacitance → SI issues
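The density rules in the table reduce to simple arithmetic per check window. The sketch below is a minimal illustration: the 20% and 80% bounds and the 100×100 µm window are assumed values picked from the ranges above, and the function names are hypothetical.

```python
# Illustrative per-window metal density check. Bounds and window size are assumptions.

def window_density(metal_area_um2, window_w_um, window_h_um):
    """Fraction of the check window covered by metal on one layer."""
    return metal_area_um2 / (window_w_um * window_h_um)


def classify(density, min_d=0.20, max_d=0.80):
    """Compare a window's density with assumed min/max limits."""
    if density < min_d:
        return "under"   # needs fill
    if density > max_d:
        return "over"    # needs slotting or re-routing
    return "ok"


def fill_area_needed(metal_area_um2, window_w_um, window_h_um, min_d=0.20):
    """Extra metal area (um^2) required to reach the minimum density."""
    target = min_d * window_w_um * window_h_um
    return max(0.0, target - metal_area_um2)


# A sparse 100 x 100 um window with 1200 um^2 of routed metal
d = window_density(1200.0, 100.0, 100.0)
print(f"density = {d:.2f} -> {classify(d)}")
print(f"fill needed = {fill_area_needed(1200.0, 100.0, 100.0):.0f} um^2")
```

Here the window sits at 12% density, so about 800 µm² of fill would be needed to reach an assumed 20% minimum. A real fill deck also steps the window across the die with overlap and respects the fill-to-signal spacing.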

SHELL — Calibre Metal Fill Insertion

# Run Calibre fill (after routing, before final DRC)
# The rule file is passed as a positional argument
calibre -drc -hier fill_runset.svrf

# fill_runset.svrf key options (SVRF comments use //):
LAYOUT SYSTEM     GDSII
LAYOUT PATH       "chip_prefill.gds"
DRC RESULTS DATABASE "fill_out.gds" GDSII   // GDS with fill added

# Fill insertion is non-electrical — it must NOT connect to any signal net
# Most foundry fill decks insert floating unconnected metal polygons
# Some advanced PDKs insert connected fill for better SI (optional)

# After fill: re-run DRC to verify:
# 1. Fill shapes themselves don't create new DRC violations
# 2. Fill-to-signal spacing rules satisfied
# 3. Density targets met on all layers

6.7 PV Tool Knowledge

Calibre from Siemens EDA (formerly Mentor Graphics) is the industry-standard sign-off verification tool. Virtually all foundry PDKs ship Calibre-qualified rule decks. A clean Calibre run on the foundry's sign-off decks (DRC, LVS, antenna, density) is what the foundry expects alongside your GDS.

Calibre Mode	Command	Purpose
DRC	calibre -drc -hier drc.svrf	Design rule verification against foundry rules
LVS	calibre -lvs -hier lvs.svrf	Layout vs schematic comparison
PEX/RCX	calibre -xrc -phdb rcx.svrf; calibre -xrc -pdb -rcc rcx.svrf; calibre -xrc -fmt -all rcx.svrf	Parasitic RC extraction → generates SPEF
Fill	calibre -drc -hier fill.svrf	Insert dummy metal fill to meet density rules
ERC	calibre -erc -runset erc.svrf	Electrical connectivity and latchup checks
PERC	calibre -perc -runset perc.svrf	ESD and latchup reliability analysis
DFM	calibre -dfm -runset dfm.svrf	Design-for-manufacturing: yield improvement checks
Litho Check	calibreLitho -verify	Optical proximity correction / lithography simulation

📌 Calibre Interactive (RVE)

The Results Viewing Environment (RVE) is Calibre's GUI for viewing DRC/LVS errors. Open a results database with calibre -rve, or launch RVE from inside a layout viewer such as Calibre DESIGNrev (calibredrv -m chip.gds). Features: highlight errors in layout, zoom to violation, batch-fix mode, error count by rule, and cross-probe to schematic for LVS.

Synopsys IC Validator (ICV) is Synopsys's native sign-off verification tool, tightly integrated with StarRC (parasitic extraction) and ICC2. Growing adoption especially in designs using the full Synopsys tool chain.

ICV Mode	Command	Purpose
DRC	icv -drc -i chip.gds -c icv_drc.rs	Design rule verification
LVS	icv -lvs -i chip.gds -s netlist.v	Layout vs schematic
Fill	icv -fill -i chip.gds -c fill.rs	Density fill insertion
ERC	icv -erc -i chip.gds	Electrical rule check
In-design DRC	icc2_shell> check_drc	DRC inside ICC2 during routing — catch violations early

Feature	Calibre (Siemens)	ICV (Synopsys)
Foundry Certification	Gold Standard — all foundries	Certified at major foundries
Tape-out Acceptance	Universally accepted	TSMC, Samsung, GF certified
PD Integration	In-design: Innovus + ICC2	Native in ICC2
Parasitic Extraction	Calibre xRC/PEX	StarRC (separate tool)
GUI Viewer	Calibre RVE	IC Validator VUE
Speed (large designs)	Excellent (hierarchical)	Excellent (hierarchical)
Rule Deck Language	SVRF / TVF	PXL

6.8 Physical Verification — Interview Questions

1. What is the difference between DRC, LVS, and ERC? Which must pass for tape-out?

DRC (Design Rule Check): Checks layout geometry against foundry manufacturing rules — spacing, width, enclosure. Ensures the chip CAN be manufactured.

LVS (Layout vs Schematic): Extracts connectivity from layout and compares to reference netlist. Ensures the manufactured chip WILL match the intended circuit.

ERC (Electrical Rule Check): Checks for floating nodes, power/ground violations, ESD issues. Ensures the chip WILL WORK electrically.

All three must be clean for tape-out. Any violation that remains must be covered by a documented, foundry-approved waiver; an unexplained error means the foundry rejects the submission or the chip is at risk.

2. What is an antenna violation and how do you fix it?

An antenna violation occurs when the ratio of metal area connected to a gate terminal exceeds the foundry's limit during plasma etching. The metal accumulates charge which can tunnel through and damage the gate oxide — permanently destroying the transistor.

Two fixes:

Layer jump: Break the long wire near the gate and bridge it on a higher metal layer. While the lower layer is etched, only a short stub is attached to the gate; when the bridge is finally made, the net is also connected to the driver's diffusion, which discharges it. This is the preferred fix as it adds no area.
Antenna diode: Insert a reverse-biased diode (cathode on the net, anode to VSS) at the gate input. During fab, accumulated charge safely bleeds to the substrate through the diode. Adds small area (~0.5–1 std cell).

3. What causes an LVS short and how do you debug it?

An LVS short means two nets that should be electrically separate in the schematic are connected in the layout.

Common causes:

Two wires on the same metal layer touching (spacing DRC violation that was waived)
Via connecting two unrelated nets through the same via hole
Accidentally connected power/signal during manual ECO
Misplaced text label that attaches a second net name to the wrong shape (a "text short")

Debug: Open Calibre RVE. Click the shorted net pair. The tool highlights in layout. Zoom to find the touching shapes. Check the DRC results — usually there's a spacing violation co-located with the LVS short. Fix the spacing violation to separate the nets.

4. What is a DRC waiver and when is it appropriate?

A DRC waiver is an explicit exception that suppresses a specific DRC violation from being reported, with foundry documentation justifying why it's acceptable.

Legitimate use cases:

Spacing violations in ESD clamp cells — intentionally tight by design, foundry-approved cell
Density violations in seal ring or pad ring areas — these regions have special rules
Known violations inside foundry-provided hard IP (black box) — foundry-guaranteed correct

Never waive: Violations you don't understand. Violations in custom logic you designed. Always document the waiver with justification and get foundry approval if required. Incorrect waivers = chip failure in fab.

5. Why does metal fill affect timing, and how do you manage it?

Metal fill polygons are floating (unconnected) metal shapes on each layer. Although electrically disconnected, they add parasitic coupling capacitance to nearby signal wires. This:

Increases wire capacitance → slower transitions → increased propagation delay
Adds coupling between fill and signal → minor crosstalk noise
Can shift timing by 2–5% on metal-dense designs

Management:

Run fill insertion BEFORE final sign-off STA (not after). STA must include fill parasitics.
Foundry fill rules specify minimum fill-to-signal spacing — this limits coupling impact.
Some flows use "timing-aware fill" which avoids placing fill near critical nets.
StarRC/Calibre xRC re-extraction after fill captures the additional capacitance in SPEF.

6. What is double patterning and which nodes require it?

Double Patterning (DP) splits a single metal layer's patterns into two separate photomasks that are exposed sequentially. After first exposure + etch, the second mask fills in the remaining patterns. Together they achieve pitches half of what single exposure can print.

Required at:

28nm: generally single-patterned; DP on at most a few critical layers
20nm/16nm: M1, M2, via layers
10nm/7nm: Most metal layers, Fin definition, contact layers

DRC implications: Adjacent wires on a DP layer must be "colorable" — assigned to alternating masks without conflict. A "coloring conflict" occurs when three adjacent wires are too close together to assign alternating colors without two same-color wires violating spacing. Requires routing perturbation to resolve.
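The colorability condition above is a graph two-coloring problem: shapes are nodes, and an edge joins any two shapes closer than the same-mask spacing. The sketch below is a minimal breadth-first check under that assumption; the shape indices and conflict pairs are hypothetical inputs, and production decomposition tools also handle stitching and more than two masks.

```python
from collections import deque

# Illustrative double-patterning colorability check (graph two-coloring).

def two_color(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1.

    conflicts: pairs (i, j) of shapes that are too close to share a mask.
    Returns a list of mask ids, or None when an odd cycle makes it uncolorable.
    """
    adj = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    color = [None] * num_shapes
    for start in range(num_shapes):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # neighbour goes on the other mask
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # coloring conflict
    return color


# Three parallel wires, each only close to its neighbour: colorable
print(two_color(3, [(0, 1), (1, 2)]))

# Three wires that are all mutually too close: odd cycle, not colorable
print(two_color(3, [(0, 1), (1, 2), (0, 2)]))
```

The first case alternates masks along the chain; the second returns None, which corresponds to the three-wire coloring conflict described above and has to be resolved by moving or re-routing one of the shapes.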

7. What is Calibre xRC and how does it relate to STA sign-off?

Calibre xRC (eXtracted RC) is Calibre's parasitic extraction engine. It reads the post-route GDS layout and computes the actual resistance (R) and capacitance (C) of every metal wire and via, outputting a SPEF file.

Relationship to STA:

Routing completes → GDS generated
Calibre xRC reads GDS → produces chip.spef
PrimeTime reads chip.spef via read_parasitics
Wire delays computed from real RC → accurate back-annotated timing
Sign-off STA with real parasitics must pass before tape-out

Pre-route timing uses estimated loads (WLM), which can be 20–40% off. Calibre xRC gives ground-truth parasitic data within ~3% of silicon measurement. Without SPEF from Calibre xRC, timing sign-off is not reliable.
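To see how extracted R and C turn into a wire delay, the sketch below applies the first-order Elmore model to an RC ladder. This is only an illustration: sign-off tools use more accurate delay calculation, and the segment values here are assumed, not extracted from any design.

```python
# Illustrative Elmore delay for a wire modelled as an RC ladder.

def elmore_delay(segments, load_cap):
    """First-order delay from driver to sink.

    segments: list of (R_ohm, C_farad) from driver to sink, with each
              segment's capacitance lumped at its far end.
    load_cap: sink pin capacitance in farads.
    Delay = sum over segments of R_i * (capacitance downstream of R_i).
    """
    downstream = load_cap + sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        delay += r * downstream
        downstream -= c   # this segment's cap is upstream of the next resistor
    return delay


# Two 100-ohm, 10 fF segments driving a 5 fF pin (assumed values)
wire = [(100.0, 10e-15), (100.0, 10e-15)]
d = elmore_delay(wire, load_cap=5e-15)
print(f"Elmore delay = {d * 1e12:.2f} ps")
```

For these values the first resistor sees 25 fF and the second sees 15 fF, giving roughly 4 ps. Doubling either the resistance or the capacitance of the wire scales the delay accordingly, which is why pre-route estimates that misjudge RC shift timing so much.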

8. How do you approach a large DRC run with 50,000 violations?

50,000 violations sounds overwhelming but they're usually from just 3–5 root causes. Systematic approach:

Sort by rule, count descending: grep "RULE" drc_summary.rpt | sort -k3 -rn. The top rule might account for 40,000 violations from one root cause.
Fix the top rule first: Understand why it's occurring. Is it a routing configuration issue? A macro halo not set? A missing fill constraint?
Batch fix vs point fix: If 10,000 violations are "M2 spacing" due to track pitch, fix the router configuration and re-route — don't fix them one by one.
Re-run DRC after each major fix: Cascade effects — fixing spacing might introduce new width violations.
Isolate by region: If violations cluster in one area, focus there. Use Calibre's "check window" to run DRC on a sub-region during debug.
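The "sort by rule, count descending" step can be sketched in a few lines. The input format here is an assumption: each violation is a (rule, x, y) tuple, standing in for whatever your DRC results export provides, and the rule names are hypothetical.

```python
from collections import Counter

# Illustrative DRC triage: rank rules by violation count with cumulative share.

def rank_rules(violations):
    """violations: iterable of (rule_name, x_um, y_um) tuples.

    Returns [(rule, count, cumulative_fraction), ...], largest count first.
    """
    counts = Counter(rule for rule, _x, _y in violations)
    total = sum(counts.values())
    ranked = []
    running = 0
    for rule, n in counts.most_common():
        running += n
        ranked.append((rule, n, running / total))
    return ranked


# Tiny made-up result set: 6 + 3 + 1 violations
sample = (
    [("M2.S.1", 10.0 + i, 5.0) for i in range(6)]
    + [("M3.W.1", 40.0, 20.0 + i) for i in range(3)]
    + [("V1.EN.1", 70.0, 70.0)]
)

for rule, n, cum in rank_rules(sample):
    print(f"{rule:10s} {n:4d}  cumulative {cum:.0%}")
```

The cumulative column shows at a glance how few rules account for most of the violations, which is the signal to fix the root cause rather than individual markers.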

9. What is LVS clean vs LVS correct — is there a difference?

LVS clean: Calibre reports "CORRECT" — all nodes match between layout and schematic. No shorts, opens, or device mismatches. This is what tape-out requires.

Important nuance: LVS-clean does NOT guarantee the design is functionally correct. It only guarantees the layout faithfully implements the netlist. If the netlist itself has a bug (wrong logic, timing violation, incorrect constraint), LVS will still pass. That's why functional verification (simulation), STA, and LVS are all independently required — they catch different classes of errors. An LVS-clean, STA-clean chip can still fail functionally if the RTL logic was wrong.

10. What is the seal ring and why does it have special DRC rules?

The seal ring is a continuous ring of metal and active structures running around the perimeter of the die, between the pad ring and the dicing street. Its purposes:

Mechanically seals the chip edge against moisture ingress (prevents corrosion)
Stops cracks and chipping from propagating into the die during dicing (sawing)
Provides a stress buffer between the die bulk and the scribe line

Special DRC: The seal ring intentionally violates several standard DRC rules — it has very narrow/tight structures and rule violations are expected and foundry-approved. Engineers must either exclude the seal ring cell from DRC or use the foundry-provided waiver file that suppresses known seal ring rule violations. Trying to "fix" seal ring DRC violations is a common mistake by junior engineers.

SECTION 07

How to Prepare — Career Roadmap

This section is your end-to-end guide to entering and advancing in VLSI engineering. Whether you're a student targeting your first role or an experienced engineer moving into a specialized domain, follow this structured path. Advice written from the perspective of what hiring managers and senior engineers actually look for.

7.1 VLSI Domains — Which One Is For You?

🔧

RTL Design Engineer

What you do: Write synthesizable Verilog/SystemVerilog. Design microarchitecture — FSMs, pipelines, datapaths. Write verification plans.

Skills needed: SystemVerilog, microarch, timing-aware RTL coding.

Companies: Intel, AMD, ARM, Qualcomm, Apple, NVIDIA (design teams).

⚙️

Synthesis Engineer

What you do: Run synthesis flows (DC/Genus), meet QoR targets, write SDC constraints, perform timing closure at gate level.

Skills needed: TCL scripting, DC/Genus, SDC, timing analysis, QoR optimization.

Companies: Samsung, MediaTek, Marvell, Broadcom.

🗺️

Physical Design Engineer

What you do: Floorplan, power plan, place, CTS, route, close timing post-route. Work in Innovus or ICC2 daily.

Skills needed: Innovus/ICC2, floorplanning, CTS, routing DRC, ECO.

Companies: TSMC, GlobalFoundries, fabless design houses, Apple silicon.

⏱️

STA Engineer

What you do: Sign-off timing across all MMMC corners using PrimeTime/Tempus. Write ECO scripts. Own WNS/TNS/WHS closure.

Skills needed: PrimeTime, SPEF, MMMC, OCV/AOCV, ECO flows.

Companies: Any semiconductor company with tape-out responsibility.

✅

Physical Verification Engineer

What you do: Run DRC, LVS, ERC, Antenna checks. Debug violations. Own Calibre flow. Coordinate tape-out sign-off.

Skills needed: Calibre DRC/LVS, SVRF rule decks, GDS debugging, Calibre xRC.

Companies: Foundry customers, TSMC design enablement, IP companies.

🔬

Verification Engineer (DV)

What you do: Write UVM testbenches, functional coverage, formal verification, emulation. Ensure RTL is functionally correct.

Skills needed: SystemVerilog, UVM, SVA, Questa/VCS, formal tools.

Companies: All major semiconductor companies.

7.2 Learning Roadmap — Fresher to Professional

7.3 Essential Tools — What to Learn and How

🎯 Reality Check

Industry EDA tools (DC, Innovus, PrimeTime, Calibre) are expensive and require a license. As a student, use the free alternatives below to build hands-on experience. Hiring managers know you won't have industry tool access — they want to see that you understand the concepts and can demonstrate hands-on work with open-source equivalents.

Domain	Industry Tool	Free/Open Alternative	How to Practice
Synthesis	Synopsys DC / Cadence Genus	Yosys (open source)	Synthesize your Verilog designs with Yosys. Understand liberty files. Write SDC constraints manually. Compare area reports.
Place & Route	Cadence Innovus / Synopsys ICC2	OpenROAD via OpenLane2	Use OpenLane with Sky130 PDK. Run full RTL-to-GDS on a small design (UART, I2C, simple CPU). Examine each output.
STA	Synopsys PrimeTime / Cadence Tempus	OpenSTA (inside OpenLane)	Read timing reports from OpenSTA. Understand slack calculation. Introduce timing violations manually and fix them.
Simulation	Synopsys VCS / Cadence Xcelium / Siemens Questa	Verilator / Icarus Verilog	Write testbenches. Simulate your RTL. View waveforms in GTKWave. Practice writing self-checking testbenches.
Physical Verification	Calibre DRC/LVS	Magic VLSI / KLayout	Open Sky130 GDS in KLayout. Inspect metal layers. Run built-in DRC checks. Understand what each layer represents.
Parasitic Extraction	Calibre xRC / Synopsys StarRC	OpenRCX (inside OpenROAD)	Run OpenRCX on a placed-and-routed design. Examine the SPEF output. Understand how RC values affect timing.
Waveform Viewing	Synopsys DVE / Cadence SimVision	GTKWave	View VCD dumps from Verilator/Icarus simulation. Practice reading waveforms, adding cursors, measuring timing.
Layout Editing	Cadence Virtuoso / Synopsys Custom Compiler	Magic VLSI / KLayout	Draw simple standard cells in Magic. Understand how transistors form. See the connection between schematic and layout.

OpenLane Quick Start — Full RTL-to-GDS in One Command

SHELL — OpenLane2 with Sky130 PDK

# Install OpenLane2 (requires Docker or Nix)
pip install openlane

# Create a minimal design config
mkdir my_design && cd my_design
cat > config.json << 'EOF'
{
  "DESIGN_NAME": "my_alu",
  "VERILOG_FILES": "src/alu.v",
  "CLOCK_PORT": "clk",
  "CLOCK_PERIOD": 10,
  "FP_CORE_UTIL": 40,
  "PL_TARGET_DENSITY": 0.4
}
EOF

# Run complete RTL-to-GDS flow
openlane config.json

# Outputs you'll find in runs/RUN_*/:
# synthesis/    → gate-level netlist (.v)
# floorplan/    → DEF with core/IO defined
# placement/    → placed cells DEF
# cts/          → clock tree built DEF
# routing/      → fully routed DEF + GDS
# signoff/      → timing reports (OpenSTA)
# signoff/      → DRC results (KLayout/Magic)

# View final GDS in KLayout:
klayout runs/RUN_latest/final/gds/my_alu.gds

7.4 Interview Preparation Plan — 8 Weeks

Week	Topic Focus	What to Study	Practice Task
Week 1	Digital Fundamentals	Setup/hold time, metastability, clock domains, timing diagrams, flip-flop operation	Draw timing diagrams by hand. Explain setup violation to a friend without notes.
Week 2	Synthesis Concepts	RTL-to-netlist flow, SDC constraints (create_clock, set_input/output_delay, false_path, multicycle_path), QoR metrics (WNS/TNS)	Write a complete SDC file for a simple design from memory. Run Yosys synthesis on a small Verilog module.
Week 3	STA Deep Dive	Setup/hold slack formulas, 4 path types, timing reports, OCV/AOCV, propagated clock, MMMC corners	Manually calculate setup slack for a given circuit. Read a full PrimeTime report and identify violations.
Week 4	Physical Design	Floorplan formulas (utilization, AR), IR drop, CTS (skew/latency/uncertainty), routing DRC rules, Innovus vs ICC2	Run OpenLane on a UART or I2C controller. Examine floorplan DEF, routing layers, DRC results.
Week 5	Physical Verification	DRC categories, LVS flow, antenna violations, metal fill/density, Calibre commands, ERC checks	Open a Sky130 GDS in KLayout. Identify metal layers. Find a DRC violation and understand which rule it breaks.
Week 6	Advanced Topics	CDC (2FF synchronizer, set_clock_groups), low power (clock gating, multi-Vt, UPF), timing closure ECO flow	Write a CDC synchronizer in Verilog. Simulate it with an asynchronous signal crossing. Verify no metastability.
Week 7	Mock Interviews	Work through all 90 Q&As in this guide. Time yourself. Answer out loud, not just in your head.	Do 3 mock interviews with a peer or use a mirror. Record yourself. Identify weak areas and go back to Weeks 2–6.
Week 8	Company-Specific Prep	Research target company's products. Know their process node (e.g., TSMC 5nm, Samsung 3nm). Read recent conference papers from their engineers.	Prepare 3–5 intelligent questions to ask the interviewer. Show you understand their specific domain challenges.

7.5 What Interviewers Actually Evaluate

✅ What Gets You Hired

Can explain why, not just what. "Why does hold analysis use the fast corner?" shows deep understanding.
Hands-on experience — even with open-source tools. Running OpenLane end-to-end beats "I studied PD in class."
Correct use of units and numbers. "Skew under 50ps," not "skew should be small."
Knows limits and tradeoffs. "Increasing drive strength fixes setup but increases power and may cause hold violations."
Asks clarifying questions before answering — demonstrates engineering mindset.
Admits uncertainty honestly: "I haven't used Genus directly but DC concepts are the same — let me explain my DC knowledge."
Connects concepts: "DRC-clean layout is needed before Calibre xRC extraction which feeds sign-off STA."

❌ Common Mistakes That Fail Candidates

Memorizing answers without understanding. Interviewers probe with follow-up questions — memorized answers collapse immediately.
"I know the theory but haven't used the tools." Every VLSI job requires tool proficiency. Use open-source tools to fill this gap.
Getting confused between setup and hold. This is the most basic STA concept — if you mix them up, the interview ends.
Not knowing which corner is used for setup vs hold analysis. This comes up in almost every STA interview.
Saying "I would just rerun synthesis" to fix a post-route timing violation. Late-stage fixes must be ECO-based — no full rerun.
Cannot explain what SPEF is and why it's needed. This is fundamental to any STA sign-off conversation.
Treating LVS-clean as the same as functionally correct. Interviewers know this is a common misconception.

⚡ The Most Common Interview Question — And How to Answer It Properly

"Explain setup and hold time." — Almost every VLSI interview starts here. Wrong answer: "Setup is the time before clock, hold is the time after clock."

Right answer: "Setup time is the minimum duration that data must be stable at the flip-flop input before the active clock edge, so the FF can reliably capture it. Hold time is the minimum duration data must remain stable after the clock edge. Violating setup causes data to arrive late — the FF may not capture the correct value. Violating hold causes data to change too quickly — the FF may capture the new value instead of the intended one. Critically, hold violations cause failures at all frequencies, not just high speed — they're structural, not a speed problem. That's why they're fixed with delay buffer insertion rather than clock frequency reduction."
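The claim that hold is independent of frequency falls straight out of the standard slack formulas. The sketch below uses the usual single-cycle reg-to-reg equations with all times in picoseconds; the numeric values are assumed for illustration.

```python
# Illustrative setup/hold slack arithmetic (all times in ps, values assumed).

def setup_slack(period, clk_to_q, data_max, setup, skew=0, uncertainty=0):
    """skew = capture clock arrival minus launch clock arrival."""
    required = period + skew - setup - uncertainty
    arrival = clk_to_q + data_max
    return required - arrival


def hold_slack(clk_to_q, data_min, hold, skew=0, uncertainty=0):
    """Note: the clock period does not appear anywhere in this formula."""
    arrival = clk_to_q + data_min
    required = skew + hold + uncertainty
    return arrival - required


# Path with 120 ps of positive skew toward the capture flop
print("setup @ 1000 ps:", setup_slack(1000, 100, 700, 80, skew=120))
print("hold:           ", hold_slack(100, 50, 60, skew=120))

# Halving the clock frequency helps setup but leaves hold untouched
print("setup @ 2000 ps:", setup_slack(2000, 100, 700, 80, skew=120))
print("hold:           ", hold_slack(100, 50, 60, skew=120))

# A 40 ps delay buffer on the short path fixes hold
print("hold + buffer:  ", hold_slack(100, 50 + 40, 60, skew=120))
```

In this example the hold slack is -30 ps at both clock periods and only turns positive (+10 ps) once delay is added to the short path, which is the point the model answer makes about buffer insertion.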

7.6 Essential Books, Courses & Resources

📖

Foundational Books

Weste & Harris — CMOS VLSI Design (the bible)
Rabaey et al. — Digital Integrated Circuits
Patterson & Hennessy — Computer Organization & Design
Bhatnagar — Advanced ASIC Chip Synthesis (DC-specific)
Elmore — RC delay modeling papers

🌐

Online Resources

efabless.com — Free chip tapeout with Sky130
openroad.tools — Open-source RTL-to-GDS
vlsiuniverse.com — VLSI interview prep
Synopsys SolvNetPlus — Official DC/PT docs
Cadence Training — Innovus/Genus tutorials
IEEE Xplore — DAC, ICCAD, CICC papers

🎓

Courses & Projects

MIT 6.004 — Computation Structures (free)
Coursera: HDL & FPGA — Entry RTL practice
Build a RISC-V CPU in Verilog, synthesize it
Tape-out on Sky130 via efabless chipIgnite
Contribute to OpenROAD — visible open-source work
ICCAD Contests — Register for student competitions

7.7 A Day in the Life — By Role

Physical Design Engineer — Typical Day (Pre-Tapeout)

Time	Activity	Tools Used
9:00 AM	Check overnight Innovus place-and-route run results. Review DRC violation count trend and timing summary.	Innovus GUI, log parser scripts
9:30 AM	Team stand-up: report WNS/TNS status, blocking issues (congested area, unresolvable DRC in macro boundary)	Confluence, Jira
10:00 AM	Debug 3 specific DRC violations near a macro corner that have resisted automatic fixing. Manually re-route 2 wires.	Innovus ECO route, DRC GUI
11:30 AM	Review CTS results. Skew is 220ps — above 200ps target. Adjust ccopt settings, re-run CTS on critical clock domain.	Innovus ccopt_design
1:30 PM	Run post-route STA to check setup/hold after morning ECO changes. Two new hold violations introduced by yesterday's buffer insertion.	Tempus, timing reports
2:30 PM	Fix hold violations by inserting delay buffers on 2 short paths. Re-run ECO routing for those nets.	Innovus ecoAddRepeater, ecoRoute
3:30 PM	Meeting with STA team to align on acceptable WNS margin at this stage of the project.	–
4:00 PM	Write TCL script to automate tomorrow's overnight run: place → CTS → route → extractRC → STA → DRC. Submit to compute farm.	TCL, LSF job scheduler
5:00 PM	Update project tracking spreadsheet. Document today's changes. Review tomorrow's schedule.	Excel, Confluence

STA Engineer — Typical Day (Sign-off Phase)

Time	Activity	Tools Used
9:00 AM	Review overnight PT sign-off results across 5 MMMC corners. Identify which corners still have violations.	PrimeTime, report parser
9:45 AM	WNS is -0.04ns at func_slow corner on one clock domain. Identify the critical path — reg-to-reg through a wide adder.	PT report_timing
10:15 AM	Run PT-ECO to generate fix suggestions (upsize 3 cells, insert 1 buffer). Review suggestions for reasonableness.	PT fix_eco_timing
10:45 AM	Send ECO script to PD team for implementation in Innovus. Coordinate on expected turnaround.	Email, Jira ticket
11:30 AM	Review hold corner (func_fast) — clean. Review scan corner (scan_slow) — 2 violations. Update tracking spreadsheet.	PrimeTime, Excel
1:30 PM	New SPEF delivered from PD team after yesterday's route ECO. Run full PT update on all 5 corners. ~2 hour runtime.	PrimeTime update_timing
3:30 PM	Results back: func_slow now +0.02ns (CLEAN). Scan corner improved to -0.01ns — one path remains. Document.	PrimeTime, Confluence
4:00 PM	Debug the remaining scan violation — it's an MCP (multicycle_path) that has wrong hold correction. Fix SDC, re-run.	PrimeTime, SDC editor
5:00 PM	Submit final overnight run with updated SDC. Send status email to project lead: "func_slow CLEAN, scan_slow -0.01ns in progress"	LSF, email

Physical Verification Engineer — Typical Day (Pre-Tapeout)

Time	Activity	Tools Used
9:00 AM	Review overnight Calibre DRC results. 142 violations remain (down from 2,400 last week). Classify by rule.	Calibre RVE, shell scripts
9:30 AM	Top rule: M3_SPACING.2 — 47 violations, all in one macro boundary region. Root cause: macro halo not set correctly.	Calibre RVE, Innovus
10:00 AM	Fix: Adjust macro halo in Innovus, re-run routing around that macro. Generates new GDS for next DRC run.	Innovus, script
11:00 AM	LVS run completed overnight: 3 opens found. Debug: all 3 are on VDD tie-off cells that weren't properly connected after fill insertion.	Calibre RVE, layout viewer
11:45 AM	Fix: Add missing metal connections in Innovus. Verify fix with quick LVS on the affected nets only.	Innovus ECO, Calibre partial LVS
1:30 PM	Antenna check: 8 violations remain. All on NAND gate inputs with long M1 wires. Add jumper vias to M3 for 6 of them; insert 2 antenna diodes for the others.	Innovus antenna fixer, Calibre
3:00 PM	Submit full DRC/LVS/Antenna run to compute farm with new GDS. Expected 4 hours runtime.	LSF compute farm
3:30 PM	Prepare tape-out checklist. Verify all IP blocks have current DRC waivers. Coordinate with fab liaison on GDS delivery window.	Confluence checklist, email
5:00 PM	Update PV sign-off dashboard. Send status: "DRC: 142→TBD tonight, LVS: 3 opens fixed, Antenna: 8→2 fixes sent to PD"	Dashboard, email

Synthesis Engineer — Typical Day (Synthesis Closure)

Time	Activity	Tools Used
9:00 AM	Review overnight compile_ultra run. WNS = -0.28ns, TNS = -15.4ns. 47 violating endpoints. Area 4.2 mm².	DC, QoR scripts
9:30 AM	Identify top 5 violating paths — all through the FP multiply unit. Discuss with RTL team: can MCP be applied?	DC report_timing, email
10:00 AM	RTL team confirms multiplier is 2-cycle. Add MCP to SDC. Re-run compile_ultra -incremental on that path group.	DC compile_ultra -incremental
11:00 AM	New WNS = -0.06ns, TNS = -0.9ns. Good progress. Remaining violations are in the interconnect arbiter.	DC, report_qor
1:30 PM	Try path group with higher weight on the arbiter timing paths. Also try ungroup on the arbiter sub-module to allow cross-boundary optimization.	DC group_path, ungroup
2:30 PM	Check max-cap violations — 12 found on high-fanout reset net. Set don't_touch on clock buffers. Insert buffer tree on reset.	DC set_max_fanout, compile
3:30 PM	Run power analysis with SAIF from simulation. Dynamic power = 380mW — 15% over target. Apply clock gating and increase HVT cell usage on non-critical paths.	DC, power_compiler
4:30 PM	Submit overnight full compile_ultra run with updated SDC and power optimizations. Write synthesis run notes for the team.	LSF, Confluence
5:00 PM	Write handoff email to PD team with current netlist, mapped SDC, and QoR summary noting areas of concern for floorplanning.	Email

7.8 Skills Proficiency Matrix

Rate yourself honestly against this matrix. Target "Intermediate" in your primary domain and "Awareness" in adjacent domains before your first interview. "Advanced" in your domain is the 3–5 year mark.

Skill Area	Awareness	Intermediate (Hire-ready)	Advanced (3–5yr)
Verilog / SV	Can read RTL code. Knows gates, FFs, always blocks.	Writes synthesizable RTL. Understands latch vs FF inference. Codes FSMs correctly.	Writes parameterized, reusable RTL. Knows synthesis implications of every construct.
Synthesis / SDC	Knows flow: RTL → netlist. Knows create_clock exists.	Writes complete SDC. Runs DC. Interprets QoR. Understands WNS/TNS.	Tunes compile strategies. Multi-Vt optimization. compile_ultra deep settings.
STA	Can define setup/hold time. Knows slack formula.	Reads PT timing reports. Understands MMMC. Knows OCV derating. Can close timing with ECO.	POCV, CRPR, PBA vs GBA. Develops full MMMC corner methodology for a project.
Physical Design	Knows PD flow stages. Can explain utilization and skew.	Can floorplan a block. Runs Innovus/OpenROAD full flow. Understands DRC and IR drop.	Closes timing at advanced nodes. Owns CTS strategy. Designs power grid from scratch.
Physical Verification	Knows DRC/LVS purpose. Can identify a spacing violation.	Runs Calibre DRC/LVS. Debugs top violation types. Understands antenna and fill.	Owns tape-out PV sign-off. Writes Calibre runset modifications. DP-aware verification.
TCL Scripting	Can read/modify existing TCL scripts. Knows variables, loops, procs.	Writes TCL flow scripts from scratch. Parses timing reports. Automates batch jobs.	Writes complex flow automation, QoR parsers, automatic ECO generators in TCL/Python.
Low Power	Knows clock gating saves power. Knows LVT has more leakage.	Sets up multi-Vt optimization in synthesis. Understands UPF domains and level shifters.	Designs full multi-voltage UPF architecture. Owns power sign-off (Voltus/RedHawk).