SDH Performance Monitoring
Error counters (B1/B2/B3), thresholds, and maintenance signals.
The Network's Dashboard: Why Performance Monitoring is Crucial
A key differentiator that set SDH/SONET apart from older technologies was its comprehensive, built-in system for monitoring the health of the network in real time. Think of it like the dashboard of a car: a driver doesn't wait for the engine to seize before realizing there's a problem; they watch the temperature gauge and the oil pressure warning light. Similarly, a network operator cannot afford to wait for a catastrophic failure that impacts thousands of customers.
SDH/SONET performance monitoring provides this "dashboard" by embedding a rich set of tools directly into the signal's overhead bytes. These tools allow the network to continuously check for transmission errors, detect equipment failures, and report problems automatically. This enables proactive maintenance and ensures the high levels of reliability required for a global communication backbone.
The Core Mechanism: Bit Interleaved Parity (BIP) Error Checking
The primary tool for detecting transmission errors in SDH/SONET is called Bit Interleaved Parity (BIP). It is a simple but powerful type of checksum that operates at multiple layers within the network, allowing operators to pinpoint where errors are occurring.
How BIP Works
A BIP-N calculation organizes all the bits of the data block being checked into an imaginary matrix with N columns. A single parity bit is then calculated for each column, using even parity: the number of '1's in the column, including the parity bit, must be even. These N parity bits are transmitted in dedicated overhead bytes of the following frame. The receiving device performs the same BIP-N calculation on the data it receives and compares its result with the BIP-N value sent by the transmitter. Each mismatched parity bit indicates that at least one bit error occurred in that column.
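The column-parity idea can be sketched in a few lines of Python. This is a minimal illustration, not a framer implementation: the function name `bip` and the restriction to N being a multiple of 8 are assumptions of the sketch, and the sample bytes are made up. Because even parity over a column is the XOR of its bits, the whole calculation reduces to XOR-ing the block in N-bit words.

```python
def bip(data: bytes, n: int = 8) -> int:
    """Even-parity BIP-N over `data`, returned as an N-bit integer.

    Sketch assumption: N is a multiple of 8 and the block length is a
    whole number of N-bit words, so each word is one row of the matrix.
    """
    if n % 8 != 0:
        raise ValueError("this sketch only handles N as a multiple of 8")
    width = n // 8
    if len(data) % width != 0:
        raise ValueError("block length must be a multiple of N/8 bytes")
    parity = 0
    for i in range(0, len(data), width):
        # XOR-ing the rows yields one even-parity bit per column.
        parity ^= int.from_bytes(data[i:i + width], "big")
    return parity


# Transmitter side: parity over a made-up three-byte block.
block = bytes([0b10110010, 0b01100001, 0b11110000])
sent = bip(block)  # 0x23

# Receiver side: one bit was flipped in transit.
corrupted = bytes([block[0] ^ 0b00000100]) + block[1:]
received = bip(corrupted)

# Each differing parity bit is one BIP mismatch.
mismatches = bin(sent ^ received).count("1")  # 1
```

One limitation is visible in the arithmetic: two bit errors that land in the same column cancel out, so a BIP count is a lower bound on the true number of bit errors.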
Figure (interactive): Live SDH performance gauges. For a selected layer and time window, the widget shows the BIP mismatch count, ES, SES, BBE, UAS, and the resulting availability ratio after excluding UAS seconds.
SDH/SONET intelligently applies BIP checks at its three main operational layers:
- B1 (BIP-8) – Regenerator Section Level: This check monitors the health of the physical fiber link between two adjacent devices (e.g., between two regenerators). It's located in the Regenerator Section Overhead (RSOH) and provides a hop-by-hop check, allowing for rapid isolation of a faulty cable or failing laser.
- B2 (BIP-N×24) – Multiplex Section Level: This is a wider error check located in the Multiplex Section Overhead (MSOH), where N is the STM level. For an OC-3/STM-1, it's a BIP-24; for an OC-12/STM-4, it's a BIP-96. It monitors the integrity of the entire signal between two multiplexers, spanning multiple regenerator hops. The B2 count is the primary indicator of line quality for network operators.
- B3 (BIP-8) – Path Level: This check is part of the Path Overhead (POH) and is calculated only over the payload of a specific Virtual Container. It allows an operator to monitor the quality of a single customer's service (e.g., a DS3 circuit) across the entire network, independent of the health of other services sharing the same physical fiber.
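The B2 sizing rule is simple arithmetic, shown here as a small sketch. The function name `b2_parity_bits` is hypothetical; the only rule used is the one stated above, 24 parity bits per STM level.

```python
def b2_parity_bits(stm_level: int) -> int:
    """Width of the B2 check for an STM-N signal: BIP-(24 x N)."""
    if stm_level < 1:
        raise ValueError("STM level must be at least 1")
    return 24 * stm_level


for level in (1, 4, 16):
    bits = b2_parity_bits(level)
    # Eight parity bits fit in each B2 overhead byte.
    print(f"STM-{level}: BIP-{bits} carried in {bits // 8} B2 bytes")
```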
Translating Raw Errors into Standardized Metrics
A raw count of BIP errors is useful, but to manage a large network effectively, these errors are categorized into standardized performance monitoring parameters, typically measured over 15-minute and 24-hour intervals. These parameters help distinguish between minor, transient glitches and severe, service-impacting problems.
- Errored Second (ES): A one-second interval during which one or more BIP errors occurred. It simply answers the question: "Was there any error in this second?"
- Severely Errored Second (SES): A one-second interval where the number of BIP errors exceeds a predefined high threshold, indicating a significant burst of errors. An SES suggests that the quality of service was seriously degraded during that second.
- Background Block Error (BBE): A BIP error that is not part of a Severely Errored Second. BBEs represent isolated, low-level errors that might not impact service but can indicate a deteriorating link that requires future maintenance.
- Unavailable Second (UAS): A second during which the service is considered completely unavailable. A period of unavailability typically begins at the onset of 10 consecutive Severely Errored Seconds, which are themselves counted as unavailable time, and ends at the onset of 10 consecutive seconds without any SES, which are counted as available time. This metric is critical for Service Level Agreement (SLA) calculations.
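The four parameters above can be derived from a per-second list of BIP error counts. The following is a sketch under stated assumptions rather than a standards-conformant implementation: the function name `classify_seconds` and the `ses_threshold` value are hypothetical (real thresholds depend on the layer and the applicable standard), the 10 seconds that trigger a state change are attributed to the new state, and ES, SES, and BBE are tallied only during available time.

```python
from typing import Dict, List


def classify_seconds(errors_per_second: List[int],
                     ses_threshold: int,
                     window: int = 10) -> Dict[str, int]:
    """Tally ES, SES, BBE and UAS from per-second BIP error counts."""
    # A second is severely errored when its count exceeds the threshold.
    ses = [count > ses_threshold for count in errors_per_second]
    totals = {"ES": 0, "SES": 0, "BBE": 0, "UAS": 0}
    unavailable = False
    for i, count in enumerate(errors_per_second):
        lookahead = ses[i:i + window]
        # Near the end of the data the lookahead is short, so the
        # current state is simply kept.
        if len(lookahead) == window:
            if not unavailable and all(lookahead):
                unavailable = True    # onset of 10 consecutive SES
            elif unavailable and not any(lookahead):
                unavailable = False   # onset of 10 consecutive non-SES
        if unavailable:
            totals["UAS"] += 1
            continue
        if count > 0:
            totals["ES"] += 1
        if ses[i]:
            totals["SES"] += 1
        else:
            totals["BBE"] += count    # errors outside SES periods
    return totals


# Made-up sample: a few glitches, one isolated burst, then a 10-second outage.
sample = [0, 1, 0, 50, 2, 0] + [50] * 10 + [0] * 10
result = classify_seconds(sample, ses_threshold=30)
# result == {"ES": 3, "SES": 1, "BBE": 3, "UAS": 10}
```

The isolated burst in the fourth second counts as both an ES and an SES, while the 10-second outage contributes only to UAS.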
Alarm and Maintenance Signals: The Network's Cries for Help
Beyond performance metrics for "soft" errors, SDH/SONET has a sophisticated system of alarms to report "hard" failures, such as equipment failure or a cut fiber. These signals are crucial for rapid fault localization and triggering protection switching.
Critical Failure Alarms
- Loss of Signal (LOS): The most severe alarm. It indicates that the receiving equipment detects no incoming light on its optical input. This almost always signifies a fiber cut or a complete failure of the upstream transmitting laser.
- Loss of Frame (LOF): The receiver detects incoming light, but it cannot find the valid A1/A2 framing pattern within the data stream. This means it is receiving data but cannot determine the boundaries of the STM/OC frames, making the data unintelligible.
Fault Communication Signals
When a node detects a critical failure, it must inform other nodes in the network. It uses two key maintenance signals to do this:
- Alarm Indication Signal (AIS): Also known as a "blue alarm," this is an "alarm keep-alive" signal. When a node detects a failure like LOS or LOF, it stops trying to send normal data and instead transmits a special, all-'1's AIS signal downstream. This serves a critical purpose: it tells all subsequent nodes, "There is a problem upstream from me." This prevents a chain reaction of alarms, a phenomenon known as an alarm cascade, and allows operators to immediately focus on the root cause (the first node that is not receiving an AIS but is reporting an LOS/LOF failure). AIS is an invaluable tool for fault isolation.
- Remote Defect Indication (RDI): Formerly known as Far End Receive Failure (FERF), RDI is the upstream counterpart to AIS. When a node detects a failure on its input (e.g., receives an AIS), it must notify the nodes upstream that the connection is broken and no longer reliable. It does this by setting the RDI indication in the overhead of the signal it transmits in the opposite direction. RDI is carried in the K2 byte for the line layer and the G1 byte for the path layer.
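The way AIS narrows a fault down to one span can be illustrated with a toy model of a one-directional chain of nodes. Everything here is a simplifying assumption: the node names, the function names, and the choice to model only the node that detects the failure directly as returning RDI. Real equipment handles these signals separately at the section, line, and path layers.

```python
from typing import Dict, List, Optional


def propagate_fault(nodes: List[str], failed_input: int) -> Dict[str, str]:
    """Receive-side state of each node when the span feeding
    nodes[failed_input] is cut. Nodes are listed in transmission order."""
    states = {}
    for i, name in enumerate(nodes):
        if i < failed_input:
            states[name] = "OK"    # upstream of the cut: normal signal
        elif i == failed_input:
            states[name] = "LOS"   # first node after the cut sees no light
        else:
            states[name] = "AIS"   # further downstream: all-ones AIS
    return states


def locate_root_cause(nodes: List[str], states: Dict[str, str]) -> Optional[str]:
    """The node reporting LOS/LOF itself, rather than AIS, borders the fault."""
    for name in nodes:
        if states[name] in ("LOS", "LOF"):
            return name
    return None


def rdi_receiver(nodes: List[str], failed_input: int) -> Optional[str]:
    """Upstream neighbour that is sent RDI in the reverse direction."""
    return nodes[failed_input - 1] if failed_input > 0 else None


chain = ["A", "B", "C", "D"]
states = propagate_fault(chain, failed_input=2)  # fiber cut between B and C
# states == {"A": "OK", "B": "OK", "C": "LOS", "D": "AIS"}
print(locate_root_cause(chain, states))  # C
print(rdi_receiver(chain, 2))            # B
```

Node D raises no primary alarm of its own because it receives AIS, so the operator's attention goes straight to the B-to-C span.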