Technology, October 5, 2009

Introduction to Device Level Ring topology for EtherNet/IP

Abstract background

There is a class of industrial control applications where a ring/linear network based on end nodes with integrated 2-port switches is more suitable than conventional star network topology.

The application also has a need for continuous system operation, even when a single fault occurs in an end node, its network interface or the cable system. Device Level Ring (DLR) provides a fault-tolerant network design for daisy-chain (linear) topologies. This paper provides an introduction to DLR networks, describing the principle of operation, features and performance, together with recommendations for implementing the DLR interface.

By Anatoly Moldovansky, Sivaram Balasubramanian and Brian Batke

Many end-user applications call for devices to be connected to the network using a linear topology where end-devices typically have two Ethernet ports and are connected in sequence, one device to the next. A problem with this approach is that a failure of one node, or a link between two nodes, causes nodes on either side of the failure to be unreachable. By using a ring protocol implemented in the end devices, these devices may be configured in a ring topology so that a single-point failure does not prevent communication between the remainder of the functioning devices.

The DLR protocol provides high network availability in a ring topology and is intended primarily for implementation in EtherNet/IP end-devices that have two Ethernet ports and embedded switch technology. It provides fast network fault detection and reconfiguration in order to support the most demanding control applications. For example, a ring network of 50 nodes implementing the DLR protocol has a worst-case fault recovery time of less than 3ms.

Since the DLR protocol operates at Layer 2 (in the ISO OSI network model) the presence of the ring topology and the operation of the DLR protocol are transparent to higher layer protocols such as TCP/IP and CIP, with the exception of a DLR Object that provides a DLR configuration and diagnostic interface via CIP.

A DLR network includes at least one node configured to be a ring supervisor, and any number of normal ring nodes. It is assumed that all the ring nodes have at least two Ethernet ports and incorporate embedded switch technology. Non-DLR multi-port devices (switches or end-devices) may be placed in the ring subject to certain implementation constraints. Non-DLR devices will also affect the worst-case ring recovery time.

Supported topologies

The DLR protocol supports a simple, single-ring topology; it has no concept of multiple or overlapping rings. A network installation may however use more than one DLR-based ring provided each of the rings is isolated: DLR protocol messages from one ring must not be present on another ring.

The DLR protocol may coexist with, but does not interface to, standard network protocols such as IEEE Spanning Tree (STP, RSTP, MSTP) and vendor-specific redundancy protocols. That is, users may construct network topologies with DLR protocol rings connected to switches that are running Spanning Tree or other ring protocols, as shown in Fig. 1.

The switches to which the DLR rings are connected may be run with STP/RSTP/MSTP to ensure loop-free operation when redundant paths are present (indicated by the green lines in Fig. 1). STP messages (BPDUs) that are sent by the switch on the DLR ring ports will be blocked by the DLR Ring Supervisor so that the switches do not block the DLR ports.

Switch ports to which DLR devices are connected must be configured properly in order to ensure proper functioning of the network. More complicated topologies combining DLR rings and non-DLR switches running STP/RSTP/MSTP may result in DLR ports being blocked in an undesirable manner.

Normal and abnormal operation

Figure 2 illustrates the normal operation of a DLR network. As shown there, each node has two Ethernet ports and is assumed to have implemented an embedded switch. When a ring node receives a packet on one of its Ethernet ports, it determines whether the packet needs to be received by the ring node itself (e.g., the packet has the node’s MAC address) or whether the packet should be sent out on the node’s other Ethernet port.

The active ring supervisor blocks traffic on one of its ports, with the exception of a few special frames, and does not forward traffic from one port to the other. Because of this configuration, a network loop is avoided and only one path exists between any two ring nodes during normal operation. The active ring supervisor transmits a Beacon frame through both of its Ethernet ports once per beacon interval (400µs by default). The active ring supervisor also sends Announce frames once per second.

The Beacon and Announce frames serve several purposes:

The presence of Beacon and Announce frames signals ring nodes to transition from linear topology mode to ring topology mode;
Loss of Beacon frames at the supervisor enables detection of certain types of ring faults (note that normal ring nodes are also able to detect and signal ring faults);
The Beacon frames carry a precedence value, allowing selection of an active supervisor when multiple ring supervisors are configured.

Figure 3 illustrates the use of Beacon and Announce frames sent by the active ring supervisor.

Failure states

The most common form of link failure includes the following cases:

Link or other physical layer failure recognised by a node adjacent to the failure;
Power failure or power cycling a ring node, recognised by the adjacent node as a link failure;
Intentional media disconnect by the user to bring new nodes online or to remove existing ones.

In the above cases, the nodes adjacent to the fault send a Link_Status message to the active ring supervisor, as shown in Figure 4.

After receipt of the Link_Status message, the active ring supervisor reconfigures the network by unblocking traffic on its previously blocked port and flushing its unicast MAC table. The supervisor immediately sends Beacon and Announce frames with the ring state value indicating that the ring is now faulted.

Ring nodes also flush their unicast MAC tables upon detecting loss of the beacon in one direction, or upon receipt of Beacon or Announce frames with the ring state value indicating the ring fault state. Flushing the unicast MAC tables at both supervisor and ring nodes is necessary for network traffic to reach its intended destination after the network reconfiguration. Figure 5 shows the network configuration after a link failure, with the active ring supervisor passing traffic through both of its ports.

In addition to the more common link failures, there is a class of uncommon failures:

Higher level hardware/firmware component(s) on a ring node have failed, leading to lost traffic, but the physical layer is functioning normally with the power supply intact;
A chain of ring protocol-unaware nodes is connected between protocol-aware nodes, and the failure has occurred somewhere in the middle of this chain.

In these cases, the active ring supervisor will detect the loss of Beacon frames first on one port, and eventually on both of its ports. The active ring supervisor will reconfigure the network as above. In addition, the active ring supervisor will send a Locate_Fault frame to diagnose the fault location.

It is possible for a partial network fault to occur such that traffic is lost in only one direction. The active ring supervisor detects a partial fault by monitoring the loss of Beacon frames on one port. When a partial fault is detected, the active ring supervisor blocks traffic on one port and sets a status value in the DLR Object. The ring at this point will be segmented due to the partial fault, requiring user intervention.

Certain conditions such as a faulty network connector may cause the active ring supervisor to detect a series of rapid fault/restore cycles. If left to persist, such a condition could result in network instability that might be difficult to diagnose. When the active ring supervisor detects the rapid fault/restore condition (5 faults in a 30 second period), it sets a status value in the DLR Object, and blocks traffic on one port. The user must explicitly clear the condition via the DLR Object.
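The rapid fault/restore check described above can be pictured as a sliding-window counter. The following is a minimal sketch only, assuming the thresholds quoted in the text (5 faults in 30 seconds); the class name, method names and the exact handling of the window boundary are illustrative assumptions and are not taken from the DLR specification.

```python
from collections import deque


class RapidFaultDetector:
    """Sliding-window counter for ring fault events (hypothetical helper)."""

    def __init__(self, max_faults=5, window_s=30.0):
        self.max_faults = max_faults
        self.window_s = window_s
        self._times = deque()
        self.latched = False  # stays set until explicitly cleared

    def record_fault(self, now_s):
        """Record a fault at time now_s (seconds); return True once latched."""
        self._times.append(now_s)
        # Discard faults that have fallen out of the observation window.
        while self._times and now_s - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) >= self.max_faults:
            self.latched = True
        return self.latched

    def clear(self):
        """Stands in for the explicit user clear via the DLR Object."""
        self._times.clear()
        self.latched = False
```

The latch models the requirement that the condition persists until the user clears it; a real supervisor would also block one port and set the status value at the moment the latch is set.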

Classes of DLR implementation

There are several classes of DLR implementation. Detailed requirements for each class of implementation are further specified in subsequent sections.

Ring Supervisor. This class of devices is capable of being a ring supervisor. Such devices must implement the required ring supervisor behaviours, including the ability to send and process Beacon frames at the default beacon interval of 400µs.

Ring Node, Beacon-based. This class of devices implements the DLR protocol, but without the ring supervisor capability. The device must be able to process and act on the Beacon frames sent by the ring supervisor. Beacon-based ring nodes will support a beacon interval of 400µs.

Ring Node, Announce-based. This class of devices implements the DLR protocol, but without the ring supervisor capability. In order to accommodate nodes that do not have the capacity to process Beacon frames, ring nodes may simply forward, but not explicitly process, Beacon frames. Such nodes must process Announce frames.

Table 1 summarises variables used in the DLR protocol behaviour and messages. Refer to the subsequent sections on Ring Node and Ring Supervisor behaviour and DLR specification for further details. The DLR Object exposes these variables (with the exception of the Node State) via object attributes.

Ring Supervisor

Startup. An enabled ring supervisor will start in FAULT_STATE and configure both ports to forward frames. The supervisor will send Beacon frames out both of its ports, with the Ring State set to RING_FAULT_STATE. The supervisor will also send Announce frames out both of its ports with the Ring State set to RING_FAULT_STATE.

Once the Beacon frames are received through both ports the supervisor will transition to NORMAL_STATE, flush its unicast MAC address table and reconfigure one of its ports not to forward packets, except for the following, which will be forwarded to the host for processing:

Beacon frames with the supervisor’s own MAC address (in general needed only for software implementations);
Beacon frames from other ring supervisors;
Link_Status/Neighbor_Status frames;
Neighbor_Check request or response, and Sign_On: always forward received frames. For frames originated by the supervisor, only forward frames with the Source Port matching the blocked port.

Upon transition to NORMAL_STATE, the Ring State in the Beacon frames will be set to RING_NORMAL_STATE. The ring supervisor will also send an Announce frame out one port, with Ring State set to RING_NORMAL_STATE.

Multiple ring Supervisors. When multiple ring supervisors are configured, each supervisor sends Beacon frames when it comes online. The Beacon frames carry a supervisor precedence value. When a supervisor receives a Beacon frame, it checks the precedence value. If the precedence in the Beacon frame is higher than the receiving node’s precedence value, the receiving node transitions to FAULT_STATE and becomes a backup supervisor. If the precedence values are the same, the node with the numerically higher MAC address becomes the active supervisor.
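The selection rule in the paragraph above (higher precedence wins, with the numerically higher MAC address as tie-breaker) can be sketched as a simple comparison. This is an illustration under stated assumptions: the tuple layout and the function names are hypothetical, and the MAC addresses in the usage note are made-up examples.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 bytes for numeric comparison."""
    return bytes(int(part, 16) for part in text.split(":"))


def outranks(a, b):
    """Return True if supervisor a should be active rather than b.

    Each supervisor is a (precedence, mac_bytes) tuple.
    """
    prec_a, mac_a = a
    prec_b, mac_b = b
    if prec_a != prec_b:
        return prec_a > prec_b
    # Equal-length bytes compare byte by byte, which matches numeric order.
    return mac_a > mac_b


def elect_active(supervisors):
    """Pick the active supervisor from a list of (precedence, mac_bytes)."""
    return max(supervisors, key=lambda s: (s[0], s[1]))
```

For example, given two supervisors with precedence 100 and MAC addresses ending in :01 and :02, `elect_active` returns the one ending in :02; a third supervisor with precedence 50 loses regardless of its MAC address.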

The backup supervisors configure their DLR parameters with the values obtained from the active supervisor’s Beacon frames: Beacon Interval, Beacon Timeout, VLAN ID. The backup supervisors continue to monitor both ports for timeout of the Beacon frames (no Beacons received within the Beacon Timeout period). If the Beacon has timed out on both ports, the backup supervisor waits for an additional Beacon Timeout period (during which time other nodes transition to linear mode), then begins sending its own Beacons so that a new supervisor can be selected.

Sign on. In order to identify ring protocol participants, the active ring supervisor will send a Sign_On frame when it transitions to NORMAL_STATE.

Normal Ring Operation. When in the NORMAL_STATE, the active ring supervisor will send Beacon frames out both of its ports. It will also send an Announce frame once per second out one port. One of the active supervisor’s ports will be configured not to forward frames, with a few exceptions.

Ring fault detection

One of several possible events will cause the active ring supervisor to transition to FAULT_STATE:

Beacon frame received from another supervisor with a higher precedence value;
Loss of Beacon frames on either port for the period specified by the Beacon Timeout, indicating a break somewhere in the ring;
Detection of loss of link with the neighbouring node on either port;
Link_Status frame received from a ring node, indicating a ring node has detected a fault.

In all of the cases listed above, the active ring supervisor will:

Transition to FAULT_STATE;
Flush its unicast MAC address table;
Unblock the blocked port;
Send Beacon frame out both ports, with Ring State set to RING_FAULT_STATE;
Send Announce frame out both ports, with Ring State set to RING_FAULT_STATE.

In addition, in the second case above (loss of Beacon frames), the active ring supervisor will initiate the Neighbor Check process by issuing a Locate_Fault frame. The active ring supervisor will also issue its own Neighbor_Check request through the port(s) on which the beacon has timed out.

When in FAULT_STATE the ring supervisor will continue to send Beacon frames, in order to detect ring restoration.

Ring Restoration. When the active ring supervisor is in FAULT_STATE, receipt of Beacon frames on both ports will cause a transition to NORMAL_STATE.
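The supervisor behaviour described in this and the preceding sections amounts to a two-state machine. The sketch below is a simplified model, not a definitive implementation: the event names and action strings are hypothetical labels, timers and frame handling are omitted, and the backup-supervisor role is not modelled.

```python
from enum import Enum


class SupervisorState(Enum):
    FAULT_STATE = "FAULT_STATE"
    NORMAL_STATE = "NORMAL_STATE"


# Hypothetical labels for the four fault triggers listed in the text.
FAULT_EVENTS = {
    "higher_precedence_beacon",
    "beacon_timeout",
    "link_lost",
    "link_status_received",
}


class ActiveSupervisor:
    def __init__(self):
        # An enabled supervisor starts in FAULT_STATE with both ports forwarding.
        self.state = SupervisorState.FAULT_STATE
        self.port_blocked = False
        self.actions = []  # log of actions a real device would perform

    def on_event(self, event):
        if self.state is SupervisorState.NORMAL_STATE and event in FAULT_EVENTS:
            self.state = SupervisorState.FAULT_STATE
            self.port_blocked = False
            self.actions += [
                "flush_unicast_mac_table",
                "unblock_port",
                "send_beacon:RING_FAULT_STATE",
                "send_announce:RING_FAULT_STATE",
            ]
            if event == "beacon_timeout":
                # Only Beacon loss starts the Neighbor Check process.
                self.actions += ["send_locate_fault", "send_neighbor_check_request"]
        elif (self.state is SupervisorState.FAULT_STATE
              and event == "beacons_on_both_ports"):
            self.state = SupervisorState.NORMAL_STATE
            self.port_blocked = True
            self.actions += [
                "flush_unicast_mac_table",
                "block_port",
                "send_beacon:RING_NORMAL_STATE",
                "send_announce:RING_NORMAL_STATE",
                "send_sign_on",
            ]
        return self.state
```

Logging actions as strings keeps the sketch testable without any network code; in a device these would be calls into the switch driver and frame transmit path.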

Beacon vs Announce

Ring nodes (that is, non-supervisor nodes) may have different implementations depending on whether or not they are able to process the Beacon frames which by default are sent every 400µs. Nodes that are able to process the Beacon frames generally have hardware assistance in implementing the DLR protocol, so that they don’t burden the device’s CPU with processing the Beacon frames.

Devices that would need to process the Beacon frames in the device’s CPU can instead configure their embedded switch to simply pass the Beacon frames on the network without interpretation or further processing. Such devices must however process the Announce frames, which also indicate the ring state but are sent at a much slower rate. It is possible to implement a Beacon-based node without hardware assistance, provided the device’s CPU has sufficient capacity to process the Beacon frames in addition to its other required functions.

It is desirable for device implementations to be Beacon-based rather than Announce-based, since better ring recovery performance results when ring nodes are able to process Beacon frames.

How DLR operates

Startup, Beacon-based. A Beacon-based ring node will start up in IDLE_STATE, which presumes the network is in linear topology mode. Upon receiving a Beacon frame through either port, the node will transition to FAULT_STATE, which presumes the ring topology mode. The ring node will flush its unicast MAC address table and save the ring supervisor parameters from the Beacon frame:

Supervisor MAC address;
Supervisor precedence value;
Beacon timeout;
VLAN ID.

Upon receiving Beacon frames through both ports, the node will transition to NORMAL_STATE and flush its unicast MAC address table.

Startup, Announce-based. An Announce-based ring node will start up in IDLE_STATE, which presumes the network is in linear topology mode. Upon receiving an Announce frame through either port, the node will transition to the ring state indicated in the Announce frame. The ring node will flush its unicast MAC address table and save the ring supervisor parameters from the Announce frame:

Supervisor MAC address;
Ring State;
VLAN ID.

Fault Detection. One of several possible events will cause a ring node to transition from NORMAL_STATE.

For Beacon-based nodes:

Receipt of a Beacon frame with the Ring State set to RING_FAULT_STATE;
Receipt of a Beacon frame with a different MAC address and higher precedence than the current ring supervisor;
Loss of Beacon frames on both ports for the period specified by the Beacon Timeout, which causes the node to transition to IDLE_STATE (i.e., the topology is now linear);
Loss of Beacon on a single port for the period specified by the Beacon Timeout.

For Announce-based nodes:

Receipt of an Announce frame with the Ring State set to RING_FAULT_STATE;
Loss of Announce frames for the Announce timeout duration, which causes the node to transition to IDLE_STATE (topology is now linear).

In all of the cases listed above, the ring node will flush its unicast MAC address table and transition to FAULT_STATE (Exception: loss of Beacon on both ports or loss of Announce causes transition to IDLE_STATE).

Ring Restoration. For ring nodes, the process for ring restoration is the same as the Startup case described above.
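The startup, fault detection and restoration rules for a Beacon-based ring node can be sketched as a three-state machine. This is a simplified model under stated assumptions: the class and method names are hypothetical, a node is taken to enter NORMAL_STATE only when Beacons are alive on both ports and the last Beacon reports RING_NORMAL_STATE, and the supervisor precedence check is left out.

```python
class BeaconRingNode:
    def __init__(self):
        self.state = "IDLE_STATE"  # linear topology assumed at startup
        self.beacon_alive = {1: False, 2: False}
        self.flush_count = 0  # unicast MAC table flushes so far

    def _enter(self, new_state):
        if new_state != self.state:
            self.state = new_state
            self.flush_count += 1  # flush on every state transition
        return self.state

    def on_beacon(self, port, ring_state="RING_NORMAL_STATE"):
        """A Beacon arrived on the given port (1 or 2)."""
        self.beacon_alive[port] = True
        both = all(self.beacon_alive.values())
        if both and ring_state == "RING_NORMAL_STATE":
            return self._enter("NORMAL_STATE")
        return self._enter("FAULT_STATE")

    def on_beacon_timeout(self, port):
        """No Beacon seen on the given port for the Beacon Timeout period."""
        self.beacon_alive[port] = False
        if not any(self.beacon_alive.values()):
            return self._enter("IDLE_STATE")  # topology is now linear
        return self._enter("FAULT_STATE")
```

Walking a node through Beacon on port 1, Beacon on port 2, timeout on port 1 and timeout on port 2 takes it through FAULT_STATE, NORMAL_STATE, FAULT_STATE and IDLE_STATE in turn, with one table flush per transition.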

Sign On Process. The Sign_On frame is used to identify all ring participants. The active ring supervisor will send a Sign_On frame when it transitions to NORMAL_STATE. The active supervisor transmits a Sign_On frame once per minute while in NORMAL_STATE, until it receives a Sign_On frame that it sent out previously. Upon receiving such a frame, the active supervisor will cease to send further Sign_On frames until the next transition into NORMAL_STATE. The collected participant list can be accessed through the DLR Object.

The Sign_On frame is a multicast message transmitted from one port of the active ring supervisor. The receiving ring participant node traps the Sign_On frame and forwards it only to the host CPU. The host CPU increments the number of nodes in the list, adds its own addresses to the list and transmits the Sign_On frame only through the port other than the receiving port. The Sign_On frame is transmitted from one ring participant node to the next in similar fashion and eventually reaches the active ring supervisor. The active ring supervisor can identify the Sign_On frame it sent out by confirming that the first entry is its own.
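The way the participant list grows as the Sign_On frame travels around the ring can be sketched as follows. The dictionary layout, the function names and the choice of a MAC/IP pair as the node's "addresses" are assumptions made for illustration; the real frame format is defined in the DLR specification.

```python
def supervisor_start_sign_on(sup_mac, sup_ip):
    """Build the initial Sign_On content with the supervisor as first entry."""
    return {"node_count": 1, "nodes": [(sup_mac, sup_ip)]}


def node_process_sign_on(frame, node_mac, node_ip, rx_port):
    """Host CPU handling on a ring node; returns (updated_frame, tx_port)."""
    updated = {
        "node_count": frame["node_count"] + 1,
        "nodes": frame["nodes"] + [(node_mac, node_ip)],
    }
    # Send only through the port other than the one the frame arrived on.
    tx_port = 2 if rx_port == 1 else 1
    return updated, tx_port


def is_own_sign_on(frame, sup_mac):
    """Supervisor check: the first entry must be the supervisor itself."""
    return bool(frame["nodes"]) and frame["nodes"][0][0] == sup_mac
```

Passing the frame through each ring node in turn and then back to the supervisor yields a list whose order matches the physical order of the devices on the ring.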

Neighbor Check Process. When the active ring supervisor detects the loss of Beacon, it sends a Locate_Fault frame through both ports. Upon receipt of the Locate_Fault frame, each ring node issues a Neighbor_Check request through its port on which loss of the Beacon frame was detected (Announce-based nodes send through both ports). The supervisor also issues its own Neighbor_Check request.

When any node receives a Neighbor_Check_Request frame, it responds with a Neighbor_Check_Response frame through the port on which the original request was received. If the node sending the Neighbor_Check_Request does not receive a response within 100ms, it will retry the request. After three retries, if no response is received, the node sends a Neighbor_Status frame to the ring supervisor.

Figure 6 illustrates the Neighbor Check process. In this example, all healthy ring nodes respond, while the failed ring node 3 does not. Ring nodes 2 and 4 will each ultimately send a Neighbor_Status frame to the ring supervisor.
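The request/retry logic of the Neighbor Check process can be sketched as a small loop. This assumes one reading of the text, namely one initial request followed by up to three retries with a 100ms timeout each; the function name, the callback signature and the returned strings are hypothetical.

```python
def neighbor_check(send_request, max_retries=3, timeout_s=0.1):
    """Run a Neighbor Check on one port.

    send_request(timeout_s) is a caller-supplied function that sends a
    Neighbor_Check_Request and returns True if a response arrived in time.
    """
    attempts = 1 + max_retries  # initial request plus the retries
    for _ in range(attempts):
        if send_request(timeout_s):
            return "neighbor_ok"
    # No response after all attempts: report to the ring supervisor.
    return "send_neighbor_status"
```

In the Figure 6 scenario, ring nodes 2 and 4 would both end in the `send_neighbor_status` branch for the port facing the failed node 3.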

Implementation requirements

The following are general requirements and recommendations for all devices that implement embedded switch technology (whether implemented via commercially-available chips, FPGA, ASIC, etc.):

IEEE 802.3 operation:

Auto-negotiation, with 10/100Mbps, full/half duplex (Required);
Forced setting of speed/duplex (Required);
Recommended: Turn off flow control on ring ports;
Auto MDIX (medium dependent interface crossover), in both autonegotiate and forced speed/duplex modes.
Quality of Service:

2 queues (Required), 4 (Recommended);
High priority queue for DLR frames, with strict priority scheduling for the high priority queue (Required);
Prioritisation via 802.1Q/D (Required) and DSCP (highly recommended). Usage will be consistent with the EtherNet/IP QoS scheme published in Volume 2. For IP frames the embedded switch should use the DSCP value. For non-IP frames the priority in the 802.1Q header should be used;
Broadcast rate limiting for host CPU (Recommended). The broadcast threshold tolerated by a device is dependent on the host CPU. As a general recommendation, the broadcast rate limiting should be triggered when the broadcast traffic exceeds 1% of bandwidth;
Filtering of incoming unicast and multicast to host CPU (Recommended, but in practice almost all devices will require this).

DLR implementation

The following implementation requirements apply to DLR nodes, whether ring supervisors or ring nodes:

Preserve IEEE 802.1Q VLAN Id and tag priority of ring protocol frames;
Disable IP multicast filtering on ring ports or flush multicast filtering table of ring ports on ring state transitions;
Configure multicast address for Beacon frames to be forwarded on ring ports, and to the host CPU for Beacon-based implementations;
Configure multicast address for Announce and Locate_Fault frames to be forwarded to the host CPU and on ring ports;
Configure multicast address for Neighbor_Check_Request/Response and Sign_On to be forwarded only to host CPU;
Implement a mechanism to flag the port through which such a frame was received from the ring;
Implement a mechanism to forward such frames from the host CPU on to the ring only through the port they were intended to go out;
Configure unicast MAC address of active ring supervisor so that supervisor frames are forwarded on both ports;
Flush unicast MAC address tables on ring state transitions (or disable learning);
Configure unicast MAC address of self so that it is not purged when MAC address table is flushed;
Implement the Interface Counters and Media Counters attributes of the Ethernet Link Object to aid in network monitoring;
Implement the QoS Object with, at a minimum, DSCP marking of EtherNet/IP traffic generated by the device;
Recommended: configure access control list or another suitable mechanism to remove device’s own frames from network when received (e.g., during ring startup/restoration).

IEEE 1588/CIP Sync

In order to support applications requiring time synchronisation (e.g., CIP Motion or CIP Sync), multi-port devices are recommended to support the following capabilities with respect to IEEE 1588/CIP Sync:

Implement IEEE 1588 end-to-end transparent clock;
Devices that also implement 1588 ordinary/boundary clock functionality should perform path delay measurement using Delay_Req/Delay_Resp frames whenever the ring state or network topology mode changes;
Devices that have 1588 ordinary/boundary clock functionality and are connected to the ring network indirectly should perform path delay measurement using Delay_Req/Delay_Resp frames per the ODVA-specific IEEE 1588 profile signalling message.

Devices not implementing these features will suffer from poor synchronisation accuracy for short periods after a network reconfiguration.

Performance analysis

In order to provide predictable performance for DLR network fault detection and reconfiguration, all DLR network nodes must dedicate the highest priority queue to DLR protocol frames and must implement strict priority scheduling for the highest priority queue. With such a configuration, a DLR protocol frame will encounter at most one lower priority frame delay on each node. For performance analysis, assume that all links on the network operate at 100Mbps speed and in full duplex mode.
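The claim that a DLR frame waits behind at most one lower priority frame follows from strict priority scheduling: the only frame that can delay it is one already being transmitted. A minimal two-queue sketch, with hypothetical class and method names and frames represented as plain strings, illustrates the ordering.

```python
from collections import deque


class StrictPriorityPort:
    """Egress port with one high and one low priority queue (illustrative)."""

    def __init__(self):
        self.high = deque()  # reserved for DLR protocol frames
        self.low = deque()   # all other traffic

    def enqueue(self, frame, high_priority=False):
        (self.high if high_priority else self.low).append(frame)

    def next_frame(self):
        """Pick the next frame to transmit; high priority always goes first."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

If a low priority frame is already on the wire when a Beacon is queued, the Beacon is sent next, ahead of any other low priority frames still waiting.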

Beacon frames are 64 bytes long including frame check sequence (FCS) and have an on wire overhead of 20 bytes of which 8 bytes are for preamble and start of frame delimiter pattern and 12 bytes are for inter frame gap. Beacon frames with 20 byte on wire overhead take approximately 7µs on wire.

Assume that Beacon frames will be delayed on most nodes by a lower priority frame with an average size of 128 bytes including FCS. In a network with mostly EtherNet/IP traffic, on some nodes Beacon frames may not be delayed at all, on some other nodes they may be delayed by 256 byte frames and on others they may be delayed by frames between these two extremes, for an average of 128 bytes. A 128 byte frame with 20 byte on wire overhead takes approximately 12µs on wire.

Assume that DLR nodes use store-and-forward switching architecture and that each node has an average internal switching overhead delay of 5µs. Assume the propagation delay for 100m of copper media to be 1µs. The total typical delay per node for Beacon frames is therefore about 25µs (7µs for the Beacon frame itself, 12µs for the lower priority frame, 5µs of switching overhead and 1µs of propagation delay).

Assume that the Beacon frames would also be delayed on 10% of nodes by maximum sized Ethernet frames of 1522 bytes each, or some combination of large frames on more than 10% of nodes that is equal to 10% of nodes with maximum sized frames. Such frames may be present on network for any reason including configuration, HMI, web, etc. A 1522 byte frame with 20 byte on wire overhead takes approximately 124µs on wire. The total maximum delay per node for ring Beacon frames on these nodes is therefore around 137µs.

For a DLR network comprising 50 nodes, the total maximum round trip time for Beacon frames is 1810µs (45 nodes at 25µs plus 5 nodes at 137µs).

For the same network, the minimum delay per node occurs when the Beacon frame is not delayed by any other frame and is therefore 13µs, which equates to a total minimum round trip time of 650µs for the 50-node example network.
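The delay figures in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the assumptions stated in the text (100Mbps links, 20 bytes of on-wire overhead, 5µs switching overhead, 1µs propagation delay, 10% of nodes delayed by maximum sized frames); the function names are hypothetical, and wire times are rounded up to whole microseconds to match the approximate values quoted.

```python
LINK_BPS = 100_000_000    # 100Mbps, as assumed in the text
WIRE_OVERHEAD_BYTES = 20  # preamble + SFD (8) and inter-frame gap (12)
SWITCH_DELAY_US = 5       # average internal switching overhead
PROPAGATION_US = 1        # 100m of copper media
BEACON_BYTES = 64         # Beacon frame including FCS


def wire_time_us(frame_bytes):
    """Time on the wire in whole microseconds, rounded up."""
    bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-bits * 1_000_000 // LINK_BPS)


def per_node_delay_us(blocking_frame_bytes=0):
    """Beacon delay through one node, optionally behind one other frame."""
    delay = wire_time_us(BEACON_BYTES) + SWITCH_DELAY_US + PROPAGATION_US
    if blocking_frame_bytes:
        delay += wire_time_us(blocking_frame_bytes)
    return delay


def max_round_trip_us(nodes=50, slow_nodes=5):
    """Round trip with slow_nodes delayed by 1522 byte frames, rest by 128."""
    typical_nodes = nodes - slow_nodes
    return (typical_nodes * per_node_delay_us(128)
            + slow_nodes * per_node_delay_us(1522))


def min_round_trip_us(nodes=50):
    """Round trip when no node delays the Beacon behind another frame."""
    return nodes * per_node_delay_us()
```

With the defaults, the per-node delays come out at 25µs, 137µs and 13µs, and the round trip times at 1810µs and 650µs, matching the values in the text.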

In general, a lower Beacon interval provides faster ring recovery performance. The Beacon interval should be less than half of the fastest connection RPI in the network to prevent connection timeouts. Assume a Beacon interval of 400µs, which constitutes 1.75% of network bandwidth and is suitable for high performance CIP Motion connections with one millisecond RPI and also works for slower I/O connections.
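Both statements in the paragraph above can be checked numerically. The sketch assumes the approximate 7µs Beacon wire time used earlier in this section; the function names are hypothetical.

```python
def beacon_bandwidth_percent(beacon_wire_us=7, beacon_interval_us=400):
    """Share of link time occupied by Beacon frames, in percent."""
    return 100 * beacon_wire_us / beacon_interval_us


def interval_suits_rpi(beacon_interval_us, fastest_rpi_us):
    """Rule of thumb from the text: interval below half the fastest RPI."""
    return beacon_interval_us < fastest_rpi_us / 2
```

A 400µs interval gives 100 × 7 / 400 = 1.75% of bandwidth, and it satisfies the rule for a one millisecond RPI because 400µs is less than 500µs.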

For the network described, the worst-case performance for DLR nodes that rely on the Beacon frame mechanism is as follows:

Faults that are detectable in the Physical layer: 1885µs;
Faults that are not detectable in the Physical layer: 2890µs;
Network restoration to normal mode of operation: 2235µs.

When the network described uses the Announce frame mechanism, the comparable figures are:

Faults that are detectable in the Physical layer: 1885µs;
Faults that are not detectable in the Physical layer: 3820µs;
Network restoration to normal mode of operation: 4070µs.

Conclusion

The DLR protocol is suitable for EtherNet/IP networks. A DLR network is tolerant to all single-point failures providing high network availability in a single-ring topology. The worst case fault recovery time in a 50-node DLR network is less than 3ms. The low fault recovery time allows utilisation of DLR networks in hard real time control systems.

References
1. IEEE Std 802.3-2005, Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications.
2. IEEE Std 802.1D-2004, Media Access Control (MAC) Bridges.
3. IEEE Std 802.1Q-2003, Virtual Bridged Local Area Networks.
4. IEEE Std 802.1s-2002, Virtual Bridged Local Area Networks, Amendment 3: Multiple Spanning Trees.
5. The CIP Networks Library, Volume 1, Common Industrial Protocol (CIP), Edition 3.5, December 2008.
6. The CIP Networks Library, Volume 2, EtherNet/IP Adaptation of CIP, Edition 1.6, December 2008.

Anatoly Moldovansky, Sivaram Balasubramanian and Brian Batke are with Rockwell Automation

The full paper was presented at the ODVA2009 CIP Networks Conference, Orlando

www.odva.org