Understanding the various components of the SNS SD-WAN
Understanding monitoring parameters
Detection method and port
Two methods for detecting link availability and performance are offered on SNS firewalls:
- TCP Probe: this method is based on requests to the TCP port used by the application server to be reached.
The availability and performance of each link are therefore tested by initiating a connection to the TCP service from the firewall by using the associated port.
- ICMP: in this method, ICMP Request packets are regularly sent over each link.
If several application servers are used for traffic covered by an SD-WAN SLA, Stormshield recommends placing these servers in a network object group and using this group as the target of availability tests. In this case, the results of availability tests will be the average of the results of tests to each server.
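As an illustration of the TCP probe and of averaging results over a group of servers, here is a minimal Python sketch. It is not the firewall's implementation: the function names tcp_probe and average_group_latency are hypothetical, and ignoring failed probes when averaging is an assumption made for the example.

```python
import socket
import time
from statistics import mean


def tcp_probe(host, port, timeout_s=1.0):
    """Open a TCP connection and return the connect time in ms.

    Returns None if the attempt fails or exceeds timeout_s.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # covers refused connections and timeouts
        return None


def average_group_latency(results_ms):
    """Average the probe results obtained for each server of a group.

    Assumption: failed probes (None) are left out of the average.
    Returns None when no probe succeeded.
    """
    successful = [r for r in results_ms if r is not None]
    return mean(successful) if successful else None
```

For example, a group whose three servers return 10 ms, 20 ms and no response would be reported at 15 ms under this assumption.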
Timeout (s)
This refers to the maximum length of time to wait for a response to a connection attempt with the chosen detection method.
Past this value, the connection attempt is considered a failure and the failure count is incremented by one. Once this count reaches the configured number of failures, the target object is declared unreachable or, if SLA thresholds have been configured, the link is declared degraded.
Interval (s)
This is the length of time between two connection attempts.
Failures before degradation
This refers to the maximum number of failed connection attempts before the target object is declared unreachable or the link is declared degraded (only if SLA thresholds have been configured).
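The way the timeout and the failure count work together can be sketched as a small state machine. This is only an illustration: the class name LinkMonitor is hypothetical, and resetting the counter after a successful attempt (that is, counting consecutive failures) is an assumption that the text does not confirm.

```python
class LinkMonitor:
    """Tracks failed attempts against a 'failures before degradation' limit."""

    def __init__(self, failures_before_degradation=3):
        self.max_failures = failures_before_degradation
        self.failures = 0
        self.unreachable = False

    def record(self, success):
        """Record one attempt; a timed-out attempt counts as success=False.

        Returns True once the target is considered unreachable.
        """
        if success:
            # Assumption: a successful attempt clears the counter.
            self.failures = 0
            self.unreachable = False
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.unreachable = True
        return self.unreachable
```

With a limit of 3, two failures followed by a success leave the target reachable; three failures in a row are needed to declare it unreachable.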
Understanding the metrics of the SD-WAN SLA
Latency (ms)
SD-WAN latency on SNS firewalls represents the amount of time between when a packet is sent and when a response to it is received. It is therefore actually a round-trip time (RTT).
This parameter depends greatly on the type of traffic and ISPs.
The Frequency (s) parameter determines how much time passes between two latency measurements.
The latency shown in the SD-WAN real-time monitoring module corresponds to the last latency value measured for each gateway.
Jitter (ms)
Jitter represents how latency varies over time.
It is calculated based on all the latency values measured over the past 10 minutes.
The value shown in the SD-WAN real-time monitoring module therefore corresponds to the average jitter over the past 10 minutes.
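The exact formula used by the firewall is not given here. As a rough sketch, the example below assumes that jitter is the mean absolute difference between consecutive latency measurements kept in a 10-minute sliding window; the class name JitterWindow is hypothetical.

```python
from collections import deque

WINDOW_S = 600  # 10 minutes, as stated above


class JitterWindow:
    """Keeps latency samples from the last window_s seconds."""

    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, latency_ms) pairs

    def add(self, timestamp_s, latency_ms):
        self.samples.append((timestamp_s, latency_ms))
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] < timestamp_s - self.window_s:
            self.samples.popleft()

    def jitter_ms(self):
        """Mean absolute difference between consecutive latencies (assumed formula)."""
        latencies = [latency for _, latency in self.samples]
        if len(latencies) < 2:
            return 0.0
        diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
        return sum(diffs) / len(diffs)
```

Because the window is averaged, a single latency spike moves the displayed jitter less than it moves the displayed latency, which is always the last measured value.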
Packet loss rate (%)
This is the proportion of connection requests sent for which no response was received, expressed as a percentage.
On SNS firewalls, the tolerated percentage can be configured to the nearest tenth of a percent.
It is calculated based on all packets lost during connection tests over the past 10 minutes.
The value shown in the SD-WAN real-time monitoring module therefore corresponds to the average packet loss rate over the past 10 minutes.
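As a worked example of this calculation, the sketch below computes a loss rate from the number of requests sent and responses received in the window, rounded to one decimal place. The function name packet_loss_rate and the sample figures are hypothetical.

```python
def packet_loss_rate(sent, received):
    """Return the packet loss rate in percent, rounded to one decimal place."""
    if sent <= 0:
        raise ValueError("at least one request must have been sent")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return round(100.0 * (sent - received) / sent, 1)
```

For instance, if 40 requests were sent during the window and 37 responses came back, the rate is 7.5 %.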
Unavailability rate
This is the proportion of time during which the gateway is unavailable.
Strictly speaking, this is not an SD-WAN threshold; its main function is to show statistics about the availability of gateways.
There is therefore no need to enter a maximum value for this parameter.
The value shown in the SD-WAN real-time monitoring module therefore represents the average unavailability rate over the past 10 minutes.
Assessing the values to apply to each metric
Applying thresholds directly to the objects used in a production filter policy can be tedious and counterproductive: while the values are being tuned, traffic may switch between links regularly and for unwarranted reasons.
To assess the values to apply to each metric without disrupting production, Stormshield suggests proceeding as follows:
- Create a test router object on which you have set the recommended metric values given by your ISPs and software solution vendors (VoIP, ERP, etc.).
- Use this router object in a neutral filter rule, placed last in the security policy (before the deny all rule, if used), to trigger monitoring on the router and its gateways, and to observe behavior (changing links) relating to the values of the various metrics. To create this rule, refer to the section on Creating the filter rule for VoIP traffic.
- Refine these values until you obtain the desired behavior with regard to the traffic in question.
By doing so, when the values of the metrics change, they do not affect production traffic at all, and you will then be able to refine values as often as you need before adopting them in the filter rule that applies to production traffic.
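When refining values, it can help to compare recorded measurements against candidate thresholds outside the firewall. The sketch below is one hypothetical way to do this: the names SlaThresholds and exceeded, the metric keys and the sample figures are all assumptions, and treating a threshold as exceeded only when the measured value is strictly greater is a choice made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlaThresholds:
    """Candidate thresholds; None means the metric is not checked."""
    latency_ms: Optional[float] = None
    jitter_ms: Optional[float] = None
    loss_pct: Optional[float] = None


def exceeded(measured, thresholds):
    """Return the names of the metrics whose measured value is above its threshold."""
    names = []
    for name in ("latency_ms", "jitter_ms", "loss_pct"):
        limit = getattr(thresholds, name)
        value = measured.get(name)
        if limit is not None and value is not None and value > limit:
            names.append(name)
    return names
```

With candidate thresholds of 150 ms, 30 ms and 1.0 %, a measurement of 180 ms latency, 12 ms jitter and 0.4 % loss would flag only the latency.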
When you observe the values recorded for the various metrics (steps 2 and 3), do note that the data shown in the SD-WAN monitoring graphs in the SNS web administration interface are stored in a local database, and are then regularly aggregated to reduce the amount of disk space used.
You are therefore advised to use a monitoring solution based on SNMP (such as Zabbix, Centreon, etc.) and on the STORMSHIELD-ROUTE-MIB v4.3.x MIB, which can be downloaded from the Downloads menu in Mystormshield, to observe the real-time values of the various metrics and to store these records over longer periods so that the appropriate values can be better refined.