Updated on 2025-09-11 GMT+08:00

Overview

Enterprise IT architectures are becoming increasingly complex. Alarms from multiple sources, such as servers, network devices, and cloud services, arrive in different formats and follow different standards. When these raw alarms enter the O&M process directly, they often flood systems with unnecessary data and bury key issues. They also lead to inconsistent fault handling standards, which reduces the efficiency of fault response and team collaboration.

In the standard O&M system of COC, alarms of various types need to be converted into standard alarms or incident tickets that can be handled through a unified process, so that notifications are precise and responses are quick. Response plans can also be configured to recover from faults automatically. To meet these needs, you configure alarm conversion rules, which bring alarms from different sources into the standardized alarm handling process. This improves O&M efficiency and automation.

For details about the configuration items of an alarm conversion rule, see Table 1.

Table 1 Parameters for configuring a trigger rule

Parameter

Description

Trigger Type

The options are Incident and Alarm.

  • Incident: An incident ticket is generated. The on-duty personnel need to handle the incident as soon as possible and continuously track the incident until it is closed.
  • Alarm: An alarm is generated and is handled manually or automatically based on contingency plans.

Data Source

Select a data source.

A data source is the system that raw alarms come from.

Before configuring alarm rules, ensure that alarm data has been integrated and enabled. When all conditions of a rule are met, the alarm conversion rule is triggered. For details about how to set data sources, see Creating an Alarm Conversion Rule.

Triggering Conditions

Select the key, comparison method, and value for the trigger criteria.

A maximum of five trigger criteria can be added. For details about how to set the keys, see Table 3.

Trigger Criteria

Select a trigger rule.

Incident Level

This parameter is required only when Trigger Type is set to Incident. The options are P1, P2, P3, P4, and P5.

P1 incidents are the most critical, while P5 incidents are the least severe.

Silence Rule

This parameter is required only when Trigger Type is set to Incident. Enable or disable this rule as required.

After an incident is generated based on the alarm conversion rule, a new incident will be generated if the trigger criteria are met before the incident is completed or closed.

Alarm Severity

This parameter is required only when Trigger Type is set to Alarm. The value can be Critical, Major, Minor, or Warning.
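The constraints in Table 1 can be illustrated with a minimal Python sketch. This is not the COC API: the class names, the `source` key on the raw alarm, and the `equals` and `contains` comparison methods are all hypothetical and chosen only for illustration. The sketch enforces the limit of five trigger criteria, requires an incident level only for the Incident trigger type and an alarm severity only for the Alarm trigger type, and treats a rule as triggered only when every criterion matches.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_CRITERIA = 5  # Table 1: a maximum of five trigger criteria per rule
INCIDENT_LEVELS = ("P1", "P2", "P3", "P4", "P5")
ALARM_SEVERITIES = ("Critical", "Major", "Minor", "Warning")


@dataclass
class Criterion:
    key: str         # field of the raw alarm to inspect
    comparison: str  # "equals" or "contains" (assumed comparison methods)
    value: str

    def matches(self, raw_alarm: dict) -> bool:
        actual = str(raw_alarm.get(self.key, ""))
        if self.comparison == "equals":
            return actual == self.value
        if self.comparison == "contains":
            return self.value in actual
        raise ValueError(f"unsupported comparison: {self.comparison}")


@dataclass
class ConversionRule:
    trigger_type: str                     # "Incident" or "Alarm"
    data_source: str                      # system the raw alarms come from
    criteria: List[Criterion] = field(default_factory=list)
    incident_level: Optional[str] = None  # needed only for "Incident"
    alarm_severity: Optional[str] = None  # needed only for "Alarm"

    def validate(self) -> None:
        if len(self.criteria) > MAX_CRITERIA:
            raise ValueError(f"at most {MAX_CRITERIA} trigger criteria are allowed")
        if self.trigger_type == "Incident":
            if self.incident_level not in INCIDENT_LEVELS:
                raise ValueError("Incident rules need a level from P1 to P5")
        elif self.trigger_type == "Alarm":
            if self.alarm_severity not in ALARM_SEVERITIES:
                raise ValueError("Alarm rules need a valid alarm severity")
        else:
            raise ValueError("trigger_type must be 'Incident' or 'Alarm'")

    def is_triggered(self, raw_alarm: dict) -> bool:
        # The rule fires only when the alarm comes from the configured
        # data source and all criteria are met.
        return raw_alarm.get("source") == self.data_source and all(
            c.matches(raw_alarm) for c in self.criteria
        )
```

For example, a rule built as `ConversionRule("Incident", "demo-monitor", [Criterion("name", "contains", "CPU")], incident_level="P2")` passes `validate()` and is triggered by a raw alarm from `demo-monitor` whose `name` field contains `CPU`.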

Core Principle: Transformation from Heterogeneous Alerting to Standardized Handling

An alarm conversion rule establishes a standard alarm handling mechanism that filters, cleans, converts, distributes, and links alarms, bringing raw alarms under systematic management and control.

  • Heterogeneous alarm ingestion: COC adapts to interface protocols (such as SNMP, HTTP, and Syslog) of different alarm sources to connect scattered raw alarm information to the rule engine in a unified manner.
  • Cleaning and normalization: Based on preset rules, raw alarms are deduplicated, noise is reduced (invalid or low-priority alarms are filtered out), and fields are supplemented (metadata such as device information and service lines is added) to ensure information accuracy.
  • Standardization: Cleaned alarms are converted into aggregated alarms based on a unified COC data model (covering attributes such as alarm severity, impact scope, and fault type) or escalated to incident tickets that need to be handled manually. In this way, the format and semantics are standardized.
  • Precise distribution and association: Standard alarms or incident tickets are pushed to the corresponding handlers through the responsibility assignment mechanism configured in the rule (such as the shift schedule and owner list). In addition, if a response plan is configured, its preset automatic fault handling actions (such as restarting services and switching to the standby node) are triggered automatically.
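The cleaning and normalization stage described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how COC implements it: the raw alarm is assumed to be already parsed into a dictionary with hypothetical `source`, `resource`, `name`, and `severity` keys, the severity ordering (Critical highest, Warning lowest) is assumed, and `inventory` is a hypothetical lookup table for the metadata to supplement.

```python
from typing import Dict, List

# Assumed ordering of the alarm severities, lowest to highest.
SEVERITY_RANK = {"Warning": 0, "Minor": 1, "Major": 2, "Critical": 3}


def deduplicate(alarms: List[dict]) -> List[dict]:
    """Keep the first alarm for each (source, resource, name) fingerprint."""
    seen = set()
    unique = []
    for alarm in alarms:
        fingerprint = (alarm["source"], alarm["resource"], alarm["name"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(alarm)
    return unique


def reduce_noise(alarms: List[dict], min_severity: str = "Minor") -> List[dict]:
    """Drop alarms below min_severity; unknown severities count as invalid."""
    threshold = SEVERITY_RANK[min_severity]
    return [
        a for a in alarms
        if SEVERITY_RANK.get(a["severity"], -1) >= threshold
    ]


def supplement_fields(alarms: List[dict], inventory: Dict[str, dict]) -> List[dict]:
    """Merge metadata (device information, service line) into each alarm."""
    return [{**a, **inventory.get(a["resource"], {})} for a in alarms]


def clean(alarms: List[dict], inventory: Dict[str, dict],
          min_severity: str = "Minor") -> List[dict]:
    """Run deduplication, noise reduction, and field supplementation in order."""
    kept = reduce_noise(deduplicate(alarms), min_severity)
    return supplement_fields(kept, inventory)
```

Each step is a separate function so that the order, the fingerprint fields, and the severity threshold can be changed independently.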

Function: Intelligent and Standardized Alarm Handling Process

  • Flexible rule configuration: You can customize the alarm processing logic. On the GUI, you can configure triggering conditions (for example, the CPU usage exceeds 90% for 5 minutes), alarm handling methods (such as cleaning rules and conversion modes), and distribution policies to meet the requirements of different service scenarios.
  • Multi-dimensional responsibility allocation mechanism
    • Allocation by shift schedule: Alarms are automatically allocated to on-duty personnel based on the O&M shift schedule, ensuring 24/7 uninterrupted response.
    • Multi-owner collaboration: Multiple owners (such as service owners and technical support personnel) can be specified so that they are notified at the same time and handle alarms together, avoiding gaps in responsibility.
    • Priority-based routing: High-priority alarms can be directly pushed to core owners without following the common process, shortening the response time.
  • Automated alarm response and fault handling: The built-in response plan configuration function can be used to associate specific types of alarms with automatic handling processes. For example, if an alarm indicating that the number of database connections exceeds the threshold is triggered, the scale-out script is executed automatically. If an alarm indicating that a service node has broken down is generated, services are automatically switched to the standby node. The association between alarms, rules, and response plans enables unattended or automated handling of some faults, reducing the cost of manual intervention.
  • End-to-end traceability and auditing: The entire lifecycle of an alarm, from ingestion, rule matching, conversion, and distribution to handling, is recorded in audit logs, including the rule triggering time, owner receipt status, and response plan execution result. The audit logs provide a basis for O&M optimization and responsibility tracing.
  • Support for various alerting scenarios: The alarm handling process is compatible with a variety of alarm sources (such as physical devices, virtual resources, cloud services, and application systems) and alarm types (such as performance alarms, security alarms, and service alarms). Flexible rule configuration meets alarm handling requirements in different scenarios and supports centralized, standardized operations on COC.
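The responsibility allocation and response plan association described in this list can be sketched as a single routing function. The sketch below is a simplified illustration, not COC behavior: the shift schedule format (start hour, end hour, person), the rule that only Critical alarms bypass the common process, and all names such as `route`, `owners`, and `response_plans` are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Shift = Tuple[int, int, str]  # (start hour inclusive, end hour exclusive, person)


def on_duty(hour: int, schedule: List[Shift]) -> str:
    """Return the person whose shift covers the given hour (0-23)."""
    for start, end, person in schedule:
        if start <= hour < end:
            return person
    raise LookupError(f"no shift covers hour {hour}")


def route(alarm: dict, hour: int, schedule: List[Shift],
          owners: Dict[str, List[str]], core_owners: List[str],
          response_plans: Dict[str, str]) -> dict:
    """Decide whom to notify and which response plan, if any, to run."""
    if alarm["severity"] == "Critical":
        # Priority-based routing: go straight to the core owners.
        recipients = list(core_owners)
    else:
        # Allocation by shift schedule, then multi-owner collaboration.
        recipients = [on_duty(hour, schedule)]
        for owner in owners.get(alarm["service"], []):
            if owner not in recipients:
                recipients.append(owner)
    # Associate the alarm with a response plan by fault type; None if no plan.
    plan: Optional[str] = response_plans.get(alarm["fault_type"])
    return {"notify": recipients, "response_plan": plan}
```

With a plan table such as `{"db_connections_exceeded": "scale_out_script"}`, an alarm of that fault type is returned together with the scale-out plan, which mirrors the database connection example in the list above. Running the plan and writing the audit log entry are left out of the sketch.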

Value: Key Support for Improving O&M Efficiency and Fault Response Capabilities

Through alarm noise reduction, standard operation processes, and intelligent scheduling, alarm conversion rules provide the following benefits:

  • Noise reduction: Invalid alarms are filtered out to ensure that O&M personnel focus on key issues.
  • Unified handling standards: Heterogeneous alarms are converted into standard objects to ensure a standardized execution process on COC.
  • Quick fault response: Corresponding owners can be accurately notified, and response plans can be automatically associated with faults to shorten the fault detection and resolution time.
  • Efficient collaboration: The responsibility distribution and handling process are clearly defined, and the transparency of cross-team collaboration is improved.