Zhihui kongzhi yu fangzhen (Jun 2025)
Clustering of unknown protocol messages based on format comparison
Abstract
Protocol reverse engineering is a means of detecting and analyzing unknown or proprietary protocols, and clustering packets by protocol format is the basic step in identifying unknown protocol packets. In this paper, we propose an Unknown Protocol Packet Clustering Method Based on Format Matching (CUPFC). CUPFC introduces the Augmented Backus-Naur Form (ABNF), defines a Token Format Distance (TFD) and a Message Format Distance (MFD) to quantify the format similarity of tokens and of packets, and computes these distances with the Jaccard distance and an optimized sequence alignment algorithm. The pairwise MFD values are then assembled into a distance matrix, which is fed to DBSCAN to cluster unknown protocol packets into classes with different formats. On two simulated datasets, the clustering V-measure (the harmonic mean of homogeneity and completeness) exceeds 0.91, and the Fowlkes-Mallows index (FMI) and coverage are both at least 0.97, a substantial improvement over previous work.
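The abstract does not give the exact TFD and MFD formulas, so the following Python sketch only illustrates the general pipeline under stated assumptions. Each token is described by a hypothetical set of format features; TFD is taken as the Jaccard distance between two tokens' feature sets; MFD is taken as a length-normalized global (Needleman-Wunsch style) alignment cost that uses TFD as the substitution cost and a hypothetical gap cost of 1; and the resulting distance matrix is clustered with a textbook DBSCAN over precomputed distances. The feature names, `GAP_COST`, `eps`, `min_pts`, and all function names are illustrative and are not taken from the paper.

```python
from typing import FrozenSet, List, Optional, Sequence

# Hypothetical representation: a token is a set of format features
# (e.g. fixed/variable length, binary/ASCII); a message is a token sequence.
Token = FrozenSet[str]
Message = Sequence[Token]

GAP_COST = 1.0   # hypothetical cost of aligning a token against a gap
NOISE = -1       # DBSCAN label for points that belong to no cluster


def token_format_distance(a: Token, b: Token) -> float:
    """Illustrative TFD: Jaccard distance 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def message_format_distance(m1: Message, m2: Message) -> float:
    """Illustrative MFD: global alignment cost normalized to [0, 1]."""
    n, m = len(m1), len(m2)
    if n == 0 and m == 0:
        return 0.0
    # dp[i][j] = cheapest alignment of m1[:i] with m2[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        dp[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + token_format_distance(m1[i - 1], m2[j - 1]),  # match/substitute
                dp[i - 1][j] + GAP_COST,   # token of m1 aligned to a gap
                dp[i][j - 1] + GAP_COST,   # token of m2 aligned to a gap
            )
    # With TFD <= 1 and GAP_COST = 1 the cost never exceeds max(n, m).
    return dp[n][m] / max(n, m)


def distance_matrix(messages: List[Message]) -> List[List[float]]:
    """Symmetric matrix of pairwise MFD values."""
    k = len(messages)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = message_format_distance(messages[i], messages[j])
    return d


def dbscan_precomputed(d: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN on a precomputed distance matrix (neighborhoods include the point itself)."""
    n = len(d)
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if d[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = NOISE          # may later be relabeled as a border point
            continue
        labels[p] = cluster            # p is a core point: start a new cluster
        seeds = [q for q in neighbors if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster    # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if d[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)   # q is also a core point: keep expanding
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Hypothetical feature sets for a binary-style and a text-style format.
HDR = frozenset({"fixed_len", "binary", "len2"})
LEN = frozenset({"fixed_len", "binary", "len4", "length_field"})
TXT = frozenset({"var_len", "ascii"})
SEP = frozenset({"fixed_len", "ascii", "delimiter"})

messages = [
    [HDR, LEN, TXT], [HDR, LEN, TXT, TXT], [HDR, LEN, TXT],
    [TXT, SEP, TXT], [TXT, SEP, TXT, SEP], [TXT, SEP, TXT],
]
labels = dbscan_precomputed(distance_matrix(messages), eps=0.3, min_pts=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Because every TFD lies in [0, 1] and a gap costs 1, the alignment cost never exceeds the longer message's length, so dividing by it keeps MFD in [0, 1] and lets a single `eps` apply across messages of different lengths. scikit-learn's `DBSCAN(metric="precomputed")` accepts such a matrix directly; the hand-written version here only keeps the sketch dependency-free.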
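The abstract reports V-measure and FMI without defining them. The following is a minimal, dependency-free sketch of the standard definitions: homogeneity and completeness are computed from conditional entropies over the contingency of true and predicted labels, V-measure is their harmonic mean, and FMI is TP / sqrt((TP + FP)(TP + FN)) counted over pairs of packets. The coverage metric is not defined in the abstract, so it is omitted here. The function names are illustrative; `sklearn.metrics.v_measure_score` and `sklearn.metrics.fowlkes_mallows_score` implement the same measures and can serve as a cross-check.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def _entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (natural log) of a distribution given by raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


def v_measure(true: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness."""
    n = len(true)
    joint = Counter(zip(true, pred))           # contingency counts n_ck
    true_sizes, pred_sizes = Counter(true), Counter(pred)
    h_c = _entropy(true_sizes.values())        # H(C)
    h_k = _entropy(pred_sizes.values())        # H(K)
    # Conditional entropies H(C|K) and H(K|C) from the contingency table.
    h_c_given_k = -sum(nck / n * math.log(nck / pred_sizes[k]) for (c, k), nck in joint.items())
    h_k_given_c = -sum(nck / n * math.log(nck / true_sizes[c]) for (c, k), nck in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def _pairs(m: int) -> int:
    """Number of unordered pairs among m items."""
    return m * (m - 1) // 2


def fowlkes_mallows(true: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of samples."""
    tp = sum(_pairs(m) for m in Counter(zip(true, pred)).values())
    tp_fp = sum(_pairs(m) for m in Counter(pred).values())   # pairs sharing a predicted cluster
    tp_fn = sum(_pairs(m) for m in Counter(true).values())   # pairs sharing a true class
    if tp == 0:  # also covers the degenerate tp_fp == 0 or tp_fn == 0 cases
        return 0.0
    return tp / math.sqrt(tp_fp * tp_fn)


true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 1, 0, 0, 0]   # same partition, different cluster ids
print(v_measure(true_labels, pred_labels), fowlkes_mallows(true_labels, pred_labels))
# prints 1.0 1.0
```

Both scores are invariant to how cluster ids are named, which is why the relabeled partition above still scores 1.0. The choice of logarithm base does not affect V-measure, since homogeneity and completeness are ratios of entropies.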