Publication - Mr Douglas Harewood-Gill

    Exploring textures in traffic matrices to classify data center communications

    Citation

    Trois, C, Bona, LC, Oliveira, LS, Martinello, M, Harewood-Gill, D, Fabro, MDD, Nejabati, R, Simeonidou, D, Lima, JC & Stein, B, 2018, ‘Exploring textures in traffic matrices to classify data center communications’. In: Leonard Barolli, Tomoya Enokido, Marek R Ogiela, Lidia Ogiela, Nadeem Javaid, Makoto Takizawa (eds) Proceedings - 32nd IEEE International Conference on Advanced Information Networking and Applications, AINA 2018. Institute of Electrical and Electronics Engineers (IEEE), pp. 1123-1130.

    Abstract

    Data analytics and scientific computing are two modern applications that in recent years have substantially changed their computation and communication needs, requiring additional processing capability and bandwidth to be able to keep pace with current demands. These applications are commonly processed within data centers, exchanging enormous volumes of data, rapidly stressing existing network infrastructures. Thus, it is crucial for data center operations and management to be able to understand and classify the communication demands of these applications. The traditional approaches for classifying application traffic are port-based and Deep Packet Inspection, both presenting issues with current network technology. Some recent works propose using machine learning plus statistical information collected from application flows to classify traffic. Applications running in data centers present communication patterns which can be recognized through their traffic matrices. So, the main contribution of this paper is a method that explores the textural information extracted from these matrices to classify the data center traffic using machine learning techniques. As a proof-of-concept, we implemented this method in a system named DCTraCS. The experimental dataset was gathered from two real data centers, collecting the traffic matrices of MapReduce and a set of scientific applications every second for a period of 30 minutes. For assessing our proposal, we compared it with other machine learning techniques for classifying application traffic found in current literature. Results show that our approach achieved the highest accuracy, classifying correctly over 99% of our data center applications.
