|
TuAT7 |
MR07 |
Online - AI Applications 1 |
|
Chair: Xu, Jin | Shenyang Aerospace University |
|
08:05-08:25, Paper TuAT7.1 | |
>Intrusion Detection System Based on FastICA and Multi-Grained Cascaded Forest |
|
Fei, Jiahui | Nanjing University of Science and Technology |
Zhang, Shuangquan | School of Cyber Science and Engineering |
Lian, Zhichao | Nanjing University of Science and Technology |
Keywords: AIoT, Application of Artificial Intelligence, Machine Learning
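Abstract: With advances in big data, microprocessors, and other applications, the Internet of Things (IoT) has developed considerably. Because they lack the necessary security defense mechanisms, IoT devices are easily targeted and controlled by attackers, who can manipulate large numbers of IoT devices to launch DDoS attacks on the network infrastructure of a country or region, causing severe economic losses and social security risks. Deep-learning-based intrusion detection methods usually rely on large numbers of high-quality training instances, which makes them hard to apply to network traffic that lacks sufficient labeled data. Traditional machine learning (ML) methods have limited ability to extract and represent features from high-dimensional data, which makes it challenging to discover the underlying structure and patterns in the data. To address these problems, this paper proposes an intrusion detection system (IDS) that combines a fast independent component analysis (FastICA) module with a multi-grained cascade forest (GcForest). By leveraging Fa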
|
|
08:25-08:45, Paper TuAT7.2 | |
>Efficient One-Shot Pruning of Large Language Models with Low-Rank Approximation |
|
Xu, Yangyan | Institute of Information Engineering, Chinese Academy of Science |
Cao, Cong | Institute of Information Engineering, Chinese Academy of Science |
Yuan, Fangfang | Institute of Information Engineering, Chinese Academy of Science |
Mi, Rongxin | National Computer Network Emergency Response Technical Team/Coor |
Sun, Nannan | Institute of Information Engineering, Chinese Academy of Science |
Wang, Dakui | Institute of Information Engineering, Chinese Academy of Science |
Liu, Yanbing | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Representation Learning, Deep Learning, Machine Learning
Abstract: Model pruning, as an effective method for compressing large language models (LLMs), has recently attracted considerable attention in the field of natural language processing. However, existing LLM pruning methods have two main drawbacks: (1) iterative pruning for LLMs with over a billion parameters requires retraining, which leads to significant pruning costs; (2) LLM pruning is formalized as a weight reconstruction problem that necessitates second-order information, incurring expensive computations. To address these issues, we propose a novel pruning method named Eplra: efficient one-shot pruning of large language models with low-rank approximation, which efficiently identifies sparse networks in LLMs. Specifically, we design a novel pruning metric based on input activations for the rapid one-shot compression of LLMs. We first incorporate input activations into the calculation of weight importance to promote precise pruning of low-priority weights. Then, we perform local weight comparisons across each output of linear layers to induce uniform sparsity. Next, we expand Eplra into semi-structured pruning patterns to accommodate various acceleration scenarios. Finally, we employ low-rank parametrized update matrices to fine-tune the pruned model, facilitating a swift recovery of model performance. Experimental results on various language benchmark datasets demonstrate that Eplra outperforms the state-of-the-art methods.
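The abstract above describes an activation-aware importance score with per-output comparisons. As a rough illustration only, the sketch below assumes the score is the weight magnitude times the L2 norm of the matching input feature, and that the same fraction of weights is removed in every output row. The function name `prune_by_activation_score` and this exact scoring rule are assumptions made for the example, not the authors' published Eplra metric.

```python
import numpy as np


def prune_by_activation_score(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    weight:      (out_features, in_features) matrix of a linear layer
    activations: (n_samples, in_features) calibration inputs to that layer
    sparsity:    fraction of weights removed per output row

    Assumed score (hypothetical): |W_ij| * ||X_j||_2.
    """
    act_norm = np.linalg.norm(activations, axis=0)   # one norm per input feature
    score = np.abs(weight) * act_norm                # broadcast across output rows
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.astype(float)                    # astype returns a copy
    if n_prune == 0:
        return pruned
    # Compare weights only within the same output row, so every row ends up
    # with the same number of zeros (uniform sparsity).
    lowest = np.argsort(score, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, lowest, 0.0, axis=1)
    return pruned
```

Ranking within each row rather than across the whole matrix is what keeps the sparsity uniform across outputs; a global ranking could empty some rows entirely.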
|
|
08:45-09:05, Paper TuAT7.3 | |
>Video-Based Examination Anomaly Action Recognition Via Channel-Temporal Model |
|
Peng, Qin | Central China Normal University |
Yao, Huaxiong | Central China Normal University |
Liu, Xinyu | Central China Normal University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: With rapid technological advancements in computer vision, the recognition of abnormal behavior during examinations has transitioned from human observation to computer-assisted recognition. Although traditional 2D Convolutional Neural Networks (CNNs) excel in computational efficiency, they do not capture the temporal dynamics that comprehensive video analysis requires with sufficient precision. 3D CNN-based methods, by contrast, demonstrate promising performance in temporal modeling but impose substantial computational demands and deployment costs. To overcome these challenges, this paper introduces an innovative Examination Anomaly Action Recognition Network named ReTANet. It incorporates cross-channel temporal modeling to capture temporal features within videos. It also employs Multi-Scale Channel Attention to enrich feature representation and extract channel and spatial information, thereby enhancing recognition accuracy without significantly increasing computational complexity and model parameters. Furthermore, this paper introduces the Examination Anomaly Action Dataset, also named the ExamGuard Dataset (EGD), to facilitate model training and evaluation. Our model demonstrates superior performance compared to existing mainstream action recognition algorithms on the HMDB-51 dataset. Ablation studies conducted on the UCF-101 dataset show the effectiveness and significance of the proposed module.
|
|
09:05-09:25, Paper TuAT7.4 | |
>KGNet: A Legal Knowledge Enhancement and GlobalPointer Triple Extraction Network |
|
Li, Jinchen | Inner Mongolia University of Finance and Economics |
Li, Yanling | Inner Mongolia Normal University |
Ge, Fengpei | Beijing University of Posts and Telecommunications |
Xingxing, Wang | Inner Mongolia Normal University |
Keywords: AI and Applications, Application of Artificial Intelligence, Deep Learning
Abstract: Extracting entity relations is vital in legal artificial intelligence. It automates the mining of triple data from vast legal texts. Current methods struggle to identify legal named entity boundaries accurately and to extract overlapping relation triples from legal texts. We present KGNet, a model developed to address these issues effectively. Our approach introduces a Word Information Generator based on BMES tagging combined with the Fusionformer module. This innovation enhances the incorporation of legal domain knowledge into text representations, improving the accuracy of entity recognition. Additionally, we utilize the GlobalPointer decoder, which redefines and decomposes relation triples, thus resolving the issue of overlapping entities. Performance evaluations on a specially constructed judicial document dataset show that KGNet achieves an F1 score of 66.7%, representing an average improvement of 15.3% over baseline models. These results confirm the effectiveness of KGNet in enhancing legal document processing.
|
|
09:25-09:45, Paper TuAT7.5 | |
>Research on Task Assignment of Firefighting UAVs Based on E-CARGO Model (I) |
|
Xu, Jin | Shenyang Aerospace University |
Xiang, Zhiyu | Shenyang Aerospace University |
Zhang, Senyue | Shenyang Aerospace University |
Sun, Yue | Shenyang Aerospace University |
Gao, Beihang | Shenyang Aerospace University |
Keywords: Adaptive Systems, Cooperative Systems and Control
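Abstract: Firefighting UAV technology has become one of the core tools of modern firefighting operations, and the scale and complexity of the tasks it performs keep growing. In the face of this development, it has become even more important to find an effective way to ensure that UAVs are assigned the most suitable tasks. In this study, we use the Environment-Class, Agent, Role, Group, and Object (E-CARGO) model to systematically analyze the firefighting drone task assignment (FDTA) problem, and we introduce an enhanced whale optimization algorithm (EWOA) to optimize path planning in the FDTA problem. Finally, simulation experiments conducted under diverse terrain conditions demonstrate the efficiency and rapid-response capability of the improved algorithm under different workloads and environmental conditions.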
|
|
TuAT8 |
MR08 |
Online - Affective and Cognitive Computing 1 |
Regular Papers - Cybernetics |
Chair: Yuan, Desen | ASR Microelectronics Co., Ltd.; University of Electronic Science and Technology of China |
|
08:05-08:25, Paper TuAT8.1 | |
>RPID: Boosting Transferability of Adversarial Attacks on Vision Transformers |
|
Wang, Shujuan | Nanjing University of Science and Technology |
Wang, Panpan | Nanjing University of Science and Technology |
Sun, Yunpeng | Nanjing University of Science and Technology |
Lian, Zhichao | Nanjing University of Science and Technology |
Li, Shuohao | National University of Defense Technology |
Keywords: Image Processing and Pattern Recognition, Machine Learning, Deep Learning
Abstract: Vision Transformers (ViTs) have achieved excellent performance on many computer vision tasks, and their adversarial robustness has attracted the attention of many researchers. As a kind of black-box attack, transfer-based attacks usually use adversarial examples generated by a surrogate model to attack structurally different models. Such attacks are practical and pose a certain threat to the application of ViTs in security-critical areas. Existing transfer-based attacks against ViTs suffer from weak adversarial transferability and noticeable perceptibility. In this work, we propose a method called Reduce Regional Perturbation Interaction and Differentiated (RPID) attack, which employs two strategies to produce adversarial examples: reducing the correlation between regional perturbations and adding differentiated perturbations. Extensive experiments demonstrate that our proposed method improves the transferability of the baseline methods for adversarial attacks against ViTs while maintaining stealthiness.
|
|
08:25-08:45, Paper TuAT8.2 | |
>LESaET: Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER |
|
Ding, Lichao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Lu, Kai | Qilu University of Technology (Shandong Academy of Sciences) |
Hao, Zenghao | Qilu University of Technology |
Keywords: Knowledge Acquisition, Representation Learning, Neural Networks and their Applications
Abstract: Knowledge graphs (KGs) provide a structured representation of the real world through entity-relation triples. However, current KGs are often incomplete, typically containing only a small fraction of all possible facts. Completing them involves inferring missing content from existing information, a task known as link prediction. Existing link prediction methods struggle to control the dimensionality of embedding vectors or suffer from overly complex models. To tackle these challenges, we introduce a method named Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER (LESaET). LESaET leverages both self-attention mechanisms and tensor factorization to learn expressive, context-enhanced representations of KGs. Specifically, LESaET employs the multi-head self-attention mechanism of the Transformer as an encoder to capture the mutual information between entities and relations, and utilizes Enhanced-TuckER as a decoder, ultimately achieving expressive low-dimensional embeddings for link prediction tasks. LESaET demonstrates competitive performance compared to advanced methods on standard datasets.
|
|
08:45-09:05, Paper TuAT8.3 | |
>Towards Adversarial Robustness in Blind Image Quality Assessment with Soft Thresholding Norm |
|
Yuan, Desen | ASR Microelectronics Co., Ltd.; University of Electronic Science |
Wang, Lei | University of Electronic Science and Technology of China |
Keywords: Multimedia Computation, Deep Learning, Media Computing
Abstract: In this study, we address the issue of adversarial robustness within the context of Blind Image Quality Assessment (BIQA), an area of heightened importance due to the inherent susceptibility of Deep Neural Networks (DNNs) to adversarial assaults. Current approaches primarily rely on adversarial training, which, despite its efficacy, imposes a significant computational burden. Our research proposes an alternative strategy known as the Soft Thresholding Norm (ST Norm). This approach counters the 'feature shift' phenomenon, identified by a substantial Euclidean Distance Statistics (EDS) between original and adversarial features, through the imposition of sparse constraints on potential features following batch normalization. This novel method offers several advantages: it reduces the Lipschitz constant yielding smoother models, seamlessly integrates with existing models, and boasts inherent denoising capabilities, thereby effectively mitigating the impact of adversarial perturbations. Results suggest that our approach achieves robustness comparable to adversarial training but with significantly less computational overhead. Moreover, it consistently outperforms other adversarial defense strategies on BIQA datasets, highlighting its practical effectiveness in enhancing adversarial robustness. This study underscores the potential of the Soft Thresholding Norm within the realm of IQA tasks, positioning it as a resource-efficient alternative to traditional adversarial training methodologies.
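For readers unfamiliar with the operator the method is named after, the sketch below shows the standard element-wise soft-thresholding function, which shrinks every value toward zero by a threshold and zeroes anything smaller than it. Where the operator sits in the network, and how the threshold `lam` is chosen or learned, are not stated in the abstract; the fixed scalar threshold used here is an assumption made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Element-wise soft thresholding: sign(x) * max(|x| - lam, 0).

    Values with |x| <= lam become zero (sparsity); larger values are
    shrunk toward zero by lam. `lam` is a hypothetical fixed threshold.
    """
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

The operator is non-expansive: for any inputs a and b, |S(a) - S(b)| <= |a - b|. Inserting it after a normalization layer therefore cannot increase that layer's Lipschitz constant, which is consistent with the smoothness argument in the abstract.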
|
|
09:05-09:25, Paper TuAT8.4 | |
>Efficient Nearest Neighbor Prompt-Based Learning for Few-Shot NER in Manufacturing |
|
Chen, JiaXin | Shenyang Aerospace University |
Wang, Peiyan | Shenyang Aerospace University |
Keywords: Application of Artificial Intelligence, Knowledge Acquisition
Abstract: NER tasks in manufacturing usually lack sufficient labeled data. To tackle this issue, this paper presents an effective NN-PLM framework for few-shot NER in manufacturing, which introduces a simple enhancement of the prompt-based learning model using nearest neighbor retrieval. For each character to be predicted, we retrieve morphologically similar characters and then rectify the prediction. Moreover, we use supervised contrastive learning (SCL) and instance weighting to obtain better semantic representations of multi-category characters. Compared with the best baseline, our NN-PLM achieves an average F1 improvement of 7.12% across all few-shot settings in manufacturing.
|
|
09:25-09:45, Paper TuAT8.5 | |
>MJR: Multi-Head Joint Reasoning on Language Models for Question Answering |
|
Li, Shunhao | South China Normal University |
Chen, Jiale | South China Normal University |
Yan, Enliang | South China Normal University |
Zhan, Choujun | South China Normal University |
Wang, Fu Lee | Hong Kong Metropolitan University |
Hao, Tianyong | South China Normal University |
Keywords: Deep Learning, Neural Networks and their Applications, Expert and Knowledge-Based Systems
Abstract: Language Models (LMs) have achieved impressive success in various question answering (QA) tasks but have shown limited performance on structured reasoning. Recent research suggests that Knowledge Graphs (KGs) can augment text data by providing a structured background to enhance the reasoning capabilities of LMs. However, how to integrate and reason over KG representations and language context remains an open question. In this work, we propose MJR, a novel model that integrates the encoded representations of LMs and a graph neural network through multiple layers of feature interaction operations. Subsequently, the fused feature representations of the two modalities are fed into a multi-head representation fusion module to comprehensively capture semantic and graph structure information, thereby enhancing language understanding and reasoning capabilities. In addition, we investigate the performance and applicability of different types of large language models as text encoders in the question-answering task. We evaluate our model on three common datasets: CommonsenseQA, OpenBookQA, and MedQA-USMLE. The results demonstrate the advancements of MJR over existing LM, LM+KG, and LLM models in reasoning for question answering.
|
|
TuAT9 |
MR09 |
AI Applications 8 |
Regular Papers - Cybernetics |
Chair: Liu, Shanwen | College of Computer Science, Sichuan Normal University |
|
08:45-09:05, Paper TuAT9.3 | |
>Robotic Manipulator Motion Planning Based on Global Path Guidance Reinforcement Learning in Dynamic Obstacle Environment |
|
Liu, Shixian | Chinese Academy of Sciences |
Zhang, Jinhan | Institute of Automation, Chinese Academy of Sciences |
Shanlin, Zhong | Institute of Automation, Chinese Academy of Sciences |
Chen, Jiahao | Institute of Automation, Chinese Academy of Sciences |
Zhengyu, Liu | Institute of Automation, Chinese Academy of Sciences |
Wu, Wei | Institute of Automation, Chinese Academy of Sciences |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Friendly robots have extremely important application prospects in many fields. However, in unstructured environments, the interaction between a manipulator and a dynamic environment faces two problems: high uncertainty caused by random intrusions into the workspace, and computational complexity brought by the multi-dimensional action space. Therefore, we propose a hierarchical planning algorithm based on global path guidance reinforcement learning to solve this problem at the decision and planning level. Specifically, the global path planning algorithm first produces a global reference path that ensures the target can be reached. Then the reference path is decomposed into consecutive local targets, which are combined with the objective function of reinforcement learning as local constraints. Finally, the reinforcement learning local planner generates the action of the manipulator based on the observed information. The simulation results show that our method is superior to the standard off-policy reinforcement learning algorithm in terms of learning speed and accuracy, which demonstrates the effectiveness of our algorithm.
|
|
09:05-09:25, Paper TuAT9.4 | |
>MFFDR: An Advanced Multi-Branch Feature Fusion and Dynamic Reconstruction Framework for Enhancing Adversarial Robustness |
|
Liu, Shanwen | College of Computer Science, Sichuan Normal University |
Guo, Rongzuo | Sichuan Normal University |
Zhang, Xianchao | Sichuan Normal University |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Deep Neural Networks (DNNs) are highly susceptible to adversarial noise, which can lead to erroneous predictions. In high-stakes scenarios, such as autonomous driving and medical diagnosis, the consequences of DNN inaccuracies can be dire. To address this issue, Adversarial Training (AT) has been widely adopted as an effective defense method. However, our analysis reveals two critical flaws in the traditional AT approach that hinder its adversarial robustness: (1) It focuses only on a subset of robust features during the training process. This narrow focus limits the model's ability to learn and perceive a diverse range of features. (2) It tends to overlook potential cues in non-robust features that could help the model make correct predictions. These cues, referred to as "positive activations" for simplicity, contain valuable information that can enhance the model's perception and understanding of the input data. To this end, we propose a novel plug-and-play framework called Multi-branch Feature Fusion and Dynamic Reconstruction (MFFDR), which leverages multi-branch attention mechanisms to enhance the model's perception of robust features and enrich the diversity of learned features. Moreover, we employ a dynamic weighting strategy to reconstruct non-robust features in order to utilize the positive activations embedded within them. Extensive experiments demonstrate that our method significantly improves the model's adversarial robustness and outperforms previous state-of-the-art methods.
|
|
09:25-09:45, Paper TuAT9.5 | |
>BTP-CAResNet: An Encrypted Traffic Classification Method Based on Byte Transfer Probability and Coordinate Attention Mechanism |
|
Li, Junhao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Wei | Qilu University of Technology (Shandong Academy of Sciences) |
Shi, Huiling | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Application of Artificial Intelligence, AI and Applications, Neural Networks and their Applications
Abstract: With the extensive application of network traffic encryption technology, the accurate and efficient classification of encrypted traffic has become a critical need for network management. Deep learning has become the predominant method for traffic classification, primarily involving the transformation of network traffic into grayscale images and their subsequent classification using Convolutional Neural Networks (CNNs). However, traditional grayscale image generation methods are plagued with issues of redundant and lost information, and conventional channel attention mechanisms are still insufficient in capturing key traffic features, collectively hindering the enhancement of classification performance. To tackle these issues, this paper introduces a classification method based on Byte Transfer Probability and Coordinate Attention Mechanism in Residual Network (BTP-CAResNet). This method, on the foundation of the classic ResNet architecture, incorporates a new grayscale image generation method that utilizes Byte Transfer Probability, effectively overcoming the deficiencies of traditional approaches. Additionally, this paper integrates a Coordinate Attention Mechanism into the ResNet model, which effectively overcomes the limitations of traditional channel attention mechanisms and further improves the performance of traffic classification. Experimental validation on the ISCX VPN-nonVPN dataset demonstrates that, compared to previous CNN-based methods, the method proposed in this paper exhibits superior performance in key metrics such as accuracy, precision, recall, and F1 score. It provides a new perspective for traffic classification based on convolutional neural networks.
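The abstract does not spell out how the Byte Transfer Probability image is built. One plausible reading, sketched below purely as an assumption, is a 256x256 matrix in which entry (a, b) is the empirical probability that byte value b immediately follows byte value a in the payload, rescaled to 8-bit grayscale. The function name `byte_transition_image`, the row normalization, and the scaling to 0..255 are all hypothetical choices for this example rather than the authors' definition.

```python
import numpy as np


def byte_transition_image(payload: bytes) -> np.ndarray:
    """Return a 256x256 uint8 image of byte-to-byte transition probabilities.

    Row a, column b holds P(next byte = b | current byte = a), estimated
    from consecutive byte pairs in `payload` and scaled to 0..255.
    """
    counts = np.zeros((256, 256), dtype=np.float64)
    data = np.frombuffer(payload, dtype=np.uint8)
    if data.size >= 2:
        # Accumulate one count per consecutive (current, next) byte pair.
        np.add.at(counts, (data[:-1], data[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for byte values that never occur as "current" stay all zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return np.round(probs * 255).astype(np.uint8)
```

Unlike a raw byte-to-pixel mapping, an image built this way has a fixed size regardless of flow length, so no truncation or zero padding is needed before it is passed to a CNN.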
|
|
TuAT13 |
Foyer |
2P - AI Applications |
|
Chair: Yagi, Naomi | University of Hyogo |
|
08:05-08:25, Paper TuAT13.1 | |
>CiRA CORE: A Low Code Platform That Makes AI Work for Industry 4.0 |
|
Loo, ChuKiong | University of Malaya |
Boonsang, Siridech | King Mongkut’s Institute of Technology |
Sasisaowapak, Thanyathep | King Mongkut’s Institute of Technology |
Chuwongin, Santhad | King Mongkut’s Institute of Technology |
Tongloy, Teerawat | King Mongkut’s Institute of Technology |
Nahavandi, Saeid | Swinburne University of Technology |
Wong, Kok Wai | Murdoch University |
Keywords: Application of Artificial Intelligence, Cloud, IoT, and Robotics Integration, AIoT
Abstract: CiRA CORE is a central hub designed to connect AI technology creation with practical application, making it easier to work with ROS (Robot Operating System) and link different systems through a user-friendly drag-and-drop interface. This approach removes the need for extensive coding, making the platform accessible to those with minimal programming experience. CiRA CORE offers a comprehensive suite of features for AI development and robot control, including algorithm creation, AI model training, and device integration commonly used in industrial settings. It supports tasks like image recognition and facilitates data storage, labeling, and integration with other systems for data-driven AI development. Overall, CiRA CORE aims to democratize AI development and robot control, simplifying AI development for Industry 4.0 applications, and leading to increased efficiency, reduced costs, and improved safety in industrial processes. This paper reports the progress of the CiRA CORE training modules funded by the SMCS TEAM Program Award. The project has completed the design of a 6-axis robot 3D training kit and simulation models for CiRA CORE training modules. The next steps involve developing 3D-printed robots and training materials. The main goal is to democratize advanced robotics and AI by simplifying integration through a visual, node-based programming interface. This approach reduces the need for complex coding, making these technologies accessible to users with limited programming experience. This initiative aims to foster widespread adoption in business and industrial settings, aligning with IEEE SMC's mission to promote professional growth and innovation in robotics and AI.
|
|
08:25-08:45, Paper TuAT13.2 | |
>Towards an Optimal Design: What Can We Recommend to Elon Musk |
|
Ceberio, Martine | The University of Texas at El Paso |
Kosheleva, Olga | University of Texas at El Paso |
Kreinovich, Vladik | University of Texas at El Paso |
Nguyen, Hung T. | New Mexico State University |
Keywords: Consumer and Industrial Applications, Large-Scale System of Systems, Manufacturing Automation and Systems
Abstract: Elon Musk's successful "move fast and break things" strategy is based on the fact that in many cases, we do not need to satisfy all the usual constraints to be successful. By sequentially trying smaller numbers of constraints, he finds the smallest number of constraints that are still needed to succeed, and using this smaller number of constraints leads to a much cheaper (and thus more practical) design. In this strategy, Musk relies on his intuition, which, like all intuitions, sometimes works and sometimes doesn't. To replace this intuition, we propose an algorithm that minimizes the worst-case cost of finding the smallest number of constraints.
|
|
08:45-09:05, Paper TuAT13.3 | |
>Development of Tracking System for Swallowing Movement Using Optical Flow |
|
Yagi, Naomi | University of Hyogo |
Nishihara, Ryosuke | University of Hyogo |
Kawamura, Naoko | Himeji Dokkyo University |
Maezawa, Hitoshi | Kansai Medical University |
Kashioka, Hideki | National Institute of Information and Communications Technology |
Hirata, Masayuki | Osaka University |
Yanagida, Toshio | National Institute of Information and Communications Technology |
Sakai, Yoshitada | Kobe University |
Hata, Yutaka | University of Hyogo |
Keywords: AI and Applications, Application of Artificial Intelligence, Computational Intelligence
Abstract: The population of Japan is aging at a speed unparalleled in other countries, and countermeasures against the aging population and the worsening of disease in people with disabilities have become urgent issues. Pneumonia and aspiration pneumonia are the leading causes of death. Swallowing function is said to decline from around the age of 40, so it is important to keep it in good condition and limit deterioration as much as possible. The gold standard for evaluating swallowing function is swallowing contrast testing; however, X-ray exposure prevents repeated testing. In addition, the Repetitive Saliva Swallowing Test (RSST), a screening test, is difficult to use for self-checking. Therefore, in this study, we develop a system that lets users self-evaluate their swallowing ability in order to keep swallowing function healthy. The system applies optical flow and the DeepLabCut artificial intelligence tool. As a result, we were able to visualize the movement of the larynx during swallowing.
|
|
09:05-09:25, Paper TuAT13.4 | |
>The Improved Mango Plant Detection Model Based on Attention Module Mechanism |
|
Sung, Wen-Tsai | National Chin-Yi University of Technology |
Isa, Indra Griha Tofik | National Chin-Yi University of Technology |
Keywords: AIoT, Computational Intelligence, Soft Computing, Socio-Economic Cybernetics
Abstract: Agriculture is one of the sources of income a region can rely on to support its economy. Traditional agriculture relies primarily on human performance and observation, resulting in greater production costs and, subsequently, higher selling prices. Artificial intelligence-based technology can be used to reduce production costs, increase productivity, and provide consumer convenience. An easily interpreted indicator of the quality and optimization of plant growth is the visual condition of the leaves. The artificial intelligence technique that can be applied here is an object detection model. However, the leaves are complex, numerous, and overlapping, which makes the model less effective at classifying and detecting whether the leaf condition is good or not. A YOLOv7 model is employed to detect leaf quality, labeled as either “optimal” or “not optimal”. To improve accuracy through better feature extraction, YOLOv7 is integrated with an attention module, the convolutional block attention module (CBAM). The case study in this research is the mango plant, which can provide a high economic impact, and the object observed is the mango plant leaf. Several previous studies have applied attention modules to object detection. The improved pest-YOLO for real-time pest detection combines YOLOv3 with efficient channel attention (ECA) and a transformer encoder; the ECA module and transformer encoder were integrated into the backbone and neck blocks of YOLO [1]. A lightweight YOLO model combined with SE-CSPGhostnet improves the backbone block by employing squeeze-and-excitation networks (SENet) and a convolution technique consisting of regular convolution and ghost convolution [2].
A notable improvement of YOLOv7 over previous versions of YOLO is the Extended Efficient Layer Aggregation Network (E-ELAN). YOLOv7's learning ability is enhanced by using this network while maintaining the transition layer's architecture. E-ELAN enhance
|
|
09:25-09:45, Paper TuAT13.5 | |
>AI-Enhanced Web Form Development: Tackling Accessibility Barriers with Generative Technologies |
|
Saraswathi, Pradeep Kumar | Salesforce |
Keywords: Assistive Technology, User Interface Design, Companion Technology
Abstract: Web forms play a pivotal role in digital interfaces but frequently pose significant accessibility challenges. This paper explores the main barriers to creating accessible web forms and investigates how generative AI technologies can provide solutions. We highlight core issues such as accurate labeling, keyboard navigation, error management, focus control, visual design factors, placeholder text usage, assistive technology compatibility, handling of complex inputs, responsive design, cognitive load reduction, and ongoing testing. For each of these challenges, we assess its effect on accessibility and present innovative AI-driven strategies. Our findings illustrate how AI can streamline the development process by automating label generation, improving tab indexing, enhancing real-time error detection, refining focus control, offering contrast improvement suggestions, and simulating interactions with assistive technologies. We conclude that incorporating generative AI into web form development can markedly improve accessibility, making digital experiences more inclusive for users of all abilities. This not only supports compliance with legal and ethical standards but also fosters a more inclusive online environment, enhancing user satisfaction and overall experience.
|
|
TuAPSR |
Room T14 |
Poster Presentation - Session 1 |
Poster Session |
|
08:05-09:45, Paper TuAPSR.1 | |
>UAVs for Sustainable Palm Oil Production: An Ant Colony Approach to Efficient Path Planning |
|
Lai, Weng Kin | Tunku Abdul Rahman University of Management and Technology |
Chen, Pak Hen | Tunku Abdul Rahman University of Management and Technology |
Lim, Li Li | Tunku Abdul Rahman University of Management and Technology |
Lee, Patrick Sheng Siang | AONIC |
Keywords: Application of Artificial Intelligence, Swarm Intelligence, AI and Applications
Abstract: The production of palm oil on a commercial scale is labour intensive with many of its processes handled by humans. In some countries, there can be as many as 500,000 plantation workers in the palm oil sector involved in labour intensive work in large plantations. However, such dependence on humans for low skill manual work has led to many problems. Unmanned aerial vehicles (UAVs) have been seen as a possible alternative to support some of these processes that require low skills in the palm oil industry. However, the flying time of the UAVs is finite and hence it is important to maximize the number of palm trees that it can service. In this paper, an Ant Colony System (ACS) with a novel path constructor was used to identify good flight paths for UAVs in large palm oil plantations to help improve the efficiency for some of the agricultural activities. Good results were obtained for various data sets especially when compared with the standard ACS as well as those by the human experts.
|
|
08:05-09:45, Paper TuAPSR.2 | |
>Incremental Learning Algorithms for Broad Learning System with Node and Input Addition |
|
Chen, Guang-Ze | University of Macau |
Jin, Junwei | Henan University of Technology |
Sun, Hai-Wei | University of Macau |
Chen, C. L. Philip | University of Macau |
Keywords: Computational Intelligence, AI and Applications, Machine Learning
Abstract: The Broad Learning System (BLS) has been established as an effective flat network alternative to Deep Neural Networks (DNNs), delivering high efficiency while achieving competitive accuracy. Despite its advantages, the incremental learning methods of BLS face challenges in stability and computation when expanding with new nodes or input. We introduce two novel incremental learning algorithms based on factorization updates for BLS that optimize node and input additions to overcome these limitations. Our node addition algorithm utilizes QR decomposition and Cholesky factorization, using the update of the Cholesky factor instead of pseudo-inverse computations. For input addition, we propose an iterative Cholesky factor update algorithm. Our algorithms demonstrate not only faster computation compared to the existing BLS but also improved testing accuracy on the MNIST or Fashion-MNIST dataset. This work presents a significant step forward in the practical application and scalability of BLS in various data-dense environments.
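The idea of updating a Cholesky factor instead of recomputing it can be illustrated with the textbook block identity below. This is a sketch under assumptions, not the authors' algorithm: it assumes the factored matrix is a symmetric positive definite matrix K (for example, regularized normal equations) that grows to [[K, B], [B^T, C]] when new nodes are appended. The function names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math

def cholesky(M):
    """Lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def extend_cholesky(L, B, C):
    """Extend the factor L of K to the factor of [[K, B], [B^T, C]].

    B is the n x m cross block and C the m x m new diagonal block.
    Only the new rows are computed; L itself is reused unchanged.
    """
    n, m = len(L), len(C)
    # Forward substitution: row j of L21 solves L y = B[:, j].
    L21 = []
    for j in range(m):
        y = []
        for i in range(n):
            s = B[i][j] - sum(L[i][k] * y[k] for k in range(i))
            y.append(s / L[i][i])
        L21.append(y)
    # Schur complement S = C - L21 L21^T, factored on its own.
    S = [[C[a][b] - sum(L21[a][k] * L21[b][k] for k in range(n))
          for b in range(m)] for a in range(m)]
    L22 = cholesky(S)
    top = [row + [0.0] * m for row in L]
    bottom = [L21[a] + L22[a] for a in range(m)]
    return top + bottom
```

The cost of the extension depends mainly on the size of the new blocks, which is the usual reason for preferring such updates to a full refactorization.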
|
|
08:05-09:45, Paper TuAPSR.3 | |
>RTS-DETR: Efficient Real-Time DETR for Small Object Detection |
|
Li, Wenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Li, Zhiyao | Qilu University of Technology (Shandong Academy of Sciences) |
Kong, Xiaotong | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Yuechen | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, AI and Applications
Abstract: In recent years, DETRs, object detection models based on the Transformer architecture, have played a major role in various fields. However, the DETR series models perform unsatisfactorily in small object detection. This is mainly due to DETR's heavy computation, the loss of much feature information in the feature fusion stage, and the low tolerance of small objects to Intersection over Union (IoU). To solve these problems, we propose a near real-time detection model, RTS-DETR. In this paper, we revisit RT-DETR, which effectively handles multi-scale features by decoupling intra-scale interactions and cross-scale fusion, although this loses a lot of positive local information. To this end, we improve the efficient hybrid encoder. We propose a new positional encoding method that enables the hybrid encoder to more accurately convert the input feature sequence into a high-dimensional representation, and a new feature fusion module to enhance the model's ability to capture local features. Furthermore, to improve the tolerance of small objects to IoU, we combine Normalized Wasserstein Distance (NWD) with Shape-IoU to optimize the model. This method more accurately takes into account the shape and size of objects, thereby improving detection accuracy. Our model achieves an accuracy of 38.8% (in terms of mAP_{@0.5}) on the widely used VisDrone dataset, an improvement of 2.5% over RT-DETR with ResNet-18 as the backbone network.
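To illustrate why a Wasserstein-based measure is more tolerant of small objects than IoU, here is a minimal sketch of the commonly published form of the Normalized Wasserstein Distance, with each box modelled as a 2D Gaussian. It does not reproduce the paper's combination with Shape-IoU. The box format `(cx, cy, w, h)` and the constant `c` are assumptions, and the default value of `c` is an arbitrary placeholder.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent normalizing constant (placeholder value here).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, for comparison."""
    ax1, ax2 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay1, ay2 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx1, bx2 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by1, by2 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

For two small boxes that do not overlap, `iou` returns zero and gives no gradient signal, while `nwd` still decreases smoothly with the distance between them.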
|
|
08:05-09:45, Paper TuAPSR.4 | |
>Synergizing Internal and External Knowledge: Prompt Engineering for Efficient and Effective Large Language Model Reasoning |
|
Lu, Gewei | Shanghai Jiao Tong University |
He, Chaofan | Shanghai Jiao Tong University |
Shen, Liping | Shanghai Jiao Tong University |
Keywords: Application of Artificial Intelligence, Deep Learning, Knowledge Acquisition
Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated remarkable capability in question answering but face challenges when it comes to knowledge-based reasoning, such as limited training data and hallucination. To address these challenges, integrating LLMs with knowledge graphs (KGs) has emerged as a promising solution. However, the cost associated with training and inference of LLMs is high. Our method integrates the Retrieval-Augmented Generation (RAG) paradigm, incorporating relevant information from KGs alongside the question to enhance LLMs' reasoning process without training. Moreover, we propose a novel concept of self-knowledge motivation to reduce the overhead of inference, which prompts LLMs to integrate retrieved information with their internal knowledge for reasoning before seeking additional queries to KGs. Experimental results showcase improvements in answer accuracy and a reduction in LLMs' API calls compared to the latest published state-of-the-art (SOTA) method employing an identical paradigm, underscoring the efficiency and effectiveness of our method.
|
|
08:05-09:45, Paper TuAPSR.5 | |
>Try-Then-Eval: Equipping an LLM-Based Agent with a Two-Phase Mechanism to Solve Computer Tasks |
|
Cao, Thanh-Duy | Ho Chi Minh University of Science, VNU-HCM |
Nguyen, Phong Phu | University of Science - VNUHCM |
Le, Vy | University of Information Technology |
Nguyen, Long | University of Science, Ho Chi Minh City, Vietnam |
Nguyen, Vu | University of Science, Vietnam National University |
Keywords: Application of Artificial Intelligence, Computational Intelligence, Neural Networks and their Applications
Abstract: Building an autonomous intelligent agent capable of carrying out web automation tasks from descriptions in natural language offers a wide range of applications, including software testing, virtual assistants, and task automation in general. However, recent studies addressing this problem often require manual construction of prior human demonstrations. In this paper, we approach the problem by leveraging the idea of reinforcement learning (RL) with a two-phase mechanism to form an agent using LLMs for automating computer tasks without relying on human demonstrations. We evaluate our LLM-based agent using the MiniWob++ dataset of web-based application tasks, showing that our approach achieves an 85% success rate without prior demonstrations. The results also demonstrate the agent's capability of self-improvement through training.
|
|
08:05-09:45, Paper TuAPSR.6 | |
>Decrease the Prompt Uncertainty: Adversarial Prompt Learning for Few-Shot Text Classification |
|
Weng, Jinta | School of Cyber Security, University of Chinese Academy of Scien |
Zhang, Zhaoguang | Guangzhou University |
Jing, Yaqi | National Computer Network Emergency Response Technical Team/Coor |
Niu, Chenxu | China |
Huang, Heyan | School of Computer Science and Technology, Beijing Institute Of |
Hu, Yue | School of Cyber Security, University of Chinese Academy of Scien |
Keywords: Artificial Social Intelligence, AI and Applications, Machine Learning
Abstract: With few-shot learning abilities, pre-trained language models (PLMs) have achieved remarkable success in classification tasks. However, recent studies have shown that the performance of PLMs is vulnerable to different prompts and to the instability of the prompt-based learning process. To address this challenge, we explore appropriate perturbation addition in adversarial training and integrate the global knowledge of the full-parameter fine-tuned PLM. Specifically, we propose an adversarial prompt learning model (ATPET) and ATPET with fine-tuning (ATPETFT), which incorporates fine-tuning knowledge into the prompt learning process of ATPET. Through extensive experiments on several few-shot classification tasks and challenging data settings, we demonstrate that our methods consistently improve the robustness while maintaining the effectiveness of PLMs.
|
|
08:05-09:45, Paper TuAPSR.7 | |
>Enhancing Autofocus Performance through Predictive Motion-Targeting and Self-Attention in a Deep Reinforcement Learning Framework |
|
Wei, Xiaolin | Chongqing University |
Yang, Ruilong | Chongqing University |
Wu, Xing | Chongqing University |
Wang, Chengliang | Chongqing University |
Wang, Haidong | Southwest Hospital of Army Medical University |
Wang, Hongqian | Southwest Hospital of Army Medical University |
Tang, Tao | Chongqing University |
Keywords: Image Processing and Pattern Recognition, AI and Applications, Neural Networks and their Applications
Abstract: In focusing tasks on moving targets, traditional methods that rely on maximizing contrast struggle to capture moving objects due to insufficient focusing speed. Deep learning-based methods have attempted to directly predict the optimal focal length for the target; however, due to low prediction accuracy, they often lead to out-of-focus situations when capturing moving objects. In recent years, some approaches have utilized reinforcement learning to automatically explore focal length adjustment patterns, thus achieving better results than traditional methods. However, these approaches have not considered the motion characteristics of the targets, leading to a need for further improvement in focusing performance. To overcome these limitations, we introduce a motion-based feature and deep reinforcement learning-driven autofocus algorithm named MF-DRLAF for moving targets. This novel method tracks the object, predicts its motion state through feature extraction, and uses deep reinforcement learning to dynamically adjust the focus. We utilize a self-attention mechanism to adaptively learn various motion patterns and employ a feature pool structure to enhance processing efficiency. Experiments and real-world testing on a Google Pixel3 demonstrate that our approach significantly enhances autofocus performance on moving objects, highlighting its potential for broader imaging applications. This approach offers a promising direction for future development in autofocus technology.
|
|
08:05-09:45, Paper TuAPSR.8 | |
>Fractional Order Controller Design for LFC of Two-Area Interconnected Power System with Time Delay Based on IMC Approach |
|
K, Gnaneshwar | PDPM IIITDM Jabalpur |
Padhy, Prabin Kumar | PDPM IIITDM Jabalpur |
Keywords: System Modeling and Control, Intelligent Power Grid, Control of Uncertain Systems
Abstract: Load frequency control (LFC) of a two-area connected electric power system is vital for maintaining grid stability and reliability by matching power generation with load demand. Thus, this work proposes an analytical approach for designing a fractional order (FO) controller to regulate the LFC of a two-area connected electrical power system with time delay. First, the interconnected electrical power system is accurately modelled as an FO system with time delay. Then, the FO controller is designed using the internal model control (IMC) technique, where a low-pass filter (LPF) is considered to mitigate the effect of the disturbances. The designed FO controller involves a single tuning parameter, which is analytically designed using gain crossover frequency criteria. The disturbance and parametric uncertainty analyses have been carried out to analyze the efficacy of the proposed method under the variation of the tuning parameter. Then, the frequency and tie-line power fluctuations are estimated under nominal and parametric uncertainty conditions. Also, its performance has been compared to recent state-of-the-art techniques for precise efficacy analysis.
|
|
08:05-09:45, Paper TuAPSR.9 | |
>SELus: Towards Spatio-Temporal Modeling and Quantitative Evaluation for Cyber-Physical Systems |
|
Zhang, Quanguo | East China Normal University |
Liu, Jing | East China Normal University |
Liu, Mingxing | Nuclear Power Institute of China |
Huang, Yanhong | East China Normal University |
Hou, Rongbin | Nuclear Power Institute of China |
Shi, Jianqi | East China Normal University |
Keywords: System Modeling and Control, Cyber-physical systems, Modeling of Autonomous Systems
Abstract: Synchronous language is routinely used to model safety-critical control systems. In recent years, it has gradually been applied to cyber-physical systems (CPS), which emphasise high levels of correctness and safety. It is based on the assumption that the system reacts instantaneously to input events and can compute the output before the next input event, so it is well suited for expressing temporal logic. However, it lacks effective constructs for expressing spatial properties in CPS. Moreover, spatio-temporal properties in CPS are indispensable, requiring not only qualitative analysis but also quantitative analysis. Therefore, we propose SELus, a new synchronous language based on Lustre, to provide the capability of modeling spatio-temporal properties in CPS, enabling the representation of spatial topological relationships and the performance of quantitative analysis on them. To formally verify the SELus model, we introduce a set of mapping rules to transform the SELus model into the Ptolemy II model. The resulting Ptolemy II model is used in Ptolemy II to perform quantitative analysis of the SELus model. Experiments are conducted on a lane-changing system, showcasing the usability and effectiveness of our language.
|
|
08:05-09:45, Paper TuAPSR.10 | |
>Wheeled Mobile Robots on Rough Terrains As Stochastic Nonholonomic Systems |
|
Gzenda, Vaughn | Carleton University |
Chhabra, Robin | Carleton University |
Keywords: Control of Uncertain Systems, Modeling of Autonomous Systems, Robotic Systems
Abstract: In this paper, we investigate the motion of wheeled mobile robots on rough terrains modeled as noisy nonholonomic constraints. Such constraints are the natural extension of ideal nonholonomic constraints when the Stratonovich process is directly introduced in the constraint equations. The resulting stochastic model can capture motion on rough surfaces, random deformation in the wheel-ground contact, or stochastic loss/gain of traction. We study a differential robot with ideal noisy and affine noisy constraints, where each case models a certain aspect of motion on rough terrains. We then investigate their corresponding stochastic dynamics and the propagation of mean and covariance through Monte-Carlo simulations. The proposed model for roving rough terrains has the potential to serve as the stochastic model employed in model-based motion planning, pose estimation, and control of rover systems. The main challenge will be dealing with the nonlinear appearance of the noise and its feedback in the equations of motion.
|
|
08:05-09:45, Paper TuAPSR.11 | |
>Energy-Efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles |
|
Ding, Fan | Monash University |
Luo, Xuewen | Monash University |
Li, Gaoxuan | Monash University |
Tew, Hwa Hui | Monash University Malaysia |
Loo, Junn Yong | Monash University Malaysia |
Chor, Wai Tong | Tunku Abdul Rahman University of Management and Technology |
Bakibillah, A. S. M. | Tokyo Institute of Technology |
Zhao, Ziyuan | I2R,A*STAR |
Tao, Zhiyu | National Science Library, Chinese Academy of Sciences; Departmen |
Keywords: Autonomous Vehicle, System Modeling and Control, Modeling of Autonomous Systems
Abstract: To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.
|
|
08:05-09:45, Paper TuAPSR.12 | |
>A Novel Information-Theoretic Metric for Evaluating LiDAR Setups of Autonomous Vehicles |
|
Hafemann, Philipp | Technical University Munich |
Song, Xulin | Technical University Munich |
Brecht, David | Technical University of Munich |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Intelligent Transportation Systems
Abstract: The sensor configuration of an autonomous vehicle (AV) is determined in the early development phase when specific perception algorithms are not yet available. Therefore, approaches based on synthetic raw data are necessary to evaluate different configurations. One sensor type used in AVs is LiDAR, but developers should carefully consider the number and placement of the sensors due to their high costs. In this contribution, we propose the Omni-Lidar Evaluation Score (OLES), a novel metric to evaluate different LiDAR configurations based on their simulated raw data. Our OLES metric combines information-theoretic quantities with coverage-based metrics, considering both the spatial coverage and the uniformity of a LiDAR point cloud distribution. We show the need for a new metric and provide details on implementing OLES using the open-source simulator CARLA. We demonstrate the effectiveness of our new metric in a simulation study and highlight its usefulness in the early phases of vehicle development. This research provides a means to evaluate the quality of LiDAR configurations and provides a basis for further optimizing sensor setups for AVs.
|
|
08:05-09:45, Paper TuAPSR.13 | |
>The Eco-Label Strategy of Green Manufacture under the Influence of Consumers’ Intrinsic Preferences |
|
Hou, Yingjie | Northwestern Polytechnical University |
Guo, Peng | Northwestern Polytechnical University |
Zhao, Jing | Northwestern Polytechnical University |
Keywords: Consumer and Industrial Applications
Abstract: Considering two eco-label strategies, self-label and certification-label, we construct a duopoly competition model that encompasses both green product and ordinary product manufacturing enterprises. By investigating the optimal eco-label standards, we explore the product pricing and profits for enterprises facing green-sensitive consumers and price-sensitive consumers. Then we analyze the optimal eco-label selection for green enterprises in different preference markets. Research indicates that the green quality standards and product prices under certification labels are invariably higher than those under self-label. However, the choice of eco-label by enterprises is influenced by consumers' individual intrinsic preferences: in price-sensitive markets, enterprises tend to adopt self-label; in green-sensitive markets, when the value of consumers' individual intrinsic preferences is below a certain threshold, enterprises will prioritize certification labels. Additionally, the profits of enterprises in green-sensitive markets are generally higher than those in price-sensitive markets, so enterprises should highlight the advantages of green quality and guide consumers to prefer green attributes more when formulating promotional strategies.
|
|
08:05-09:45, Paper TuAPSR.14 | |
>AutoForma: A Large Language Model-Based Multi-Agent for Computer-Automated Design |
|
Liao, JianXing | Shenzhen Institute for Advanced Study, University of Electronic |
Xu, Junyan | University of Electronic Science and Technology of China |
He, Sicheng | University of Electronic Science and Technology of China |
Chen, Zeke | UESTC |
Yu, Shui | Shen Zhen Institute for Advanced Study, UESTC |
Li, Yun | Shenzhen Institute for Advanced Study, University of Electronic |
Keywords: Consumer and Industrial Applications, System Architecture
Abstract: With the proliferation of artificial intelligence, Computer-Aided Design (CAD) is being transformed into Computer-Automated Design (CAutoD). The advent of Large Language Models (LLMs) introduces new opportunities for CAutoD. This study develops AutoForma, an LLM-based multi-agent system, for automatic conversion from natural language descriptions to 3D models. By harnessing the comprehension capabilities of LLMs, AutoForma streamlines the CAutoD workflow by efficiently translating design intents into precise models in CAD. Through a comprehensive set of evaluations, AutoForma is seen to offer automation performance across various design tasks, particularly in generating non-standard parts that meet specific requirements, with higher efficiency and accuracy than using just an LLM like GPT-4.
|
|
08:05-09:45, Paper TuAPSR.15 | |
>Hybrid Data-Mechanism Modeling for Tire Response Dynamics in Estimating Tire–Road Friction Coefficient |
|
Lu, Jiaxing | Tongji University |
Cheng, Liangzhu | Dongfeng Automotive Technology Center |
Liang, Jun | Dongfeng Automotive Technology Center |
Wang, Nian | Dongfeng Motor Corporation |
Li, Bin | College of Electronic and Information Engineering, Tongji Univer |
Zhang, Lin | Tongji University |
Chen, Hong | Tongji University |
Keywords: System Modeling and Control, Electric Vehicles and Electric Vehicle Supply Equipment, Autonomous Vehicle
Abstract: Advanced control and safety systems are crucial for electric vehicles, and the accurate estimation of the tire-road friction coefficient (TRFC) is essential for developing effective safety control strategies. The hybrid data-mechanism model (HDMM), introduced in this paper, addresses the performance challenges posed by the inaccuracies of physical models and the limited interpretability of data-driven models in tire force estimation for TRFC estimation. Tire dynamics often exhibit transient responses, while mechanism-based models (MBM) typically reflect steady-state characteristics. Neglecting transient characteristics leads to a decrease in model accuracy. A neural network is used to learn the transient response characteristics of tire dynamics. These characteristics are then integrated with the steady-state tire forces from MBM to estimate the lateral and vertical forces acting on the wheel. The estimated tire forces serve as virtual measurements to calibrate parameters in the TRFC estimator, based on the Unscented Kalman Filter (UKF). During real-world vehicle tests, the proposed method reduced the Mean Error (ME) in lateral and vertical forces by 1271.85 N and 996.7 N, respectively, compared to the estimated tire forces from MBM. Additionally, the estimated TRFC converged to the reference value approximately 40 ms earlier than the result from the MBM, with an estimated deviation within 0.1.
|
|
08:05-09:45, Paper TuAPSR.16 | |
>FLSTAGCN: Traffic Flow Prediction Based on Federated Learning and Attention Graph Convolutional Network |
|
Shi, Lei | Zhengzhou University, School of Cyber Science and Engineering |
Yuan, Shaohua | Zhengzhou University |
Lian, Huijuan | Zhengzhou University |
Gao, Yufei | Zhengzhou University |
Wei, Lin | Zhengzhou University |
Wang, Qilong | Zhengzhou University |
Keywords: Intelligent Transportation Systems, Distributed Intelligent Systems, Smart Buildings, Smart Cities and Infrastructures
Abstract: Traffic flow prediction plays a pivotal role in helping governments and companies accurately forecast changes in vehicle volume, consequently enhancing transportation efficiency and facilitating vehicle travel. Presently, the majority of traffic flow prediction methods rely on centralized learning strategies, which entail the transmission of substantial data and may jeopardize user privacy. To address this issue, we propose a Federated Learning-based Attention Graph Convolutional Network (FLSTAGCN) algorithm for traffic flow prediction. Firstly, we develop a Spatial-Temporal Attention Graph Convolutional Network (STAGCN) method that employs an attention mechanism to proficiently extract spatial-temporal features from traffic flow data, augmenting the model's learning capabilities. Subsequently, within the aggregation mechanism of federated learning, we devise a bespoke optimal selection protocol to enhance training accuracy and reduce communication costs in traffic flow prediction scenarios. Finally, we integrate federated learning with STAGCN and utilize the optimal selection protocol to designate participants for transmitting optimal parameters. Experimental results substantiate that our approach outperforms advanced deep learning approaches in terms of traffic flow prediction performance while ensuring the privacy and security of traffic data.
|
|
08:05-09:45, Paper TuAPSR.17 | |
>Steering Control Considering Motion Sickness and Vehicle Performance Via DDPG Algorithm and 6-DoF-SVC Model |
|
Kawakami, Uta | The University of Electro-Communications |
Sawada, Kenji | The University of Electro-Communications |
Keywords: Autonomous Vehicle, Decision Support Systems, Adaptive Systems
Abstract: Autonomous driving demands sophisticated control systems that optimize safety, performance, passenger comfort, and fuel efficiency. This study proposes a steering control system that integrates the Deep Deterministic Policy Gradient (DDPG) for speed planning with a novel feedback mechanism based on Subjective Vertical Conflict (SVC) in the reward function. Using simulations in MATLAB and Simulink, we evaluate the system's performance across various thresholds of SVC, examining its impact on ride comfort, fuel efficiency, and vehicle behavior during lane changes. Results reveal a trade-off relationship between ride comfort and fuel efficiency, with lower SVC thresholds generally improving comfort but potentially increasing steering input. Additionally, excessively low SVC thresholds degrade target-reaching performance and lengthen lane change distances, highlighting the need for careful parameter tuning. Overall, our findings demonstrate the potential of reinforcement learning-based steering control systems to optimize multiple evaluation criteria simultaneously while emphasizing the importance of balancing trade-offs in autonomous driving scenarios.
|
|
08:05-09:45, Paper TuAPSR.18 | |
>Robust Controller for Varying Speed Autonomous Ground Vehicles Considering System Uncertainties and Road Conditions |
|
Rahim, Md Abdur | Deakin University |
Arogbonlo, Adetokunbo | Deakin University |
Pappu, Mohammad Rokonuzzaman | Deakin University |
Abu Alqumsan, Ahmad | Deakin University |
Keywords: Autonomous Vehicle, Control of Uncertain Systems, System Modeling and Control
Abstract: This paper presents a novel robust path-tracking controller for autonomous ground vehicles. Environmental and vehicle factors such as variation in road conditions and varying speed can adversely affect the path-tracking capability of autonomous ground vehicles. A polytopic linear parameter varying model for autonomous ground vehicles that accounts for system uncertainties with varying speeds and road conditions is formulated. Then, an H_∞-based robust path-tracking controller is developed using this model to minimise the vehicle's lateral velocity, heading error, and slip angle. Simulation results comparing the proposed controller with a conventional robust controller are presented. The findings show that the proposed controller performs well and is more effective than the conventional robust controller.
|
|
08:05-09:45, Paper TuAPSR.19 | |
>Safety Verification of Advanced Driver Assistance Systems Using Hybrid Automaton Reachability |
|
Liu, Lu | Huazhong University of Science and Technology |
Sun, Qi | Huazhong University of Science and Technology |
Yang, Liren | Huazhong University of Science and Technology |
Li, Yahui | Huazhong University of Science and Technology |
Zhou, Chunjie | Huazhong University of Science and Technology |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Cooperative Systems and Control
Abstract: Advanced driver assistance systems (ADAS) are effectively promoting the vehicular automation level, and it is critical to ensure their functional safety. While existing analysis mainly focuses on individual applications of ADAS, safety violations in the overall system can be found by extensive road tests, which are not only costly in terms of time and money but also lack a formal safety guarantee. This is because tests may not cover all driving scenarios, especially the ones that involve discrete mode switching. In this paper, we focus on the longitudinal vehicle motion and provide a pipeline to perform safety verification for all the related ADAS applications. To that end, we specify safety constraints and boundaries for a vehicle’s longitudinal cruising and collision avoidance and validate a longitudinal dynamic model against the high-fidelity simulation software CarSim. Then we define hybrid automata to describe the closed-loop system composed of the vehicle dynamics and the ADAS. Finally, by computing the reachable sets of the hybrid automata and comparing them with the specified safety boundaries, the ADAS is verified. Numerical experiments demonstrate the efficacy of the proposed approach.
|
|
08:05-09:45, Paper TuAPSR.20 | |
>Multi-Segment Fusion-Enhanced Spatial-Temporal Graph Convolutional Network for Traffic Flow Prediction (I) |
|
Zhang, Wei | Chongqing University of Posts and Telecommunications |
Tang, Peng | Southwest University |
Keywords: Intelligent Transportation Systems
Abstract: Accurate traffic flow prediction can assist in traffic management, route planning, and congestion mitigation, which holds significant importance in enhancing the efficiency and reliability of intelligent transportation systems (ITS). However, existing traffic flow prediction models suffer from limitations in capturing the complex spatial-temporal dependencies within traffic networks. To address this issue, this study proposes a multi-segment fusion-enhanced spatial-temporal graph convolutional network (MS-STGCN) for traffic flow prediction with the following three-fold ideas: a) building a unified spatial-temporal graph convolutional framework based on the Tensor M-product, which captures the spatial-temporal patterns simultaneously; b) incorporating hourly, daily, and weekly components to model the multi-temporal properties of traffic flows, respectively; c) fusing the outputs of the three components by an attention mechanism to obtain the final traffic flow prediction results. The results of experiments conducted on two traffic flow datasets demonstrate that the proposed MS-STGCN outperforms the state-of-the-art models.
|
|
TuBT1 |
MR01 |
Computational Intelligence and Soft Computing 2 |
Regular Papers - Cybernetics |
Chair: Yu, Baijiang | South China University of Technology |
|
11:00-11:20, Paper TuBT1.1 | |
>Incremental Evolution of Three Degree-Of-Freedom Arachnid Gaits |
|
Parker, Gary | Connecticut College |
Isak, Manan Basil Masaru | Connecticut College |
O'Connor, Jim | Connecticut College |
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence
Abstract: In this research, we evolve gaits for an arachnid-inspired robot. The method used is an expansion upon previous research on the incremental evolution of gaits for hexapod robots with two degrees of freedom per leg, which we now apply to a more complex, eight-legged robot with three degrees of freedom per leg. Incremental evolution handles gait generation for legged robots in two discrete increments. The first increment uses a cyclic genetic algorithm to learn the activations (pulse instructions to the servos) required for each leg to perform a single-leg cycle. This learning program takes into account the way each leg is mounted on the body and the range of movement provided by the three servos on each leg to produce a smooth, straight, and efficient leg cycle. The second increment uses a genetic algorithm to select the best combination of leg cycles for each leg and to learn the timing to execute each leg cycle to coordinate them all together into a single gait. In this work, we learn the gait incrementally in a simulation and transfer the final gaits to the real robot to confirm the method’s viability.
|
|
11:20-11:40, Paper TuBT1.2 | |
>Individual-Level Dominant Exemplar Selection for Particle Swarm Optimization |
|
Wang, Hu-Long | Nanjing University of Information Science and Technology |
Duan, Danting | Key Laboratory of Media Audio & Video, Communication University |
Yang, Qiang | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Lin, Xin | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Zhang, Jun | Hanyang University |
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence
Abstract: Leading exemplars play a significant role in guiding particles toward optimal solutions in Particle Swarm Optimization (PSO). Following this line of work, this paper devises an Individual-level Dominant Exemplar Selection (IDES) framework for PSO, giving rise to a new PSO variant named IDESPSO. Specifically, instead of updating each particle with its own personal best position and the global best position of the entire swarm, IDES first randomly chooses two different exemplars for each particle from all personal best positions. Then, it compares the two selected exemplars with the personal best position of this particle. Based on the comparison results, different updating strategies are used for different particles. This method notably enriches the variety of the chosen leading exemplars and thereby substantially increases the updating diversity of particles. Under IDES, this paper further develops seven selection strategies to help IDESPSO pick promising exemplars for particles to evolve. The seven selection schemes are roulette wheel selection, tournament selection, and five hybridizations of these two basic models. A series of experiments has been conducted on the widely used CEC2014 problem suite to compare IDESPSO under the seven selection schemes with two classic PSOs. The empirical results show that IDESPSO paired with any one of the seven selection methods markedly outperforms the two classical PSO variants.
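The selection and comparison steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses tournament selection (one of the two basic schemes named above), assumes minimization, and stops at the comparison, because the abstract does not spell out the updating strategies. The function names and the tournament size `k` are hypothetical.

```python
import random


def tournament_pick(fitness, k, rng, exclude=None):
    """Return the index with the lowest fitness among up to k random
    candidates (minimization), never returning `exclude`."""
    candidates = [i for i in range(len(fitness)) if i != exclude]
    contestants = rng.sample(candidates, min(k, len(candidates)))
    return min(contestants, key=lambda i: fitness[i])


def select_exemplars(pbest_fitness, particle, k=2, rng=random):
    """Pick two different exemplars from all personal bests and report
    which of them beat the particle's own personal best."""
    first = tournament_pick(pbest_fitness, k, rng)
    second = tournament_pick(pbest_fitness, k, rng, exclude=first)
    better = [i for i in (first, second)
              if pbest_fitness[i] < pbest_fitness[particle]]
    return first, second, better


# Hypothetical personal-best fitness values for a swarm of four particles.
pbest_fitness = [5.0, 1.0, 3.0, 4.0]
first, second, better = select_exemplars(pbest_fitness, particle=0,
                                         rng=random.Random(0))
```

The length of `better` (0, 1, or 2) is the kind of comparison outcome on which a per-particle updating strategy could then branch.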
|
|
11:40-12:00, Paper TuBT1.3 | |
>EARL-Light: An Evolutionary Algorithm-Assisted Reinforcement Learning for Traffic Signal Control |
|
Chen, JingYuan | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Chen, Tai-You | South China University of Technology |
Hu, Xiao-Min | Guangdong University of Technology |
Jeon, Sang-Woon | Hanyang University |
Wang, Yang | Northwestern Polytechnical University |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Machine Learning
Abstract: Traffic signal control (TSC) problems have received increasing attention with the development of the smart city. Reinforcement learning (RL) models TSC as a Markov decision process and learns the timing relationship of traffic scheduling from massive historical data. Due to the uncertainty and mutability of TSC problems, existing RL methods face bottlenecks in diversity and are easily trapped in local optima. To alleviate this predicament, this paper combines evolutionary optimization and RL to propose an evolutionary algorithm-assisted reinforcement learning (EARL-Light) method for TSC problems. EARL-Light is a population-based algorithm, in which one individual represents a policy and a population of individuals is evolved to search for near-optimal policies. The diversified search ability of evolutionary optimization can help the algorithm escape local optima for global optimization, and the rapid gradient-based learning of RL can achieve fast convergence. Extensive experiments on seven real-world traffic datasets demonstrate that EARL-Light achieves shorter travel time with fast convergence.
|
|
12:00-12:20, Paper TuBT1.4 | |
>Evolutionary Reinforcement Learning with Double Replay Buffers for UAV Online Target Tracking |
|
Yu, Baijiang | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Hu, Xiao-Min | Guangdong University of Technology |
Jeon, Sang-Woon | Hanyang University |
Luo, Wenjian | Harbin Institute of Technology, Shenzhen |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence
Abstract: Target tracking has broad applications such as disaster relief, and unmanned aerial vehicles (UAVs) have been widely applied to target tracking in recent years. Owing to its strong responsiveness to deceptive reward signals and its diverse exploration, evolutionary reinforcement learning (ERL) is a more noteworthy option for training UAVs than common reinforcement learning. However, because ERL contains many agents, its training efficiency is unsatisfactory. To address this shortcoming, this paper proposes an evolutionary reinforcement learning method with double replay buffers (ERLDRB) for the UAV online target tracking problem. First, considering the energy consumption and the possible delay of feedback signals to the UAV, a more realistic model of the UAV online target tracking problem is designed. Then, based on this problem formulation, ERLDRB uses a double experience replay buffer technique to increase learning efficiency in the training stage, which helps it better solve the real-world UAV online target tracking problem. Simulation results show that ERLDRB outperforms multiple competing algorithms on the designed model.
|
|
12:20-12:40, Paper TuBT1.5 | |
>Matrix-Based Ant Colony System for Traveling Salesman Problem |
|
Li, Xu | South China University of Technology |
Li, Jian-Yu | South China University of Technology |
Chen, Chun-Hua | South China University of Technology |
Zhan, Zhi-Hui | South China University of Technology |
Kwong, Sam Tak Wu | Lingnan University |
Zhang, Jun | Hanyang University |
Keywords: Evolutionary Computation, Swarm Intelligence, Computational Intelligence
Abstract: The ant colony system (ACS) algorithm, an important evolutionary computation (EC) algorithm, has demonstrated significant advantages in solving complex optimization problems. However, traditional EC algorithms, including the traditional ACS algorithm, often suffer from slow computation when dealing with large-scale problems. In recent years, matrix-based EC approaches have been proposed to accelerate computation and have obtained promising results on large-scale problems. However, most existing matrix-based EC algorithms are designed for continuous optimization problems, while a matrix-based approach integrated with ACS, which would be efficient for solving large-scale discrete optimization problems, has not attracted enough attention. Therefore, in this paper, we propose a matrix-based ACS (MACS) algorithm and apply it to the traveling salesman problem (TSP). MACS improves on the traditional ACS algorithm by using matrix operations to let ants select cities and update pheromones in parallel. Experimental results show that the MACS algorithm runs significantly faster while maintaining remarkable problem-solving ability on large-scale TSP.
|
|
12:40-13:00, Paper TuBT1.6 | |
>Building Consensus in Group Decision-Making with Intuitionistic Reciprocal Preference Relations: An Analysis of Various Protocols of Information Granularity Distribution |
|
González-Quesada, Juan Carlos | University of Granada |
Cabrerizo, Francisco Javier | University of Granada (Q1818002F) |
Herrera Viedma, Enrique | University of Granada (Spain) |
Pedrycz, Witold | University of Alberta |
Keywords: Fuzzy Systems and their applications, Computational Intelligence
Abstract: On the one hand, to model experts' preferences in group decision-making, intuitionistic reciprocal preference relations have widely been used because they allow for accommodating hesitation degrees, which are inherent to all decision-making processes. On the other hand, an optimization of information granularity distribution has recently been applied to establish consensus during group decision-making processes. Concretely, a symmetric and uniform distribution of information granularity has been considered for intuitionistic reciprocal preference relations. However, there exist other protocols of information granularity distribution that could be used. Therefore, we aim to analyze all the information granularity distribution protocols and determine their effectiveness in building consensus through intuitionistic reciprocal preference relations. The performance of the different protocols is discussed by conducting some numerical experiments that help provide insights into the effectiveness of the protocols to build consensus.
|
|
TuBT2 |
MR02 |
Deep Learning and Neural Networks 5 |
Regular Papers - Cybernetics |
Chair: Raju, S M Taslim Uddin | University of Waterloo |
|
11:00-11:20, Paper TuBT2.1 | |
>CMA-BP: A Clustered Multi-Task Learning and Branch Attention Based Branch Predictor |
|
Ming, Li | University of Electronic Science and Technology of China |
Rucong, Xu | University of Electronic Science and Technology of China |
Zhang, Hexu | University of Electronic Science and Technology of China |
Li, Lin | Qingdao Agriculture University |
Li, Yun | Shenzhen Institute for Advanced Study, University of Electronic |
Keywords: Neural Networks and their Applications, Machine Learning, AI and Applications
Abstract: Branch prediction is a key bottleneck in enhancing CPU performance, as evidenced by an average of around 10 mispredicted hard-to-predict (H2P) branches per benchmark in SPEC 2017 under current neural network methods. To improve on this, this paper proposes a Clustered Multi-Task Learning and Branch Attention Mechanism-Based Branch Predictor (CMA-BP). Clustered multi-task learning enhances model generalization, and branch attention extracts the preferences of different branches for global history. Thus, CMA-BP efficiently aggregates branches with similar features, reducing training complexity. Experimental results show that CMA-BP significantly outperforms existing predictors in accuracy and also in the number of parameters required. By advancing the state of the art in branch prediction, this work has important implications for future high-performance computer architecture design.
|
|
11:20-11:40, Paper TuBT2.2 | |
>RS-DETR: An Improved DETR for High-Resolution Remote Sensing Image Object Detection |
|
Cao, Feng | Shanxi University |
Wang, Ruoyu | Shanxi University |
Li, Deyu | Shanxi University |
Hu, ZhiGuo | Shanxi University |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: High-resolution remote sensing image object detection is an important research area in remote sensing information processing and has substantial practical applications. This domain presents unique challenges, including variable object scales, complex backgrounds, prevalent small objects, and densely arranged items, distinguishing it from traditional object detection in natural images. This paper proposes a novel object detection algorithm (RS-DETR), which builds upon the DETR framework and integrates the Swin Transformer. The algorithm features a dual-branch structure in its feature extraction module, markedly improving detection accuracy, especially for objects of varying scales. The addition of the GAM convolutional attention mechanism allows the model to concentrate more effectively on relevant regions, reducing the influence of complex backgrounds. Moreover, we include the scale-invariant intersection over union (SIoU) loss function to improve the localization of closely packed objects. To demonstrate the efficacy of the algorithm, RS-DETR was applied to the HRSC2016 and NWPU VHR-10 datasets. The results show average detection accuracies of 86.1% and 57.9% on these datasets, respectively, outperforming the baseline models by 1.1% and 0.9%, respectively.
|
|
11:40-12:00, Paper TuBT2.3 | |
>TransUAAE-CapGen: Caption Generation from Histopathological Patches through Transformer and UNet-Based Adversarial Autoencoder |
|
Raju, S M Taslim Uddin | University of Waterloo |
Mohammad, Abdul Raqeeb | University of Waterloo |
Islam, Md. Milon | University of Waterloo |
Karray, Fakhreddine | University of Waterloo |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning
Abstract: Captioning Whole Slide Images (WSIs) for pathological analysis is an essential but not extensively explored aspect of computer-aided pathological diagnosis. Challenges arise from insufficient datasets and the effectiveness of model training. Generating automatic caption reports for various gastric adenocarcinoma images is another challenge. In this paper, we introduce a hybrid method referred to as TransUAAE-CapGen to generate histopathological captions from WSI patches. The TransUAAE-CapGen architecture consists of a hybrid UNet-based Adversarial Autoencoder (AAE) for feature extraction and a transformer for caption generation. The hybrid UNet-based AAE extracts complex tissue properties from histopathological patches, transforming them into low-dimensional embeddings. The embeddings are then fed into the transformer to generate concise captions. Our proposed method is validated using the PatchGastricADC22 dataset. The TransUAAE-CapGen model provides the best estimated accuracy of BLEU-4 = 86.8%, METEOR = 59.6%, ROUGE = 89.3%, and CIDEr = 7.72%. Experimental analysis indicates that the TransUAAE-CapGen architecture outperforms the traditional LSTM-based model for the caption generation task. Our findings reveal that the proposed architecture can effectively generate accurate and precise reports for medical image analysis.
|
|
12:00-12:20, Paper TuBT2.4 | |
>Learned Image Compression with Transformer-CNN Mixed Structures and Spatial Checkerboard Context |
|
Ji, Kexin | Hohai University |
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition
Abstract: Learning-based image compression techniques that combine current Transformer models with checkerboard context models have shown excellent rate-distortion performance. However, the mixed structure still has room for optimization in terms of redundant information and decoding efficiency, while the checkerboard context model has redundancy in capturing correlations between latent representations. To solve these problems, we propose a framework that combines a mixed Transformer-CNN structure with a checkerboard context model. Specifically, we introduce a "Checkerboard Channel-wise Entropy Module" that improves the coding efficiency of context utilization through a two-channel decoding method with checkerboard contexts. Then, we propose the "In-slice Odd-even Context", which improves the handling of spatially redundant information by introducing a checkerboard context model into the original mixed structure, adding spatial contexts to its channel contexts and global contexts. Extensive experimental results demonstrate that our proposed method outperforms JPEG, BPG, and previous learned image compression methods on the Kodak dataset.
|
|
12:20-12:40, Paper TuBT2.5 | |
>Multi-Kernel Broad Learning System Based on Elastic-Net with Random Fourier Features |
|
Zhang, Qihuai | Beijing Normal University |
Zhao, Xiaojie | Beijing Normal University |
Keywords: Machine Learning, Neural Networks and their Applications
Abstract: The Broad Learning System (BLS) features a simple yet efficient network structure, with its core being the fast and random generation of hidden layers; however, this generation method not only fails to effectively capture the nonlinear characteristics of the task, but also generates certain "redundant nodes", which can negatively affect its learning capabilities. In this study, we propose an improved version of BLS, named KEFBLS, aimed at enhancing the feature extraction capability of the hidden layer through the integration of multi-kernel technology and network sparsification strategies, complemented by deeper feature extraction using random Fourier features. KEFBLS first combines polynomial and wavelet kernels to boost the nonlinear mapping capabilities of the data; then, it applies the elastic-net method to refine the BLS objective function, removing low-impact hidden layer nodes to reduce redundancy and create a more streamlined network; finally, KEFBLS employs random Fourier features to map the processed hidden layers, further enhancing the network's feature extraction capabilities and constructing a new learning model. Our experimental results on three UCI regression datasets demonstrate that KEFBLS surpasses other methods in terms of learning efficiency and model performance.
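For readers unfamiliar with random Fourier features, the following minimal sketch shows the general technique, not the KEFBLS model itself. It assumes the standard construction for the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), which is chosen here only for illustration; the abstract does not state which kernel its random Fourier mapping targets. The names `make_rff` and `rff_map` are hypothetical.

```python
import math
import random


def make_rff(dim, n_features, gamma, rng):
    """Sample parameters of a random Fourier map for the RBF kernel
    exp(-gamma * ||x - y||^2): frequencies w ~ N(0, 2*gamma*I) and
    phases b ~ Uniform(0, 2*pi)."""
    std = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, std) for _ in range(dim)]
         for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return W, b


def rff_map(x, W, b):
    """Map x to z(x) = sqrt(2/D) * cos(W x + b), with D = len(W)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
            for w, bi in zip(W, b)]


rng = random.Random(0)
W, b = make_rff(dim=3, n_features=4000, gamma=0.5, rng=rng)
x, y = [0.1, 0.2, 0.3], [0.3, 0.1, 0.0]

# The inner product of the mapped vectors approximates the kernel value.
approx = sum(zx * zy for zx, zy in zip(rff_map(x, W, b), rff_map(y, W, b)))
exact = math.exp(-0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

The approximation error shrinks roughly as one over the square root of the number of random features, so the feature count trades accuracy against the width of the mapped layer.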
|
|
12:40-13:00, Paper TuBT2.6 | |
>SFAM-Net: A Novel Dual-Branch Network Based on Spectral Feature and Attention Machine for Building Change Detection in Remote Sensing Imagery |
|
Li, Jiequn | Taiyuan University of Technology |
He, Zhisen | Taiyuan University of Technology |
Lv, Yanfang | Taiyuan University of Technology |
Yan, Chen | Taiyuan University of Technology |
Wang, XingKui | Taiyuan University of Technology |
Keywords: Neural Networks and their Applications, Deep Learning, Machine Vision
Abstract: Deep learning techniques have significantly advanced change detection in remote sensing imagery. However, building change detection presents challenges due to the varied appearance of buildings and the complexity of scenes in remote sensing images. Current deep learning-based methods encounter three primary issues. First, CNN-based approaches struggle to model the global contextual information that is essential for analyzing remote sensing building images, while Transformer-based methods may inadvertently degrade local features. Second, traditional attention mechanisms fall short in effectively modeling spatial and spectral features. Third, certain channel attention methods extract excessive redundant information. To address these challenges, this study proposes SFAM-Net, a two-branch hybrid architecture. Our approach first employs orthogonal methods to minimize the redundant information extracted from channels and spaces. We then leverage the parallel structure of convolutions and visual transformers to enhance image representation, integrating local features and global representations through cross-attention to better coordinate building and background features. In the CNN and Transformer branches, we adopt spatial-spectral feature coordination and spectral multi-head attention coordination strategies to improve performance in complex scenes. Additionally, we introduce a novel loss function combining edge and center guidance, which focuses on the edges and centers of changed regions to enhance sensitivity and accuracy in change area detection. Extensive experiments on the widely used LEVIR-CD and WHU-CD datasets validate the effectiveness and efficiency of our network.
|