Academic Paper

融合多种语义特征的代码摘要生成方法 / Combining Multiple Semantic Features for Code Summarization Generation
Document Type
Academic Journal
Source
中文信息学报 / Journal of Chinese Information Processing. 37(11):81-90
Subject
code summarization
Transformer
API documentation
Fourier transform
Language
Chinese
ISSN
1003-0077
Abstract
Code summarization aims to automatically generate natural-language descriptions of what source code does, which facilitates software maintenance and program comprehension. Current mainstream Transformer-based methods consider only the textual and structural semantic features of source code and ignore external semantic features closely related to it, such as API documentation. In addition, on large-scale data, the self-attention module of the Transformer must compute all similarity scores, which leads to high computational cost and large memory usage. To address these problems, this paper proposes a code summarization method that fuses multiple semantic features on top of an improved Transformer architecture. The method uses three independent encoders to learn multiple semantic features of source code (text, structure, and external API documentation) and replaces the self-attention layer in the encoder with a non-parametric Fourier transform, reducing the computation time and memory usage of the Transformer through a linear transformation. Experimental results on public datasets demonstrate the effectiveness of the method.
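The abstract does not say how the Fourier transform is applied inside the encoder, so the following is only an illustrative sketch of one common way to replace self-attention with a parameter-free Fourier transform: take the real part of a 2D discrete Fourier transform over the sequence and hidden dimensions, in the style of FNet. The function names `fourier_mixing` and `encoder_sublayer`, the residual connection, and the layer normalization are assumptions made for illustration, not details taken from the paper. The mixing step has no learned weights and costs O(n log n) in sequence length n with an FFT, whereas self-attention computes n × n similarity scores.

```python
import numpy as np


def fourier_mixing(x: np.ndarray) -> np.ndarray:
    """Parameter-free token mixing (assumed FNet-style, not the paper's exact layer).

    x has shape (..., seq_len, hidden_dim). A 2D DFT is applied over the
    last two axes and only the real part is kept, so the output is real
    and has the same shape as the input.
    """
    return np.fft.fft2(x).real


def encoder_sublayer(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical encoder sublayer: Fourier mixing, residual, layer norm."""
    y = x + fourier_mixing(x)                  # residual connection (assumption)
    mean = y.mean(axis=-1, keepdims=True)      # normalize over the hidden axis
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(6, 8))           # 6 tokens, hidden size 8
    out = encoder_sublayer(tokens)
    print(out.shape)
```

Because the mixing step contains no trainable parameters, any learning capacity in such an encoder would come from the surrounding feed-forward layers; under the three-encoder design described in the abstract, one such stack would presumably be used per feature type (text, structure, API documentation).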