Code similarity and clone search in large-scale source code data
My research focuses on the scalability of code clone detection.
Scalable code clone detection is increasingly important due to
the popularity of code reuse from online sources such as Stack Overflow
or GitHub. Previous studies have shown that cloning code snippets from
Stack Overflow can not only introduce vulnerabilities into software but
also cause licensing conflicts.
I have built a scalable code search tool (Siamese) that instantly retrieves clone snippets from online sources. It incorporates novel techniques of multiple code representations and query reduction to accurately retrieve clones from hundreds of millions of lines of code within seconds. The tool helps developers and SE researchers in many ways, such as finding clones at commit or code review time, finding similar code examples, or detecting software plagiarism.
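To give a rough feel for the two ideas named above, here is a minimal toy sketch in Python. It is not Siamese's actual implementation: the tokenizer, the two representations (`raw` and `norm`), the 4-token n-grams, the `CloneIndex` class, and the `keep` cut-off are all assumptions made for illustration. Each snippet is indexed under more than one representation, and at query time only the rarest n-grams (lowest document frequency) are kept, which is one simple form of query reduction.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword list and tokenizer (illustration only).
KEYWORDS = {"int", "return", "if", "else", "for", "while"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def tokenize(code):
    """Split code into identifiers/keywords, integer literals, and single symbols."""
    return TOKEN_RE.findall(code)


def normalize(tokens):
    """Second representation: identifiers become 'ID', integer literals 'NUM'."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def ngrams(tokens, n=4):
    """Set of n-token windows, each joined into one index term."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def representations(code):
    """Index terms of a snippet under each representation."""
    toks = tokenize(code)
    return {"raw": ngrams(toks), "norm": ngrams(normalize(toks))}


class CloneIndex:
    """Toy inverted index with one posting table per representation."""

    def __init__(self):
        self.postings = {"raw": defaultdict(set), "norm": defaultdict(set)}

    def add(self, doc_id, code):
        for rep, grams in representations(code).items():
            for gram in grams:
                self.postings[rep][gram].add(doc_id)

    def search(self, code, keep=5):
        """Rank indexed snippets by how many reduced query terms they match."""
        scores = defaultdict(int)
        for rep, grams in representations(code).items():
            index = self.postings[rep]
            known = [g for g in grams if g in index]
            # Query reduction (assumed form): keep only the rarest n-grams.
            known.sort(key=lambda g: (len(index[g]), g))
            for gram in known[:keep]:
                for doc_id in index[gram]:
                    scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = CloneIndex()
index.add("a", "int add(int x, int y) { return x + y; }")
index.add("b", "for (int i = 0; i < n; i++) { total = total + i; }")
results = index.search("int sum(int p, int q) { return p + q; }")
```

In this toy run the query shares no `raw` n-grams with either snippet because every identifier was renamed, but its `norm` n-grams match snippet `a`, so `a` is ranked first. Dropping the common n-grams keeps the query small, which is the intuition behind reducing queries before searching a very large index.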
M. Paixao, J. Krinke, D. Han, C. Ragkhitwetsagul, M. Harman.
The Impact of Code Review on Architectural Changes.
Transactions on Software Engineering (TSE), 2019.
Download: DOI 10.1109/TSE.2019.2912113
C. Ragkhitwetsagul, J. Krinke. Siamese: Scalable and Incremental Code Clone Search via Multiple Code Representations. Empirical Software Engineering (EMSE), 2019.
Download: Preprint DOI 10.1007/s10664-019-09697-7 Website: github.com/UCL-CREST/Siamese
C. Ragkhitwetsagul. Code similarity and clone search in large-scale source code data. PhD thesis, 2018.
Download: PhD thesis
C. Ragkhitwetsagul, J. Krinke, M. Paixao, R. Oliveto, G. Bianco (2018). Toxic Code Snippets on Stack Overflow. Transactions on Software Engineering (TSE), 2018.
Download: Preprint DOI: 10.1109/TSE.2019.2900307 Website: ucl-crest.github.io/cloverflow-web
C. Ragkhitwetsagul, J. Krinke, R. Oliveto (2017). Awareness and Experience of Developers to Outdated and License-Violating Code on Stack Overflow: An Online Survey. UCL Computer Science Research Note (RN/17/10), 2017.
Download: Research Note arXiv: 1806.08149
J. Wilkie, Z. Al Halabi, A. Karaoglu, J. Liao, G. Ndungu, C. Ragkhitwetsagul, M. Paixão, J. Krinke (2018). Who's this? Developer identification using IDE event data. In 15th International Conference on Mining Software Repositories -- Mining Challenge (MSR 2018), 2018. Gothenburg, Sweden <To Appear>.
Download: Preprint DOI: 10.1145/3196398.3196461
C. Ragkhitwetsagul, J. Krinke, B. Marnette (2017). A picture is worth a thousand words: code clone detection based on image similarity. In 12th International Workshop on Software Clones, 2018. Campobasso, Italy.
Download: Preprint DOI: 10.1109/IWSC.2018.8327318
C. Ragkhitwetsagul, J. Krinke, D. Clark (2017), A comparison of code similarity analysers, Empirical Software Engineering, vol. 23, no. 4, pp. 2464–2519, Aug. 2018.
Download: Preprint DOI: 10.1007/s10664-017-9564-7 Slideshow: Slideshare
M. Paixao, J. Krinke, D. Han, C. Ragkhitwetsagul and M. Harman (2017). Are Developers Aware of the Architectural Impact of Their Changes? In the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), Illinois, USA.
Download: Preprint DOI: 10.1109/ASE.2017.8115622
C. Ragkhitwetsagul, J. Krinke (2017). Using Compilation/Decompilation to Enhance Clone Detection. In 11th International Workshop on Software Clones, 2017. Klagenfurt, Austria -- Won the People's Choice Award!
Download: Preprint DOI: 10.1109/IWSC.2017.7880502 Slideshow: SlideShare
C. Ragkhitwetsagul, J. Krinke, D. Clark (2016). Similarity of Source Code in the Presence of Pervasive Modifications. In 16th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), 2016. North Carolina, USA.
Download: Preprint DOI: 10.1109/SCAM.2016.13 Slideshow: SlideShare
C. Ragkhitwetsagul (2016). Measuring Code Similarity in Large-scaled Code Corpora. In 32nd International Conference on Software Maintenance and Evolution (ICSME): Doctoral Symposium, 2016. North Carolina, USA.
Download: Preprint DOI: 10.1109/ICSME.2016.18
C. Ragkhitwetsagul, M. Paixao, M. Adham, S. Busari, J. Krinke, and J.H. Drake (2016). Searching for Configurations in Clone Evaluation: A Replication Study. In 8th International Symposium on Search-based Software Engineering (SSBSE): Challenge Track, 2016. North Carolina, USA.
Download: Preprint DOI: 10.1007/978-3-319-47106-8_20 Slideshow: SlideShare
The publications listed from here are undergraduate student projects that I advised at the faculty of ICT, Mahidol University:
P. Janviriya, T. Ongarjithichai, P. Numruktrakul, C. Ragkhitwetsagul (2014). CloudyDays: Cloud Storage Integration System. In Student Project Conference (ICT-ISPC), 2014 Third ICT International (pp. 125–128). Nakhonpathom, Thailand.
Download: DOI: 10.1109/ICT-ISPC.2014.6923233
P. Hathaiwichian, L. Siriwittayacharoen, A. Wongwachirawanich, C. Ragkhitwetsagul (2014). Android Application for Event Management and Information Propagation. In Student Project Conference (ICT-ISPC), 2014 Third ICT International (pp. 139–142). Nakhonpathom, Thailand.
Download: DOI: 10.1109/ICT-ISPC.2014.6923236
21/2/2017: Using Compilation/Decompilation to Enhance Clone Detection: The slides of my talk at IWSC '17.
3/10/2016: Similarity of Source Code in the Presence of Pervasive Modifications: The slides of my talk at SCAM '16.
9/10/2016: Searching for Configurations in Clone Evaluation: A Replication Study: The slides of my talk at SSBSE '16 (Challenge Track).
15/06/2016: Similarity of Source Code in the Presence of Pervasive Modifications: The slides of my talk at the 12th International Summer School on Software Engineering (Student Talk), covering the complete results of the CloPlag experiment.
01/06/2015: CloPlag: A Study of Effects of Code Obfuscation to Similarity Detection Tools: The latest update of the CloPlag study, with more results! It was given at the COW 42 Annual Research Review of CREST.
06/02/2015: CloPlag: A Study of Effects of Code Obfuscation to Clone/Plagiarism Detection Tools: A presentation of initial results from the experiment on the effects of code obfuscation on current similarity detection tools. It was given at a CREST Monthly Meeting.
Interesting papers or articles about doing a PhD and conducting research in general.
- Schwartz, M. A., The Importance of Stupidity in Scientific Research. Journal of Cell Science. 2008; 121, 1771.
- Harman, M., Draft Guidelines for My Students on Writing Software Engineering Research Papers.
- Jeff Offutt, Editorial: Standards for reviewing papers. Softw. Test. Verif. Reliab. 2007; 17:135–136
- Virginia Gewin, How to write a first-class paper
Videos or slides that I found very valuable for doing research.
- Jones, S. P., How to write a great research paper.
- Jones, S. P., Hughes, J., Launchbury, J., How to give a great research talk. 1993.
- Andreas Zeller, On Impact in Software Engineering Research
- Andreas Zeller, Relevance, Simplicity, and Innovation: Stories and Takeaways from SE research (from 42 minutes onwards)
- Alexander Serebrenik, Peer Reviews (and how to survive them)