Software engineering tools and research initiatives.
A collection of tools, research projects, and collaborative initiatives focused on code analysis, testing, and software quality.
Funded research efforts and notable studies, most recent first.
Explore metamorphic testing as a rigorous way to assess the robustness of large language models across languages.
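To make the metamorphic-testing idea concrete, here is a minimal Python sketch of one possible metamorphic relation, translation invariance: a model's label for a prompt should not change when the prompt is translated. The relation itself, `check_translation_invariance`, the keyword-based `toy_model` standing in for an LLM, and the example prompts are all hypothetical; the project's actual relations and models are not described here.

```python
from typing import Callable, Dict, List, Tuple

# Any text-in, label-out model; here a stand-in for an LLM (assumption).
Model = Callable[[str], str]


def check_translation_invariance(
    model: Model,
    cases: List[Tuple[str, Dict[str, str]]],
) -> List[Tuple[str, str, str, str]]:
    """Check an assumed metamorphic relation: translating a prompt must not
    change the model's label. The label on the source prompt acts as the
    oracle, so no ground-truth labels are needed.
    Returns violations as (source, language, expected, actual)."""
    violations = []
    for source, translations in cases:
        expected = model(source)
        for lang, translated in translations.items():
            actual = model(translated)
            if actual != expected:
                violations.append((source, lang, expected, actual))
    return violations


def toy_model(prompt: str) -> str:
    # Hypothetical, deliberately English-only sentiment "model".
    return "positive" if "good" in prompt.lower() else "negative"


cases = [
    ("The food was good.", {"pt": "A comida estava boa.", "es": "La comida estaba buena."}),
    ("The food was bad.", {"pt": "A comida estava ruim."}),
]

for violation in check_translation_invariance(toy_model, cases):
    print(violation)
```

Running it flags the Portuguese and Spanish translations of the positive prompt, exposing the toy model's English-only bias without any labeled data.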
Apply state-of-the-art machine learning (ML) and software engineering (SE) techniques to develop a novel approach to software maintenance. Led by Matheus Paixao (UECE, Brazil).
Build a code-similarity-driven approach and an automated tool that detects similar code and recommends improvements.
Thai–UK collaboration transferring state-of-the-art automated software engineering (ASE) tools and techniques from UK experts to Thai software companies. Project site ↗
Clone detection across 72k Java snippets and 111 OSS projects — found 66% of matches outdated and potentially harmful to reuse. Project site ↗
Detect clones via image similarity — capture the visual perception of syntax-highlighted code as humans see it. Project site ↗
Evaluate 30 similarity tools across 5 scenarios on Java source — strong evidence for compile/decompile as a normalization step. Project site ↗
Combine clones found before and after decompilation — higher recall, no loss in precision. Project site ↗
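A minimal sketch of why combining the two result sets can raise recall: clone pairs are modeled as unordered pairs, and the union of pairs found on the original source and on decompiled code is scored against a ground truth. All fragment names and result sets below are hypothetical. Precision stays unchanged in this toy data only because every pair the union adds is a true clone; in general the union's precision depends on the precision of both inputs.

```python
from typing import FrozenSet, Set, Tuple

# A clone pair is unordered: (A, B) and (B, A) denote the same pair.
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    return frozenset((a, b))


def precision_recall(reported: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Standard precision and recall of reported pairs against a ground truth."""
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth and detector outputs.
truth = {pair("A", "B"), pair("C", "D"), pair("E", "F"), pair("G", "H")}
before = {pair("A", "B"), pair("C", "D")}  # clones found on the original source
after = {pair("D", "C"), pair("E", "F")}   # clones found after compile/decompile
combined = before | after                  # union; ("C","D") is counted once

for name, reported in [("before", before), ("after", after), ("combined", combined)]:
    p, r = precision_recall(reported, truth)
    print(f"{name:>8}: precision={p:.2f} recall={r:.2f}")
```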
The largest comparison to date of 30 similarity tools against pervasive code modifications. Project site ↗
Replication study of EvaClone, a genetic-algorithm-based optimizer for clone detector parameters that beats default tool settings by 19.9–66.4%. Project site ↗
Scalable code-search framework built on information retrieval (IR), tokenization, code normalization, and variable-length n-grams.
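The pipeline named above (normalization, tokenization, variable-length n-grams, IR ranking) can be sketched in a few dozen lines of Python. This is an illustrative sketch, not the framework itself: the normalization rule (identifiers become `V`, integer literals become `N`), the keyword list, the n-gram range of 1 to 3, and TF-IDF with cosine similarity as the IR model are all assumptions chosen for brevity. A scalable system would use an inverted index instead of scoring every document.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed keyword list; a real normalizer would use the language grammar.
KEYWORDS = {"int", "for", "if", "else", "while", "return", "new", "void", "public", "static"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code: str) -> List[str]:
    """Tokenize, then map identifiers to 'V' and integer literals to 'N'."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("V")
        elif tok.isdigit():
            tokens.append("N")
        else:
            tokens.append(tok)  # operators and punctuation kept verbatim
    return tokens


def grams(tokens: List[str], min_n: int = 1, max_n: int = 3) -> Counter:
    """Count all n-grams with min_n <= n <= max_n (variable-length grams)."""
    return Counter(
        " ".join(tokens[i : i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CodeIndex:
    def __init__(self, docs: Dict[str, str]) -> None:
        counts = {name: grams(normalize(src)) for name, src in docs.items()}
        df = Counter(g for c in counts.values() for g in c)  # document frequency
        n = len(docs)
        # Smoothed IDF; grams absent from the corpus get weight 0 at query time.
        self.idf = {g: math.log((1 + n) / (1 + d)) + 1 for g, d in df.items()}
        self.vectors = {name: self._weigh(c) for name, c in counts.items()}

    def _weigh(self, counts: Counter) -> Dict[str, float]:
        return {g: tf * self.idf.get(g, 0.0) for g, tf in counts.items()}

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        q = self._weigh(grams(normalize(query)))
        ranked = sorted(
            ((name, cosine(q, v)) for name, v in self.vectors.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        return ranked[:k]


index = CodeIndex({
    "sum": "int s = 0; for (int i = 0; i < n; i++) s += a[i];",
    "max": "if (x > y) return x; else return y;",
})
print(index.search("int total = 0; for (int j = 0; j < len; j++) total += arr[j];"))
```

Because normalization erases identifier names, the query matches the `sum` snippet despite sharing none of its variable names; the longer n-grams reward matching token order, not just matching vocabulary.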
Tool demonstration papers — open-source tools and prototypes built with my students and collaborators.
Coursework and capstone projects from Carnegie Mellon University (2007–2008).
ML-based intelligent agent that extracts meeting information from emails and drafts calendar entries. Report ↗
Firefox extension that scans incoming pages for hidden web bugs and implements the Platform for Privacy Preferences (P3P) Compact Policy component. Report ↗
Use genetic algorithms to search for the optimal combination of similarity functions, framing record linkage as classification. Report ↗
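A minimal sketch of that framing: each chromosome holds one weight per similarity function plus a match threshold, a record pair is classified as a match when its weighted similarity reaches the threshold, and fitness is classification accuracy on labeled pairs. The three similarity functions, the truncation selection with one-point crossover and Gaussian mutation, the population settings, and the toy pairs are all assumptions; the report's actual encoding and operators may differ.

```python
import random
from typing import Callable, List, Tuple

LabeledPair = Tuple[str, str, bool]


def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def same_initial(a: str, b: str) -> float:
    return 1.0 if a[:1].lower() == b[:1].lower() else 0.0


def length_ratio(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 1.0


SIMS: List[Callable[[str, str], float]] = [jaccard, same_initial, length_ratio]
Chromosome = List[float]  # one weight per function in SIMS, then a threshold


def classify(ch: Chromosome, a: str, b: str) -> bool:
    *weights, threshold = ch
    return sum(w * f(a, b) for w, f in zip(weights, SIMS)) >= threshold


def fitness(ch: Chromosome, pairs: List[LabeledPair]) -> float:
    return sum(classify(ch, a, b) == label for a, b, label in pairs) / len(pairs)


def evolve(pairs: List[LabeledPair], pop_size: int = 20, generations: int = 30,
           mutation: float = 0.1, seed: int = 0) -> Tuple[Chromosome, List[float]]:
    rng = random.Random(seed)
    n_genes = len(SIMS) + 1
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ch: fitness(ch, pairs), reverse=True)
        history.append(fitness(pop[0], pairs))
        elite = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            children.append([g + rng.gauss(0, mutation) for g in child])  # mutation
        pop = elite + children
    best = max(pop, key=lambda ch: fitness(ch, pairs))
    return best, history


PAIRS: List[LabeledPair] = [
    ("John Smith", "John Smith", True),
    ("John Smith", "Jon Smith", True),
    ("Mary Jones", "Mary A. Jones", True),
    ("John Smith", "Alice Brown", False),
    ("Mary Jones", "Mark Johnson", False),
    ("Peter Pan", "Paul Park", False),
]
best, history = evolve(PAIRS)
print(best, fitness(best, PAIRS))
```

Because the top half of each generation is carried over unchanged (elitism), the best fitness recorded in `history` never decreases from one generation to the next.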
Distributed file system with scalable metadata server (MDS) and fault-tolerant data server (DS) clusters; servers can be added at runtime without disruption. Report ↗