

<?xml version="1.0" encoding="UTF-8"?>
<record>
  <title>Semantic Similarity, Phrase Analysis, and Expert Evaluation of Human versus LLM-Generated Abstracts</title>
  <journal>Journal of Digital Information Management</journal>
  <author>Pit Pichappan</author>
  <volume>24</volume>
  <issue>1</issue>
  <year>2026</year>
  <doi>https://doi.org/10.6025/jdim/2026/24/1/40-61</doi>
  <url>https://www.dline.info/fpaper/jdim/v24i1/jdimv24i1_3.pdf</url>
  <abstract>This research examines abstracts of scientific papers and how AI generates them. Abstracts are central to
information use because they are the first source researchers consult when deciding whether a paper merits
a full reading. The study analyses abstracts from the December 2025 issues of Antioxidants and PLOS
Computational Biology, written either by the authors themselves, by ChatGPT, or by Qwen. Evaluation
combined semantic similarity (Jaccard index), phrase-occurrence frequency, and expert scoring, covering
quality, detectability, and the implications for scientific writing.
The results showed that the AI-generated abstracts were more similar to one another than to the human-written
abstracts: the mean Jaccard index was roughly 0.66 to 0.68 between the two AI systems but lower against
the author-written texts, indicating that both models write in a similar style regardless of source material.
Domain-specific terms appeared in both human and AI abstracts, but their usage, in frequency and in the
exact terms chosen, differed between ChatGPT and Qwen. Expert scoring assigned higher grades to the AI
abstracts on clarity, structure, scientific soundness, and originality or relevance: Qwen averaged 9.29,
ChatGPT 9.02, while the human-authored abstracts averaged 7.75. An ANOVA indicated that abstract source
(human versus AI) accounted for about 79 per cent of the variation in scores. These findings suggest that AI
can generate more polished and comprehensive summaries than humans. Still, ethical concerns remain, such
as the potential for AI to fabricate references, spread misinformation, or compromise peer review. The
analysis estimates that 10 to 14 per cent of recent biomedical abstracts show signs of AI assistance,
underscoring the need for better detection methods, clearer rules, and greater emphasis on the underlying
research ideas rather than on the polish of the writing.</abstract>
</record>
