Conversation disentanglement as-a-service

Additional information

Type
Article in conference proceedings
Year
2023
Language
English
Abstract
Modern instant messaging applications (e.g., Gitter, Slack, Discord) provide users with a means of real-time communication. Developers use them for collaborative development, to ask for code reviews, and to have software-related discussions. In short, they are a (potential) treasure trove for program comprehension. However, as with any high-throughput “chat application”, messages interleave, leading to concurrent conversations. Associating messages with conversations is called conversation disentanglement, a useful and necessary pre-processing step for analyzing datasets of instant messages. Although various conversation disentanglement algorithms have been proposed, it is cumbersome to set up proper execution environments and hard to ensure input data format consistency, calling for better practices and tool support. We present CODI, a RESTful API micro-service and web interface for conversation disentanglement. It provides an easy way to disentangle conversation transcripts with pre-trained models or to train new ones on custom datasets, features, and hyper-parameters. CODI achieves state-of-the-art performance on transcripts of IRC, Slack, and Discord conversations. We show how CODI can significantly improve the reusability (and replicability) of research results, while reducing the effort and potential mistakes due to configuration, setup, and execution.
Keywords
CoDi, Conversation disentanglement, Instant messaging, Micro-services
Conference proceedings
IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 15-16 May 2023
Pages (or article number)
59-63

Diffusion

License
Rights reserved
Visibility
Public
Status open access
Green