Grouping of domain-aware mashup services based on LDA models and topics from multiple data sources

Mashup is emerging as a promising software development method that allows software developers to compose existing Web APIs into new or value-added composite Web services. However, the rapid growth in the number of available Mashup services makes it difficult for developers to select a Mashup service that satisfies their requirements. Although clustering-based Mashup discovery techniques show promise for improving the quality of Mashup service discovery, clustering Mashup services with both high accuracy and good efficiency remains a challenging problem.

This paper proposes a novel domain-aware Mashup service clustering method that achieves high accuracy and good efficiency by exploiting an LDA topic model built from multiple data sources, with the aim of improving the quality of Mashup service discovery.

The proposed method first designs a domain-aware feature selection and reduction process for Mashup services, which refines the characterization of their domains to consolidate domain relevance. It then presents an extended LDA topic model built from multiple data sources (including Mashup description text, Web APIs, and tags) to infer the topic probability distribution of each Mashup service, which serves as the basis for computing Mashup service similarity. Finally, the K-means and AGNES algorithms are used to cluster Mashup services according to their similarities.
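The last two steps (similarity from topic distributions, then agglomerative clustering) can be illustrated with a minimal sketch. It assumes the per-Mashup topic distributions have already been inferred; the values in `thetas` are hypothetical toy data. The similarity measure (one minus the Jensen-Shannon divergence) and the average-linkage rule are assumptions chosen for illustration, since the abstract does not specify which measure or linkage the method uses.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity(p, q):
    """Similarity in [0, 1]; 1 means identical topic distributions (assumed measure)."""
    return 1.0 - js_divergence(p, q)

def agnes(thetas, k):
    """Bottom-up (AGNES-style) clustering with average linkage, stopping at k clusters."""
    clusters = [[i] for i in range(len(thetas))]

    def linkage(a, b):
        # Average pairwise similarity between the members of two clusters.
        total = sum(similarity(thetas[i], thetas[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        x, y = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical topic distributions for five Mashups over three topics.
thetas = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.15, 0.75, 0.10],
]
clusters = agnes(thetas, 2)
print(clusters)
```

In this toy input the first two Mashups are dominated by the first topic and the remaining three by the second, so the two resulting clusters follow that split. A K-means variant would operate on the same `thetas` vectors instead of on pairwise similarities.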

Experimental results show that, compared with existing Mashup service clustering methods, the proposed method achieves a significant improvement in precision, recall, F-measure, purity, and entropy.
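As an illustration of two of these metrics, the following sketch computes purity and entropy for a clustering against reference category labels. It uses common textbook definitions (purity as the fraction of items belonging to the majority class of their cluster, entropy as the size-weighted base-2 class entropy per cluster, unnormalized); the exact formulas used in the experiments are not given here, so these definitions are assumptions, and the labels and clusters below are hypothetical.

```python
import math
from collections import Counter

def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    n = sum(len(c) for c in clusters)
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / n

def entropy(clusters, labels):
    """Size-weighted average of the class entropy (base 2) within each cluster."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c)
        h = -sum((v / len(c)) * math.log2(v / len(c)) for v in counts.values())
        total += (len(c) / n) * h
    return total

# Hypothetical reference categories and a hypothetical clustering of six Mashups.
labels = ["mapping", "mapping", "social", "social", "mapping", "social"]
clusters = [[0, 1, 2], [3, 4, 5]]

print(purity(clusters, labels))
print(entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference categories; both functions assume every cluster is non-empty.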

The results of the proposed method help software developers improve the quality of Mashup service discovery and Mashup-based software development. Future work will extend the method by considering heterogeneous network information among Mashups, Web APIs, tags, and users, and will apply it to Mashup discovery for software developers.

Mashup service; LDA; Multiple data sources; Domain feature selection and reduction; Service clustering