vJUG24 Session: “So, what are they talking about these days?” – Analyzing and Visualizing Topics with Graphs

Abstract:

“As data sources grow ever more numerous, value is extracted from them by recognizing the relationships and patterns among them. Machine learning is a booming field that attempts to tackle this challenge and help companies extract value from their data. Graphs naturally model relationships and patterns between data elements by linking them through edges or clustering them around nodes. Applying machine learning algorithms designed for graphs naturally takes the pre-existing relationships in the modeled graph into account and can help extract more. Once this pattern extraction has been performed, graph visualization can give new insights into what has been discovered.

An example of this kind of pattern extraction is topic modeling: a company may, for example, want to identify the most important topics its customers discuss with its support team and track them over time; or someone may be interested in the most discussed topics on the Stack Exchange forums, where thousands of people discuss programming and science each day.
In this talk, we will present how to perform topic modeling on the public Stack Overflow data: extracting topics, following their evolution over time, and visualizing the extracted information. We will show how to do this with an Apache Spark-based workflow using the widely used LDA algorithm, and also with a more graph-based approach using the RelaxMap algorithm.

We will show that modeling the data as a graph and applying a graph algorithm to perform topic extraction yields topics that are as good as, or more meaningful than, those from the LDA approach, and that graph visualization makes it easy to present the extracted value.”

Speaker: Julia Kindelsberger