Widespread machine learning methods behind ‘link prediction’ are performing very poorly, researchers find


As you scroll through any social media feed, you are likely to be prompted to follow or friend another person, expanding your personal network and contributing to the growth of the app itself. That suggestion comes from link prediction, a widespread machine learning (ML) task that examines the links in a network (your friendships and everyone else's) and tries to predict which links will form next.

Beyond driving the growth of social media, link prediction is widely used in scientific research, for example to predict interactions between genes and proteins. Researchers also use it as a benchmark for testing the performance of new ML algorithms.

New research by C. "Sesh" Seshadhri, UC Santa Cruz Professor of Computer Science and Engineering, published in the journal Proceedings of the National Academy of Sciences, establishes that the standard metric for measuring link prediction performance misses crucial information, and that link prediction methods perform significantly worse than the popular literature indicates.

Seshadhri and his co-author Nicolas Menand, a former UCSC undergraduate and master's student who is now a Ph.D. candidate at the University of Pennsylvania, recommend that ML researchers stop using AUC, the standard metric for measuring link prediction performance, and they introduce a new, more comprehensive metric for the problem. The research has implications for the trustworthiness of decision-making in ML.

AUC’s ineffectiveness

Seshadhri, who works in theoretical computer science and data mining and is currently an Amazon Scholar, has previously studied ML algorithms for networks. In that earlier work, he found certain mathematical limitations that were hurting algorithm performance. To understand those limitations in context, he took a closer look at link prediction, given its importance as a testbed problem for ML algorithms.

“The reason why we got interested is because link prediction is one of these really important scientific tasks which is used to benchmark a lot of machine learning algorithms,” Seshadhri said.

“What we were seeing was that the performance seemed to be really good… but we had an inkling that there seemed to be something off with this measurement. It feels like if you measured things in a different way, maybe you wouldn’t see such great results.”

The link prediction methods at issue rely on low-dimensional vector embeddings: the algorithm represents each person in a network as a vector in a space with relatively few dimensions. All of the machine learning then happens as mathematical manipulations of those vectors.
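To make the idea concrete, here is a minimal Python sketch, assuming the common (but not universal) choice of scoring a candidate link by the dot product of the two people's vectors. The five-person network, the hand-picked 2-dimensional vectors, and the names `EMBEDDINGS`, `link_score` and `recommend` are all hypothetical; real systems learn the vectors from network data, and the article does not say which scoring rule the studied methods use.

```python
# Hypothetical 2-dimensional embeddings for a five-person network.
# Real methods learn these vectors from the network; these are hand-picked.
EMBEDDINGS: list[tuple[float, float]] = [
    (1.0, 0.2),  # person 0
    (0.9, 0.3),  # person 1
    (0.1, 1.0),  # person 2
    (0.2, 0.9),  # person 3
    (0.6, 0.6),  # person 4
]


def link_score(u: int, v: int) -> float:
    """Score a candidate link (u, v) as the dot product of the two vectors."""
    return sum(a * b for a, b in zip(EMBEDDINGS[u], EMBEDDINGS[v]))


def recommend(u: int, existing: set[tuple[int, int]], k: int = 2) -> list[int]:
    """Return the k highest-scoring people that u is not already linked to.

    Existing links are stored as (smaller id, larger id) pairs.
    """
    candidates = [
        v
        for v in range(len(EMBEDDINGS))
        if v != u and (min(u, v), max(u, v)) not in existing
    ]
    return sorted(candidates, key=lambda v: link_score(u, v), reverse=True)[:k]


if __name__ == "__main__":
    existing_links = {(0, 4), (2, 3)}
    print(recommend(0, existing_links))  # [1, 3]
```

Every prediction here is computed from the vectors alone, so any network structure the vectors cannot represent is invisible to the predictor.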

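AUC, which stands for "area under the curve" (in link prediction, usually the area under the receiver operating characteristic, or ROC, curve), is the most common metric for measuring link prediction performance. It gives an algorithm a score from zero to one: a score of 1 means the algorithm ranks every true link above every non-link, while 0.5 is what random guessing would achieve.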
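A hedged sketch of how such a score can be computed, assuming the usual ROC-based AUC: it equals the fraction of (true link, non-link) pairs in which the true link receives the higher score, with ties counted as half (the Mann-Whitney form). The function name `auc` and the example scores are illustrative, not taken from the paper.

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via its pairwise (Mann-Whitney) form.

    pos_scores: scores the model gave to held-out true links.
    neg_scores: scores the model gave to pairs that are not linked.
    Returns the fraction of (true link, non-link) pairs in which the true
    link scores higher; ties count as half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one true link and one non-link")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical scores: 3 true links, 4 non-links.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))  # 11 of 12 pairs correct, about 0.917
```

The double loop keeps the definition visible but is quadratic; practical implementations typically sort the scores once instead.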

In their research, the authors found that there are fundamental mathematical limitations to using low-dimensional embeddings for link prediction, and that AUC cannot capture these limitations. Because AUC is blind to them, the authors conclude that it does not accurately measure link prediction performance.
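The article does not spell out the paper's mathematical argument, so the sketch below is only a generic illustration of how one aggregate number can hide poor performance; it is not the authors' analysis or their proposed metric. It compares AUC with precision at k, a standard ranking measure swapped in here for contrast, on hypothetical scores for a sparse network in which non-links vastly outnumber true links.

```python
def auc(pos: list[float], neg: list[float]) -> float:
    """Fraction of (true link, non-link) pairs ranked correctly; ties count half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(pos: list[float], neg: list[float], k: int) -> float:
    """Fraction of the k highest-scoring candidate pairs that are true links."""
    ranked = sorted(
        [(s, True) for s in pos] + [(s, False) for s in neg],
        key=lambda pair: pair[0],
        reverse=True,
    )
    return sum(1 for _, is_link in ranked[:k] if is_link) / k


# Hypothetical sparse network: 10 true links among 1,000 candidate pairs.
# A few non-links (20) outscore every true link; the rest score very low.
true_links = [0.5] * 10
non_links = [0.9] * 20 + [0.1] * 970

if __name__ == "__main__":
    print(round(auc(true_links, non_links), 3))          # 0.98
    print(precision_at_k(true_links, non_links, k=10))   # 0.0
```

With these numbers AUC is about 0.98, yet none of the top 10 suggestions is a real link. A recommender shows people only the top of its ranking, so in this toy case the high AUC says little about the quality of what users would actually see.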

Seshadhri said that, given the mathematical limitations his research has revealed in their performance, these results call into question the widespread use of low-dimensional vector embeddings in the ML field.

More information:
Menand, Nicolas et al, Link prediction using low-dimensional node embeddings: the measurement problem, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2312527121.

Provided by
University of California – Santa Cruz

Widespread machine learning methods behind ‘link prediction’ are performing very poorly, researchers find (2024, February 12)
retrieved 13 February 2024

