In computer science the notion of highly connected subgraphs appears frequently. This notion can be formalized as follows. Let G = ( E , V ) be an undirected graph and let S = ( E S , V S ) be a subgraph of G . Then the density of S is defined to be d ( S ) = | E S | | V S | .
The densest subgraph problem is that of finding a subgraph of maximum density. In 1984, Andrew V. Goldberg developed a polynomial time algorithm to find the maximum density subgraph using a max flow technique.
There are many variations on the densest subgraph problem. One of them is the densest k subgraph problem, where the objective is to find the maximum density subgraph on exactly k vertices. This problem generalizes the clique problem and is thus NP-hard in general graphs. There exists a polynomial algorithm approximating the densest k subgraph within a ratio of n 1 / 4 + ϵ for every ϵ > 0 , while it does not admit PTAS unless N P ⊆ ⋂ ϵ > 0 B P T I M E ( 2 n ϵ ) . The problem remains NP-hard in bipartite graphs and chordal graphs but is polynomial for trees and split graphs. It is open whether the problem is NP-hard or polynomial in (proper) interval graphs and planar graphs; however, a variation of the problem in which the subgraph is required to be connected is NP-hard in planar graphs.
The objective of the densest at most k problem is to find the maximum density subgraph on at most k vertices. Anderson and Chellapilla showed that if there exists an α approximation for this problem then that will lead to an Θ ( α 2 ) approximation for the densest k subgraph problem.
The densest at least k problem is defined similarly to the densest at most k subgraph problem. There is a 2-approximation due to Anderson. It is also NP-complete.
Charalampos Tsourakakis introduced the k -clique densest subgraph problem. This variation of the densest subgraph problem aims to maximize the average number of induced k cliques d k ( S ) = | C k ( S ) | | V S | , where C k ( S ) is the set of k -cliques induced by S . Notice that the densest subgraph problem is obtained as a special case for k = 2 . This generalization provides an empirically successful poly-time approach for extracting large near-cliques from large-scale real-world networks.