In computer science the notion of highly connected subgraphs appears frequently. This notion can be formalized as follows. Let
G
=
(
E
,
V
)
be an undirected graph and let
S
=
(
E
S
,
V
S
)
be a subgraph of
G
. Then the density of
S
is defined to be
d
(
S
)
=
|
E
S
|
|
V
S
|
.
The densest subgraph problem is that of finding a subgraph of maximum density. In 1984, Andrew V. Goldberg developed a polynomial time algorithm to find the maximum density subgraph using a max flow technique.
There are many variations on the densest subgraph problem. One of them is the densest
k
subgraph problem, where the objective is to find the maximum density subgraph on exactly
k
vertices. This problem generalizes the clique problem and is thus NP-hard in general graphs. There exists a polynomial algorithm approximating the densest
k
subgraph within a ratio of
n
1
/
4
+
ϵ
for every
ϵ
>
0
, while it does not admit PTAS unless
N
P
⊆
⋂
ϵ
>
0
B
P
T
I
M
E
(
2
n
ϵ
)
. The problem remains NP-hard in bipartite graphs and chordal graphs but is polynomial for trees and split graphs. It is open whether the problem is NP-hard or polynomial in (proper) interval graphs and planar graphs; however, a variation of the problem in which the subgraph is required to be connected is NP-hard in planar graphs.
The objective of the densest at most
k
problem is to find the maximum density subgraph on at most
k
vertices. Anderson and Chellapilla showed that if there exists an
α
approximation for this problem then that will lead to an
Θ
(
α
2
)
approximation for the densest
k
subgraph problem.
The densest at least
k
problem is defined similarly to the densest at most
k
subgraph problem. There is a 2-approximation due to Anderson. It is also NP-complete.
Charalampos Tsourakakis introduced the
k
-clique densest subgraph problem. This variation of the densest subgraph problem aims to maximize the average number of induced
k
cliques
d
k
(
S
)
=
|
C
k
(
S
)
|
|
V
S
|
, where
C
k
(
S
)
is the set of
k
-cliques induced by
S
. Notice that the densest subgraph problem is obtained as a special case for
k
=
2
. This generalization provides an empirically successful poly-time approach for extracting large near-cliques from large-scale real-world networks.