These algorithms rank pages based on the web's link structure.
PageRank is a link analysis algorithm developed by Google to measure the importance or authority of web pages. It operates on the principle that more important pages are likely to receive more links from other important pages. It models the behavior of a "random surfer" who randomly clicks on links.
The algorithm starts with an initial PageRank (e.g., 1/N for N pages) for all pages and iteratively updates scores until they converge.
Consider a simple web graph with pages A, B, C, D and links: A→B, A→C, B→C, C→A, D→C. Let d = 0.85.
Initial Scores (Iteration 0):
PR(A)=0.25, PR(B)=0.25, PR(C)=0.25, PR(D)=0.25 (since N=4 pages)
Iteration 1:
- PR(A) = (1-0.85) + 0.85 * (PR(C)/C(C)) = 0.15 + 0.85 * (0.25/1) = 0.15 + 0.2125 = 0.3625
- PR(B) = (1-0.85) + 0.85 * (PR(A)/C(A)) = 0.15 + 0.85 * (0.25/2) = 0.15 + 0.10625 = 0.25625
- PR(C) = (1-0.85) + 0.85 * (PR(A)/C(A) + PR(B)/C(B) + PR(D)/C(D)) = 0.15 + 0.85 * (0.25/2 + 0.25/1 + 0.25/1) = 0.15 + 0.85 * (0.125 + 0.25 + 0.25) = 0.15 + 0.85 * 0.625 = 0.15 + 0.53125 = 0.68125
- PR(D) = (1-0.85) + 0.85 * (0) = 0.15
Iteration 2 (using PageRank scores from Iteration 1):
- PR(A) = 0.15 + 0.85 * (PR(C)/C(C)) = 0.15 + 0.85 * (0.68125/1) = 0.15 + 0.5790625 = 0.7290625
- PR(B) = 0.15 + 0.85 * (PR(A)/C(A)) = 0.15 + 0.85 * (0.3625/2) = 0.15 + 0.85 * 0.18125 = 0.15 + 0.1540625 = 0.3040625
- PR(C) = 0.15 + 0.85 * (PR(A)/C(A) + PR(B)/C(B) + PR(D)/C(D)) = 0.15 + 0.85 * (0.3625/2 + 0.25625/1 + 0.15/1) = 0.15 + 0.85 * (0.18125 + 0.25625 + 0.15) = 0.15 + 0.85 * 0.5875 = 0.15 + 0.499375 = 0.649375
- PR(D) = 0.15 + 0.85 * (0) = 0.15
With further iterations, the PageRank scores would converge, indicating the relative importance of each page in the graph.
Consider a small web graph with pages A, B, C, D, E and links: A→B, A→C, B→C, C→D, D→E, E→A.
Initial Scores (Iteration 0):
a(A)=1, a(B)=1, a(C)=1, a(D)=1, a(E)=1
h(A)=1, h(B)=1, h(C)=1, h(D)=1, h(E)=1
Iteration 1:
Authority Update:
- a(A) = h(E) = 1
- a(B) = h(A) = 1
- a(C) = h(A) + h(B) = 1 + 1 = 2
- a(D) = h(C) = 1
- a(E) = h(D) = 1
Hub Update:
- h(A) = a(B) + a(C) = 1 + 2 = 3
- h(B) = a(C) = 2
- h(C) = a(D) = 1
- h(D) = a(E) = 1
- h(E) = a(A) = 1
Normalization (example: sum of squares):
Norm Factor for Authorities = √(1² + 1² + 2² + 1² + 1²) = √8 ≈ 2.83
Normalized a(A)=0.35, a(B)=0.35, a(C)=0.71, a(D)=0.35, a(E)=0.35
Norm Factor for Hubs = √(3² + 2² + 1² + 1² + 1²) = √16 = 4
Normalized h(A)=0.75, h(B)=0.5, h(C)=0.25, h(D)=0.25, h(E)=0.25
Iteration 2 (using Normalized scores from Iteration 1):
Authority Update:
- a(A) = h(E) = 0.25
- a(B) = h(A) = 0.75
- a(C) = h(A) + h(B) = 0.75 + 0.5 = 1.25
- a(D) = h(C) = 0.25
- a(E) = h(D) = 0.25
Hub Update:
- h(A) = a(B) + a(C) = 0.75 + 1.25 = 2
- h(B) = a(C) = 1.25
- h(C) = a(D) = 0.25
- h(D) = a(E) = 0.25
- h(E) = a(A) = 0.25
Normalization (example: sum of squares):
Norm Factor for Authorities = √(0.25² + 0.75² + 1.25² + 0.25² + 0.25²) = √(0.0625 + 0.5625 + 1.5625 + 0.0625 + 0.0625) = √2.3125 ≈ 1.52
Normalized a(A)=0.16, a(B)=0.49, a(C)=0.82, a(D)=0.16, a(E)=0.16
Norm Factor for Hubs = √(2² + 1.25² + 0.25² + 0.25² + 0.25²) = √(4 + 1.5625 + 0.0625 + 0.0625 + 0.0625) = √5.75 ≈ 2.39
Normalized h(A)=0.84, h(B)=0.52, h(C)=0.10, h(D)=0.10, h(E)=0.10
The scores continue to converge with further iterations, highlighting C as a strong authority and A as a strong hub.