Furthermore, MMU (Morristown NJ) and SSI (Brunswick GA) are only connected with one another. Same story with MXY (McCarthy AK) and MYK (May Creek AK). Therefore, they are not really part of the network.
This leaves a network of 694 airports. Here is a table showing the minimum path lengths between all possible pairs:
Length #
1 3,692
2 42,794
3 74,998
4 74,064
5 39,197
6 5,452
7 268
8 6
--------
Total 240,471
Indeed, 240,471 corresponds to all possible airport pairings: 294 * 293 / 2.
As a result, the average path length is 3.498. Quite short, when considering that the network consists of 694 nodes.
Next, I have to calculate the clustering coefficient of the network as the fraction of pairs of nodes that, being directly connected to a third node, are also directly connected with each other.
Another way to see clustering is to consider triplets of nodes that are directly connected. If the nodes of a triplet are connected by two links, the triplet is said to be open. If all three nodes are directly connected to each other (i.e., with three links), the triplet is said to be closed. The clustering coefficient is calculated by dividing the total number of closed triplets by the total number of triplets.
We already know that the total number of open triplets in the network of US airports is 42,794, because that is the number of nodes connected via an intermediate node but not connected directly to each other. Those triplets are open because I programmed the database to rejected duplicate pairs. Therefore, pairs found to be connected via a path of length 2 couldn't be inserted into the database if they were already present with a path of length 1.
I wrote a program that, instead of attempting to insert pairs connected with a path of length 2 into the database, counts the cases in which the third node is connected to the first one. I avoided counting more the once the triplets by only considering triplets in which the nodes were in alphabetical order.
The program counted 17,440 closed triplets, resulting in a clustering factor of 0.29. This is higher than the clustering of the Web and of the Internet, as shown in a presentation slides dated May 2008 (www.newton.ac.uk/programmes/CSM/seminars/050114001.pdf):
Network | Vertices | γ | Path | Cluster |
WWW | 2 × 108 | 2.1 | 16 | 0.1078 |
Internet, router | 150000 | 2.4 | 11 | 0.18 |
Movie actors | 212250 | 2.3 | 4.54 | 0.79 |
Maths, coauthors | 70975 | 2.5 | 9.5 | 0.59 |
Phone calls | 53 × 106 | 2.1 | ||
Synonyms | 22311 | 2.8 | 4.5 | 0.7 |
Original sources: Adamic (1999), Aiello et al (2000), Barabási, Albert (1999),Broder (2000), Watts, Strogatz (1998).
The "γ" indicates how long the right tail of the power-law distribution of cluster sizes is. The lower the γ, the slower the number of clusters decreases with the cluster size. The expression for the power law is as follows: N(size) = K * size-γ, where K is a constant. A γ of 1 means that the number of clusters halves when their size doubles. A γ of 2 means that the number of clusters is reduced to a quarter when their size doubles. And so on.
To calculate the distribution for "my" network, I needed to write another small program that listed the number of links of each airport (i.e., node) or, to say it in a different way, the cluster sizes.
It turned out that there are 128 airports with a single connection to another airport. In case you are curious, the best connected city is Atlanta, with non-stop flights to 158 destinations, followed by Minneapolis/Saint Paul with 146, Chicago with 142, and Denver with 141.
The following plot shows log(N) vs. average log(size). This means that γ = 1.25, and K = 264. K should match 128. Why is it much larger? I haven't got a clue. The discrepancy sounds dramatic, but when considering the logarithms, as it appears on the plot, it doesn't look catastrophic.
And why is the γ half of what reported in the examples? Again, I have no idea.
In any case, now that I have found a bona-fide small-world network, I think for a while I will concentrate on something else.