Distance in Human Networks. Part three
With this blog post, we conclude a three-part series by network analysis expert and (R)E-TIES researcher Valdis Krebs on distance in human networks. In this final installment, the researcher argues that traditional network metrics relying on geodesics (shortest paths) can distort how communication actually flows, and proposes alternative metrics based on shorter, more frequently used paths for more accurate analysis.
Current human network metrics are calculated using geodesics - the shortest paths between any two nodes in a network, whether those paths are 2 steps or 12 steps long. All geodesics are calculated and then utilized in the network metrics.
We saw from our distortion example that information traveling on long network paths is both distorted and delayed, and might be dropped from the communication path along the way. Why include those long paths in a calculation? We also know that communication happens on other short paths, not just via geodesics. Figure 5 shows a typical communication example that does not use an available geodesic - it uses a longer, yet more frequently used, path.
Even though the moms in Figure 5 have a direct link (the geodesic), most of the information flow between them is via their children, who are friends and spend a lot of time together every day at school. The thicker links represent more frequent communication. Rather than the 1-step geodesic (the shortest path), a more frequently used 3-step path is utilized: mom A - child A - child B - mom B.

The moms and their children communicate daily, as do the children at school. The longer, more frequent, path is used instead of the less frequent direct path.
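The difference between the geodesic and the more frequently used path can be illustrated with a small sketch. Everything here is an assumption made for illustration: the contacts-per-week numbers are invented, and treating the cost of a link as 1 / frequency is just one simple way to make frequent links "cheaper" than rare ones. It is not the method used to draw Figure 5.

```python
import heapq

# Hypothetical contacts per week; these numbers are illustrative, not from the post.
contacts_per_week = {
    ("mom A", "mom B"): 1,
    ("mom A", "child A"): 7,
    ("child A", "child B"): 5,
    ("child B", "mom B"): 7,
}

def build_graph(edges):
    """Turn an undirected edge list into a nested adjacency dict."""
    graph = {}
    for (a, b), freq in edges.items():
        graph.setdefault(a, {})[b] = freq
        graph.setdefault(b, {})[a] = freq
    return graph

def cheapest_path(graph, source, target, cost):
    """Dijkstra's algorithm; cost(freq) is the cost of crossing one link."""
    queue = [(0.0, [source])]
    settled = set()
    while queue:
        total, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, freq in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(queue, (total + cost(freq), path + [neighbor]))
    return None

graph = build_graph(contacts_per_week)

# Every link costs one step: this finds the geodesic.
geodesic = cheapest_path(graph, "mom A", "mom B", cost=lambda freq: 1.0)

# Assumed cost model: frequent links are cheap, rare links are expensive.
frequent = cheapest_path(graph, "mom A", "mom B", cost=lambda freq: 1.0 / freq)

print("geodesic:", " - ".join(geodesic))
print("most frequent path:", " - ".join(frequent))
```

With these made-up frequencies, counting steps picks the direct link between the moms, while weighting by frequency picks the 3-step path through the children.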
This same pattern of using the longer but more frequently used path, instead of the shorter path, is also seen in offices with employees and supervisors. Employees from different departments who work on the same project communicate more frequently than their supervisors, who may only meet as needed.
Seeing the futility of long paths in a network, I developed a new set of network metrics that utilize all short paths (one, two and three steps) in the network. We look at all paths between two nodes, not just geodesics, as long as those paths are shorter than four steps (not beyond the network horizon).
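To make "all short paths, not just geodesics" concrete, here is a minimal sketch that lists every path of at most three steps between two nodes. The network, the function name `short_paths`, and the choice to count only simple paths (no node visited twice) are assumptions for illustration; the post does not spell out the exact counting rules.

```python
# Hypothetical five-person network, stored as an adjacency dict.
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def short_paths(graph, source, target, horizon=3):
    """Return every simple path from source to target with at most `horizon` steps."""
    found = []

    def extend(path):
        node = path[-1]
        if node == target:
            found.append(path)
            return
        if len(path) - 1 == horizon:  # already at the horizon, stop exploring
            return
        for neighbor in sorted(graph[node]):
            if neighbor not in path:  # assumption: never revisit a node
                extend(path + [neighbor])

    extend([source])
    return found

for path in short_paths(network, "A", "D"):
    print(len(path) - 1, "steps:", " - ".join(path))
```

In this toy network A and D have two geodesics (2 steps each), but the sketch also returns the two 3-step paths between them. The 4-step path from A to E (A - B - C - D - E) is ignored because it lies beyond the horizon.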
Figure 6 shows a client organization where we calculate all the geodesics in the network. We see that some employees are very far apart when it comes to sharing work information. Two manufacturing employees working on two different products, on two different machines, for two different customers, at two different locations probably have no relevant work information to share.
So why include 1.75 million geodesics > 3 steps in the network calculation? Do they provide greater accuracy? Or do those long geodesics distort who is central in the network? It is most likely the latter. (This approach works for human networks and for any other networks with flow, where distance distorts, delays, and may drop the signal.)

Let’s compare our non-geodesic metrics to commonly used network centralities. First let’s look at the popular Betweenness metric by Freeman (1979). It measures which nodes have the most geodesics flowing through them, revealing who controls the flow. We compare it to our Connector metric: which nodes have the most 2 and 3 step paths flowing through them?
Betweenness looks at the whole network graph, no matter how long the paths. Connector just looks within the network horizon of each node. Figure 7 shows us that both metrics return highly similar results. The size of the node in each network shows the value of that network metric.
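One possible reading of "which nodes have the most 2 and 3 step paths flowing through them" can be sketched as a simple count. This is an illustrative guess, not the actual Connector algorithm: the network, the function names, and the rules (simple paths only, each undirected path counted once, no normalization) are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical five-person network, stored as an adjacency dict.
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def paths_within(graph, source, target, horizon):
    """Yield every simple path from source to target of at most `horizon` steps."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            yield path
            continue
        if len(path) - 1 == horizon:
            continue
        for neighbor in graph[node]:
            if neighbor not in path:
                stack.append(path + [neighbor])

def connector_counts(graph, horizon=3):
    """Count, per node, the short paths on which it sits between two other nodes."""
    counts = Counter({node: 0 for node in graph})
    for source, target in combinations(sorted(graph), 2):
        for path in paths_within(graph, source, target, horizon):
            for middle in path[1:-1]:  # 1-step paths have no middle node
                counts[middle] += 1
    return counts

counts = connector_counts(network)
for node, value in sorted(counts.items()):
    print(node, value)
```

Because the count never looks past three steps, its cost depends on each node's local neighborhood rather than on the size of the whole graph, which is the practical point of staying within the network horizon.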

Figure 8 shows us how the popular Eigenvector centrality compares to our Integration metric. Both metrics are based on the notion that it is better to be connected to others who are well connected. Nodes in the diagram are sized according to the value of the stated centrality metric. Both metrics choose highly similar nodes. But judging by the range of node sizes in each network, the Integration metric seems to distinguish the centralities better.

Besides saving compute time by not calculating millions of needless geodesics, these new metrics have another advantage: their algorithms and names are easy to explain to a normal client. People are more likely to accept what they understand.
