Advanced Network Troubleshooting: Using traceroute

This discussion continues our series on network troubleshooting. You can also read our feature on Basic Network Troubleshooting. In this article, we focus on troubleshooting connectivity problems by examining the output produced by the traceroute command.

The traceroute command lists all the router jumps that happen between your server and the target server. Checking this list helps you verify if the routing over the networks in between is correct. All operating systems carry some form of router-path tracing utility. Linux distributions, for example, have tracepath and traceroute6 (for IPv6; equivalent to traceroute -6), while Windows has tracert and PathPing (Windows NT).

This is how traceroute works:

  1. It sends a UDP or ICMP probe packet with a time-to-live (TTL) of “1” toward the target server.
  2. The first router on the path decrements the TTL to 0, recognizes that the TTL has expired, and drops the packet. This router then sends an Internet Control Message Protocol (ICMP) time-exceeded message back to the source.
  3. traceroute records the IP address of the router that sent the ICMP message; this is the first “hop” on the path to the final server destination.
  4. traceroute repeats the action but uses a TTL of “2” this time. The first hop reads this packet, decrements its TTL to 1, and forwards it to the next hop on the path. The second router then decrements the TTL to 0 and responds as in steps 2 and 3.
  5. This continues until the final or target server is reached.
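
The probing sequence above can be sketched as a small simulation. This is only an illustration, not how traceroute is implemented: it sends no packets, it assumes probes start with a TTL of 1 (as common traceroute implementations do), and the function name simulate_traceroute and the sample addresses are made up for the example.

```python
def simulate_traceroute(path, max_hops=30):
    """Return the (ttl, responder) pairs a trace over `path` would record.

    `path` is a hypothetical ordered list of addresses: every router
    between the source and the target, with the target itself last.
    """
    if not path:
        return []
    *routers, target = path
    hops = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for router in routers:
            remaining -= 1  # each router decrements the TTL by 1
            if remaining == 0:
                # TTL expired: this router drops the probe and sends
                # an ICMP time-exceeded message back to the source.
                hops.append((ttl, router))
                break
        else:
            # No router dropped the probe, so it reached the target.
            hops.append((ttl, target))
            return hops
    return hops  # gave up after max_hops without reaching the target


if __name__ == "__main__":
    for ttl, responder in simulate_traceroute(["10.0.0.1", "10.0.1.1", "192.0.2.10"]):
        print(ttl, responder)
```

Each pass through the outer loop corresponds to one line of traceroute output: the TTL used for the probe and the device that answered it.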

You will, of course, only receive responses from functioning machines. Simply put, if a device responds while you are troubleshooting, it is not likely to be the source of the connectivity problem.

Use the following syntax to generate traceroute reports:

# traceroute [destination_host]

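Below is an example of traceroute output for a query on google.com. Note that every reported hop time is under 50 milliseconds (ms), which is generally considered an acceptable response time. Hop 8 shows no reply, but the trace still reaches the target at hop 9, so that device is simply not answering the probes.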

# traceroute google.com
Resolving Address: google.com
traceroute to google.com (74.125.196.138), 30 hops max, 60 byte packets
1  example.lan (X.X.X.X)  0.649 ms  0.644 ms  0.674 ms
2  67.23.161.132 (67.23.161.132)  0.212 ms  0.414 ms  0.412 ms
3  67.23.161.142 (67.23.161.142)  6.494 ms  6.510 ms  6.673 ms
4  aix.pr1.atl.google.com (198.32.132.41)  6.593 ms  6.600 ms  6.600 ms
5  72.14.233.54 (72.14.233.54)  6.925 ms  14.785 ms 72.14.233.56 (72.14.233.56)  6.811 ms
6  66.249.94.22 (66.249.94.22)  7.310 ms 66.249.94.24 (66.249.94.24)  7.372 ms 66.249.94.20 (66.249.94.20)  7.345 ms
7  209.85.248.31 (209.85.248.31)  7.357 ms 216.239.46.186 (216.239.46.186)  7.392 ms 209.85.243.26 (209.85.243.26)  7.265 ms
8  * * *
9  yk-in-f138.1e100.net (74.125.196.138)  7.291 ms  7.457 ms  7.264 ms
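
To check hop times against a threshold without reading every line by eye, the output can be parsed with a short script. The sketch below assumes the Linux-style output format shown above; the helper names hop_rtts and slow_hops, and the 50 ms default, are choices made for this example only.

```python
import re

# Matches response times such as "6.925 ms" in a traceroute output line.
RTT_PATTERN = re.compile(r"(\d+(?:\.\d+)?) ms")


def hop_rtts(line):
    """Return (hop_number, [rtt_ms, ...]) for a hop line, or None for other lines."""
    fields = line.split()
    if not fields or not fields[0].isdigit():
        return None  # header line or blank line
    return int(fields[0]), [float(value) for value in RTT_PATTERN.findall(line)]


def slow_hops(lines, threshold_ms=50.0):
    """Return the hop numbers that reported at least one time above the threshold."""
    slow = []
    for line in lines:
        parsed = hop_rtts(line)
        if parsed and any(rtt > threshold_ms for rtt in parsed[1]):
            slow.append(parsed[0])
    return slow
```

A hop that shows only asterisks yields an empty list of times, so it is never reported as slow; it simply did not answer.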

The table below defines the code symbols that traceroute can return:

Returned Code    Description
* * *            The expected 5-second response time was exceeded. The delay could be caused by one of the following:
                   • A router on the path is not sending back the ICMP time-exceeded messages.
                   • A router or firewall in the path is blocking the ICMP time-exceeded messages.
                   • The target IP address is not responding.
!H, !N, or !P    The host, network, or protocol is not reachable.
!X or !A         An administrator-imposed setting is blocking the traffic, which means that either a router Access Control List (ACL) or a firewall is in the way.
!S               The source route has failed as traceroute attempts to use a certain path. A certain router security setting might be causing this failure.
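
The codes in the table can also be looked up programmatically. The following is a minimal sketch: the ANNOTATIONS mapping just restates the table above, the function name explain_hop is invented for this example, and the sample line in the usage note is hypothetical.

```python
# Meanings taken from the table above.
ANNOTATIONS = {
    "*": "no reply within the timeout",
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "administratively prohibited (ACL or firewall)",
    "!A": "administratively prohibited (ACL or firewall)",
    "!S": "source route failed",
}


def explain_hop(line):
    """Return the distinct meanings of the codes found in one output line."""
    meanings = [ANNOTATIONS[token] for token in line.split() if token in ANNOTATIONS]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(meanings))
```

For example, explain_hop("4  10.0.0.1 (10.0.0.1)  3.012 ms !H  3.004 ms !H  2.998 ms !H") reports that the host is unreachable.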

Performing bidirectional traces

Always trace from both directions: from the source IP to the target IP, and from the target IP to the source IP. Routes are often asymmetric, which means they take one path in one direction and a different path in the return direction. Trace the route both ways to pinpoint a problem more accurately.


Tracing via looking glass

 A lot of Internet service providers (ISPs) provide a facility to do a traceroute from dedicated servers called looking glasses. As these looking glasses are in various locations, you can trace whether the connectivity issue you are experiencing stems from your web server or from the ISP being used.

 You can do a quick web search for the term “Internet looking glass” to get a long list of alternatives. You can also go to traceroute.org, which already lists looking glasses by country.

 Time-exceeded false alarm

If traceroute does not get a response within a 5-second timeout interval, three asterisks (see table above) appear beside that hop:

# traceroute arin.com
Resolving Address: arin.com
traceroute to arin.com (192.149.252.124), 30 hops max, 60 byte packets
1  208.69.X.X (208.69.X.X)  0.485 ms  0.467 ms  0.497 ms
2  67.23.161.132 (67.23.161.132)  0.308 ms  0.324 ms  0.466 ms
3  67.23.161.142 (67.23.161.142)  6.474 ms  6.524 ms  6.688 ms
4  xe-9-1-3.edge5.Atlanta2.Level3.net (4.71.254.77)  6.590 ms  6.612 ms  6.613 ms
5  ae-4-90.edge2.Washington4.Level3.net (4.69.149.208)  19.102 ms ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.013 ms ae-3-80.edge2.Washington4.Level3.net (4.69.149.144)  19.252 ms
6  ae-3-80.edge2.Washington4.Level3.net (4.69.149.144)  19.040 ms ae-4-90.edge2.Washington4.Level3.net (4.69.149.208)  19.033 ms  19.219 ms
7  COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.34)  83.252 ms  83.047 ms COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.58)  35.844 ms
8  mrfddsrj01-ae0.0.rd.dc.cox.net (68.1.1.5)  21.366 ms  21.369 ms  21.654 ms
9  * * *
10  * * *
11  wsip-98-172-152-14.dc.dc.cox.net (98.172.152.14)  77.031 ms  23.110 ms  23.114 ms
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

Note that some devices block the UDP probe packets that traceroute sends by default but allow ICMP packets. To get around this, add the -I flag to the traceroute command so that it sends ICMP echo requests instead. See the change below after the -I flag was used:

# traceroute -I arin.com
traceroute to arin.com (192.149.252.125), 30 hops max, 60 byte packets
1  208.69.X.X (208.69.X.X)  0.504 ms  0.508 ms  0.556 ms
2  67.23.161.132 (67.23.161.132)  0.290 ms  0.315 ms  0.348 ms
3  67.23.161.142 (67.23.161.142)  6.595 ms  6.603 ms  6.772 ms
4  xe-9-1-3.edge5.Atlanta2.Level3.net (4.71.254.77)  7.612 ms  7.617 ms  7.618 ms
5  ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.107 ms  19.111 ms  19.111 ms
6  ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.109 ms  19.034 ms  19.188 ms
7  COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.34)  60.466 ms  69.478 ms  69.467 ms
8  mrfddsrj01-ae0.0.rd.dc.cox.net (68.1.1.5)  63.591 ms  57.189 ms  57.176 ms
9  * * *
10  * * *
11  wsip-98-172-152-14.dc.dc.cox.net (98.172.152.14)  88.691 ms  88.191 ms  88.179 ms
12  host-252-131.arin.net (192.149.252.131)  86.348 ms  86.030 ms  86.018 ms
13  www.arin.net (192.149.252.125)  87.442 ms  86.854 ms  53.690 ms
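
The decision to retry with -I can be automated along these lines. This is a sketch only: it assumes Linux-style output as shown above, the names trailing_timeouts and suggest_retry are invented for the example, and on some systems traceroute -I needs root privileges.

```python
def trailing_timeouts(lines):
    """Count the consecutive hops at the end of a trace that show only asterisks."""
    count = 0
    for line in reversed(lines):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines at the end of the output
        if fields[0].isdigit() and len(fields) > 1 and all(f == "*" for f in fields[1:]):
            count += 1
        else:
            break
    return count


def suggest_retry(host, lines):
    """Return an ICMP-based traceroute command if the trace ended in timeouts."""
    if trailing_timeouts(lines) > 0:
        return ["traceroute", "-I", host]
    return None
```

The returned list can be passed to subprocess.run to execute the retry. Timeouts in the middle of a trace, such as hops 9 and 10 above, do not trigger a retry, because the trace carried on past them.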

Slow internet false alarm

The tracert output below seems to show that a website with the IP 80.40.X.X is loading slowly because there is congestion at hops 6 and 7, where the response time is over 200 ms:

C:\>tracert 80.40.X.X
1     1 ms     2 ms     1 ms  66.134.200.97
2    43 ms    15 ms    44 ms  172.31.255.253
3    15 ms    16 ms     8 ms  192.168.21.65
4    26 ms    13 ms    16 ms  64.200.150.193
5    38 ms    12 ms    14 ms  64.200.151.229
6   239 ms   255 ms   253 ms  64.200.149.14
7   254 ms   252 ms   252 ms  64.200.150.110
8    24 ms    20 ms    20 ms  192.174.250.34
9    91 ms    89 ms    60 ms  192.174.47.6
10   17 ms    20 ms    20 ms  80.40.96.12
11   30 ms    16 ms    23 ms  80.40.X.X
Trace complete.
C:\>

This is not necessarily an indication of congestion, latency, or packet loss. If those issues were really happening at hops 6 and 7, all the hops past 7 would show high response times as well, because the probes sent to those hops pass through the same devices. What the trace result above actually says is that the devices on hops 6 and 7 were just slow to respond with ICMP TTL-exceeded messages. Remember that many routers give very low priority to generating replies for trace utilities so that they can devote their resources to forwarding regular traffic.
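
The reasoning above can be expressed as a simple check: a latency jump is only suspicious if it persists through to the final hop. The sketch below is one possible way to encode that rule; the function name classify_latency, the 200 ms threshold, and the use of one representative time per hop are assumptions made for illustration.

```python
def classify_latency(hop_rtts_ms, threshold_ms=200.0):
    """Separate isolated slow hops from latency that persists to the last hop.

    `hop_rtts_ms` holds one representative time per hop (index 0 is hop 1),
    or None where a hop did not reply. Returns (isolated_hops, persistent_from),
    where persistent_from is the hop at which lasting latency starts, or None.
    """
    slow = [
        hop
        for hop, rtt in enumerate(hop_rtts_ms, start=1)
        if rtt is not None and rtt >= threshold_ms
    ]
    persistent_from = None
    # Walk back from the last hop while the times stay above the threshold.
    for hop in range(len(hop_rtts_ms), 0, -1):
        rtt = hop_rtts_ms[hop - 1]
        if rtt is not None and rtt >= threshold_ms:
            persistent_from = hop
        else:
            break
    isolated = [hop for hop in slow if persistent_from is None or hop < persistent_from]
    return isolated, persistent_from


# Lowest time reported by each hop in the tracert output above.
example = [1, 15, 8, 13, 12, 239, 252, 20, 60, 17, 16]
print(classify_latency(example))
```

For the trace above, hops 6 and 7 come back as isolated and no persistent latency is found, which matches the false-alarm reading.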

Request timeout before reaching target server

If the trace times out before the target server is reached, the cause may be one of the following scenarios:

  • A server has a bad default gateway.
  • The server is running a firewall that blocks traceroute.
  • The server is either shut down, disconnected from the network, or has an incorrectly configured network interface controller (NIC).

In the example below, the last device that responded to traceroute is a router that acts as the default gateway of the server. Remember that the problem, in this instance, is not with the router but with the server as traceroute only receives responses from functioning devices.

C:\>tracert 82.40.X.X

Tracing route to 82.40.X.X over a maximum of 30 hops
1    33 ms    49 ms    28 ms  192.168.1.1
2    33 ms    49 ms    28 ms  65.14.65.14
3    33 ms    32 ms    32 ms  81.25.69.252
4    47 ms    32 ms    31 ms  82.40.57.1
5    29 ms    28 ms    32 ms  82.40.97.11
6     *        *        *     Request timed out.
7  ^C
C:\>
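
Finding the last device that responded is the key step in this scenario, and it can be extracted from saved tracert output. The sketch below assumes the Windows tracert format shown above with unmasked IPv4 addresses; the pattern TRACERT_HOP and the function last_responding_hop are names made up for this example.

```python
import re

# A hop line starts with the hop number and ends with an IPv4 address,
# optionally wrapped in brackets after a host name.
TRACERT_HOP = re.compile(r"^\s*(\d+)\s+.*?(\d{1,3}(?:\.\d{1,3}){3})\]?\s*$")


def last_responding_hop(lines):
    """Return (hop_number, address) of the last hop that replied, or None."""
    last = None
    for line in lines:
        match = TRACERT_HOP.match(line)
        if match:
            last = (int(match.group(1)), match.group(2))
    return last
```

Lines such as "Request timed out." carry no address, so they never replace the last good hop that was recorded.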

 Troubleshooting example

A ping to 162.219.X.X returned a “TTL expired in transit” message. Usually, this only happens when there is a routing loop in which the packet bounces between two routers on the way to the target server. Each bounce decreases the TTL by 1 until it reaches 0, at which point a router drops the packet and reports that the TTL expired.

The mentioned routing loop was confirmed when a traceroute was done and the packet was seen bouncing between routers 12.34.56.78 and 12.34.56.79:

 C:\>ping 162.219.X.X

Pinging 162.219.X.X with 32 bytes of data:
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.

Ping statistics for 162.219.X.X:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

C:\>tracert 162.219.X.X

Tracing route to myserver.example.net [162.219.X.X]
over a maximum of 30 hops:
1    <1 ms    <1 ms    <1 ms  192.168.1.1
2    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
3    70 ms    71 ms    70 ms  router-1.example.net [12.34.56.78]
4    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
5    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]
6    60 ms    70 ms    61 ms  router-2.example.net [12.34.56.79]
7    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]
8    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
9    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]



Trace complete.
C:\>

The routers with IPs 12.34.56.78 and 12.34.56.79 had their routing processes reset to solve the problem. Further investigation showed that the issue was set off by an unstable network link that caused frequent routing recalculations. The constant activity eventually corrupted the routing tables of one of the routers.
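
A loop like this one shows up in a trace as an address that reappears after a different address has been seen in between. The following sketch detects that pattern in a list of hop addresses; the function name find_routing_loop is invented for the example, and treating consecutive repeats of the same address as harmless is an assumption, not a rule taken from any tool.

```python
def find_routing_loop(hop_addresses):
    """Return (first_hop, repeat_hop, address) for the first looping address, or None.

    `hop_addresses` lists the responding address of each hop in order,
    with None for hops that did not reply.
    """
    first_seen = {}
    previous = None
    for hop, address in enumerate(hop_addresses, start=1):
        if address is None:
            continue
        # The same address on consecutive hops is not counted as a loop.
        if address != previous:
            if address in first_seen:
                return first_seen[address], hop, address
            first_seen[address] = hop
        previous = address
    return None


trace = [
    "192.168.1.1",
    "12.34.56.79",
    "12.34.56.78",
    "12.34.56.79",
    "12.34.56.78",
]
print(find_routing_loop(trace))  # (2, 4, '12.34.56.79')
```

The result points at the hop where the path first revisits a router, which is where to start checking routing tables.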


Reasons for failed traceroutes

There are several possible reasons a traceroute fails to reach the target server:

  • The traceroute packets are blocked or rejected by a router in the path. Usually, the router immediately after the last visible hop is the one causing the blockage. Check the routing table and the status of this device.
  • The target server does not exist on the network, which means it is either disconnected or turned off. Note that !H or !N messages are likely to appear.
  • The network where you are expecting the target host to be in does not exist in the routing table of one of the routers in the path. Note that !H or !N messages are likely to appear.
  • Wrong IP address is used for the target server.
  • There is a routing loop where packets bounce between two routers and never reach the target destination.
  • The packets do not have a proper return path to your server. The router immediately after the last visible hop is where the routing changes. If this occurs, do the following steps:
      • Log on to the last visible router.
      • Look at the routing table to know where the next hop should be.
      • Log on to this next hop router.
      • Do a traceroute from this router to your target server.
        • If the trace completes: The routing to the target server is working fine. Trace back to your source server and traceroute will probably fail at the bad router on the return path.
        • If the trace fails: Check the routing table and the status of all the hops between this router and your target destination.

Essentially, if nothing is blocking your traceroute packets, then the last visible router of an incomplete trace is either the last good router on the path or the last router with a valid return path to the server that issued the traceroute.

The traceroute command is a very handy tool when troubleshooting network connectivity problems. Understanding it is crucial for every network administrator.

See also Basic Network Troubleshooting.
See also Advanced Network Troubleshooting: Using My Traceroute (MTR).
