Network Troubleshooting

Constructing a network is really quite straightforward. You connect everything together, set the appropriate network protocols, configure your applications accordingly and away you go. Job done, that'll be a £500 consultancy fee, thank you very much ;-)

Unfortunately, it's not always that simple. It is so easy to make a typo, select the wrong option or generally get lost in the myriad of possible settings that, at some stage, something is not going to do what you expected. With protocols such as TCP/IP, finding out what went wrong is really quite simple, but for others the tools may not be so easy to use. The following sections are divided between the three main network protocols you may require on a home network - TCP/IP, NetBIOS/NetBEUI and IPX/SPX - although it cannot be stressed enough that to use Cable Modem services, you only need TCP/IP as a network protocol.

Before looking at errors in the network protocols, however, it is often worth establishing that the hardware you are using is working correctly, and that the appropriate drivers are loaded for whatever NIC and OS you are using. A lot of network troubles are a consequence of physical problems, be it a cable fault, NIC fault or other transient problem, and all network protocols rely on the physical side of things being available and error free. Without this, network protocols do not function at all well, and the protocols in use generally give little indication that a particular host is not responding because of a physical problem.

Hardware Problem Solving

The four most common causes for network faults are:

  1. Faulty Cable
  2. Incorrect NIC drivers loaded
  3. Faulty NIC
  4. Incorrect data speed negotiation

  1. Faulty Cable

    Checking or replacing cables is always the last thing that gets done when trying to find a fault with a network. For this reason, I decided to put it at the top of the list. Without the use of expensive (for the home network) cable testing equipment, it is sometimes difficult to detect a fault with a particular cable, so generally the only way to eliminate a suspected cable problem is to replace the cable and observe the effect.

    It is also worth noting here that for the majority of uses a straight CAT5 cable is used, but there are occasions where a cross-over cable is needed, so also make sure the correct type of cable is being used.

  2. Incorrect NIC drivers loaded

    Probably the most frustrating part of installing a NIC is getting your OS to recognise it but, on the other hand, it's a most pleasing experience when it does! Plug and Play (often referred to as Plug and Pray) technology means that a large number of network adapters are recognised by the OS, which will automatically install the necessary drivers itself. There are, however, a number of OSs that do not support P&P, and for these there may be a requirement to install the appropriate drivers manually.

    One particular problem with Plug and Play devices is that the OS drivers may not be particularly good, or may not contain current bug fixes, and it is therefore a prudent measure to download the card driver directly from the manufacturer's website. Where the manufacturer is unknown, you can identify the card by using the FCC ID search facility at the Federal Communications Commission. This helps you identify the manufacturer of a card from the FCC ID number that is etched on the card. The FCC search must be based on the FCC License Grantee Code, which is the first three characters of the ID, and, optionally, the product code, so an ID of DF63C509B-TPC would require DF6 entered as the Grantee Code and 3C509B-TPC as the product code. In this example, the product code identifies itself as a common 3Com Etherlink III card, but some manufacturers' products are not so well known.

  3. Faulty NIC

    Whilst this is not all that common a problem, it does happen. If suspected, then generally the easiest way to confirm your suspicions is to replace the card. Some card manufacturers, however, provide diagnostic software for their cards and it is usually a good idea to run some tests on the card as well. A number of test options are normally available, including actual card tests and also link tests. Link tests usually apply some type of loopback to the interface and then send some known test pattern, which is then verified as it is received.

    Within Windows, it is possible to check the status of the NIC through the System Properties available either by right-clicking the My Computer (or whatever yours is called;-)) icon, or by selecting System from within Control Panel. An incorrectly loaded card will show an exclamation mark against it and you will need to double-click on it to find out the problem that Windows is having with the card.

  4. Incorrect data speed negotiation

    A common issue that occurs with dual speed networking devices is that they usually negotiate the network speed with the device they are directly connected to. Unfortunately, if the other device is also set to negotiate then it sometimes happens that they both negotiate different settings! One may be set at 10Mbps Half-duplex and the other at 100Mbps Full-duplex, and this can have a major impact on network performance!

    The easiest way to fix this sort of problem is to set one or the other end to a fixed speed and duplex setting. As one end of your connection may be a switch, this is usually easier to do on the connecting device, via the properties of the network adapter. The following diagram shows this option for an SMC 1255TX 10/100 PCI NIC.

    Setting the Network Speed

    As can be seen from this diagram, there are four available speed settings: 10Mbps Half-duplex (10BaseT as shown in the diagram), 10Mbps Full-duplex (10BaseT Full_Duplex), 100Mbps Half-duplex (100BaseTX) and 100Mbps Full-duplex (100BaseTX Full_Duplex). As a guide, I would suggest that if you are connecting to a hub, use 10Mbps Half-duplex, and if connecting to a switch, use 100Mbps Full-duplex, as this would cover most possibilities, although it is perfectly possible to get hubs that will support 100Mbps (at half-duplex). If connecting two PCs via a cross-over cable, set both ends to 100Mbps Full-duplex, assuming the cards can work at that speed.
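
The FCC ID split described under item 2 above is simple enough to sketch in a few lines of Python. This is only an illustration, assuming the three-character Grantee Code the text describes; the function name split_fcc_id is made up for this example and is not part of any FCC tool.

```python
def split_fcc_id(fcc_id: str) -> tuple[str, str]:
    """Split an FCC ID into (grantee code, product code).

    Assumes the grantee code is the first three characters,
    as described in the text; everything after it is the product code.
    """
    cleaned = fcc_id.strip().upper()
    if len(cleaned) <= 3:
        raise ValueError(f"FCC ID too short to hold a product code: {fcc_id!r}")
    return cleaned[:3], cleaned[3:]


# The example ID from the text
grantee, product = split_fcc_id("DF63C509B-TPC")
print(grantee, product)  # DF6 3C509B-TPC
```

The two values are what would be typed into the Grantee Code and product code boxes of the search form.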

TCP/IP

There are two very useful tools available for TCP/IP networks, namely Ping and Traceroute. These tools, however, are useless for effective troubleshooting if some of the fundamental TCP/IP requirements are not met:

  1. Are the IP addresses of your local machines on the correct network?
  2. Are the machines set up for the correct default router, or gateway?
  3. Is DNS working correctly?

  1. Are the IP addresses of your local machines on the correct network?

    You need to use IP addresses that are appropriate for the IP network you are attached to. If you have a device on the network with an address of 192.168.0.1, then all other hosts on the same network need addresses that begin with 192.168.0 as well. It is also important that the same subnet mask is used across all hosts.

  2. Are the machines set up for the correct default router, or gateway?

    As with the addresses of the network hosts, the default gateway also needs to be on the same subnet. Let's assume that our host 192.168.0.1 is the LAN side of a Windows ICS machine. The hosts on the LAN would need to have this address as their default gateway.

  3. Is DNS working correctly?

    TCP/IP networking, and hence the Internet, functions quite happily using IP addresses. Most of the Internet, however, uses the idea of replacing the addresses with names, so if it is found that http://www.bbc.co.uk does not work, but 212.58.224.32 does, the assumption can be made that DNS is not set up correctly or that the DNS server used is not working as it should.

    Most devices or software (including Microsoft's ICS) that provide Internet sharing will provide DNS services to the LAN, so usually it is a case of specifying the address of the sharing device as the DNS server which, again in the case of MS ICS, is 192.168.0.1.

    If for some reason there is no DNS server available on the gateway device, then it will be necessary to configure NTL's DNS servers on the LAN machines. NTL's DNS servers are 194.168.4.100 and 194.168.8.100 and it is normal to specify both in the setup.
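
The first two checks above (hosts on the correct network, gateway on the same subnet) can be sketched with Python's standard ipaddress module. This is a minimal illustration rather than a diagnostic tool; the client address 192.168.0.23 and the 255.255.255.0 mask are assumptions chosen to match the 192.168.0.1 example in the text.

```python
import ipaddress


def same_subnet(host_ip: str, other_ip: str, netmask: str) -> bool:
    """Return True when other_ip lies in the network defined by host_ip/netmask."""
    # strict=False lets us pass a host address instead of the network address
    network = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(other_ip) in network


# Hypothetical LAN client checked against the ICS gateway from the text
print(same_subnet("192.168.0.23", "192.168.0.1", "255.255.255.0"))  # True
# A client that has strayed on to a different network
print(same_subnet("192.168.1.23", "192.168.0.1", "255.255.255.0"))  # False
```

A False result for the gateway address would point at a mistyped IP address or subnet mask on the client.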

DHCP

It is very common for default TCP/IP settings to be set to automatic, which means that a particular device will attempt to connect to a DHCP server in order to obtain an IP address. In DHCP parlance this is called 'address leasing', and a network device will 'borrow' a particular IP address for whatever length of time the DHCP server is set to lease the address for. This can be hours, days, weeks or even years! As well as leasing the IP address to the client, a DHCP server will also tell the client what default gateway and DNS servers to use. This makes DHCP a very useful method for setting machines up to use a TCP/IP network.

In some cases, DHCP fails to work as expected (more likely with Microsoft ICS under Windows 98, in my experience) and a device will fail to lease an address. If this occurs, one of two things may happen, depending on your operating system. Windows will allocate a default address that begins with 169.254 once its DHCP time-out limit is reached. If you have two machines running Windows that are connected to the same network and they are both set for DHCP, then they will still be able to communicate using their default addresses, albeit only between themselves. Unix based systems, however, tend to time out and then not allocate any address, so it will be necessary to set an address manually in order to carry out any diagnostics.
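
Spotting one of these fallback addresses is easy to automate. The sketch below, using Python's standard ipaddress module, simply tests whether an address falls inside 169.254.0.0/16; the function name is invented for this example, and the 169.254.12.7 address is a made-up sample.

```python
import ipaddress

# The range Windows falls back to when no DHCP server answers
FALLBACK_RANGE = ipaddress.ip_network("169.254.0.0/16")


def dhcp_probably_failed(ip: str) -> bool:
    """Return True when ip is one of the 169.254.x.x default addresses."""
    return ipaddress.ip_address(ip) in FALLBACK_RANGE


print(dhcp_probably_failed("169.254.12.7"))   # True
print(dhcp_probably_failed("192.168.3.201"))  # False
```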

Checking the IP Settings

  • Win9X

    Select Run from the Start Menu and enter the command winipcfg. A window will be shown that looks like this:

    Output from Start/Run/winipcfg

    The initial screen for winipcfg only displays the assigned IP Address, Subnet Mask and Default Gateway. In order to check the DNS servers in use and the length of any lease, it is necessary to click on 'More Info'. The fact that the Release and Renew buttons are greyed out implies that this machine has a fixed IP Address.

    More Winipcfg

    Note that if the PC also contains a modem, then winipcfg will default to showing the properties for this first so it is necessary to select any NICs by clicking on the down arrow where the modem is identified and selecting the NIC to be viewed.

  • WinNT/2000/XP

    In a DOS command window, type ipconfig:

    C:\>ipconfig
    
    Windows 2000 IP Configuration
    
    Ethernet adapter Local Area Connection 5:
    
            Connection-specific DNS Suffix  . : nigs.net
            IP Address. . . . . . . . . . . . : 192.168.3.201
            Subnet Mask . . . . . . . . . . . : 255.255.255.0
            Default Gateway . . . . . . . . . : 192.168.3.254
    

    Again, like winipcfg, ipconfig on its own only shows the IP Address, Subnet Mask and Default Gateway. In order to see more detail it is necessary to add the /all switch, like this: ipconfig /all

    
    C:\>ipconfig /all
    
    Windows 2000 IP Configuration
    
            Host Name . . . . . . . . . . . . : NIGS
            Primary DNS Suffix  . . . . . . . :
            Node Type . . . . . . . . . . . . : Hybrid
            IP Routing Enabled. . . . . . . . : No
            WINS Proxy Enabled. . . . . . . . : No
            DNS Suffix Search List. . . . . . : nigs.net
    
    Ethernet adapter Local Area Connection 5:
    
            Connection-specific DNS Suffix  . : nigs.net
            Description . . . . . . . . . . . : Linksys EtherFast 10/100 PC Card
            Physical Address. . . . . . . . . : 00-E0-98-21-25-4C
            DHCP Enabled. . . . . . . . . . . : Yes
            Autoconfiguration Enabled . . . . : Yes
            IP Address. . . . . . . . . . . . : 192.168.3.201
            Subnet Mask . . . . . . . . . . . : 255.255.255.0
            Default Gateway . . . . . . . . . : 192.168.3.254
            DHCP Server . . . . . . . . . . . : 192.168.3.254
            DNS Servers . . . . . . . . . . . : 192.168.3.254
            Primary WINS Server . . . . . . . : 192.168.3.254
            Lease Obtained. . . . . . . . . . : 15 July 2001 20:06:44
            Lease Expires . . . . . . . . . . : 16 July 2001 20:06:44
    
    

    In this example, it is immediately obvious that DHCP has been used to get the address 192.168.3.201 and that the lease for this connection is one day. It can also be seen that the DHCP server is 192.168.3.254, and this also happens to be both the Default Gateway and DNS server as well - a common scenario with most Internet sharing devices.

  • Linux

    At your shell prompt type ifconfig, which will result in a display very similar to that shown in the following screen dump. This is the output from a two NIC Linux machine where eth0 is attached to an NTL Cable Modem and eth1 is attached to the internal network. Note that there is no indication here that eth0 obtained its address via DHCP. Neither is there any indication of the Default Router or DNS. However, we do get an awful lot of other information. This command is also available in other flavours of 'Nix, such as SunOS and the BSD families.

    Output from ifconfig
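
If these settings need checking regularly, the ipconfig /all listing shown earlier for Windows 2000 can be picked apart by a short script. The following Python sketch is only an illustration: it assumes the "name . . . : value" layout and the day-month-year date format seen in that listing, both of which may differ on other Windows versions or locales, and the function names are made up for this example.

```python
import re
from datetime import datetime, timedelta

# A few lines copied from the ipconfig /all listing above
SAMPLE = """\
        DHCP Enabled. . . . . . . . . . . : Yes
        IP Address. . . . . . . . . . . . : 192.168.3.201
        Default Gateway . . . . . . . . . : 192.168.3.254
        DHCP Server . . . . . . . . . . . : 192.168.3.254
        Lease Obtained. . . . . . . . . . : 15 July 2001 20:06:44
        Lease Expires . . . . . . . . . . : 16 July 2001 20:06:44
"""

# Field name, then the dotted padding, then the first colon, then the value
FIELD = re.compile(r"^\s*(.+?)[ .]*:\s*(.*)$")


def parse_ipconfig(text: str) -> dict[str, str]:
    """Collect 'name: value' pairs, skipping lines that carry no value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match and match.group(2).strip():
            fields[match.group(1)] = match.group(2).strip()
    return fields


def lease_length(fields: dict[str, str]) -> timedelta:
    """Work out how long the DHCP lease runs for."""
    fmt = "%d %B %Y %H:%M:%S"  # assumed format, e.g. 15 July 2001 20:06:44
    obtained = datetime.strptime(fields["Lease Obtained"], fmt)
    expires = datetime.strptime(fields["Lease Expires"], fmt)
    return expires - obtained


fields = parse_ipconfig(SAMPLE)
print(fields["DHCP Server"])  # 192.168.3.254
print(lease_length(fields))   # 1 day, 0:00:00
```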

Testing the Network

There are many hundreds of tools available for troubleshooting networks, but the majority of network devices will include at least one of Ping or Traceroute. Ping sends an ECHO request packet; if the target host responds, ping receives an ECHO reply and calculates the response time. Typically, the shorter the time the better ;-). Traceroute goes one better, in that it tries to generate a response from every router hop in the path to the target and shows the response time for each hop.

Typical output from Ping is shown below, where 10 ECHO request packets have been sent to www.bbc.co.uk. The site responds and ping displays the response time.


[nig@cat nigsnet]$ ping www.bbc.co.uk
PING www.bbc.net.uk (212.58.224.32) from 192.168.3.80 : 56 data bytes
64 bytes from 212.58.224.32: icmp_seq=0 ttl=246 time=37.3 ms
64 bytes from 212.58.224.32: icmp_seq=1 ttl=246 time=19.0 ms
64 bytes from 212.58.224.32: icmp_seq=2 ttl=246 time=21.1 ms
64 bytes from 212.58.224.32: icmp_seq=3 ttl=246 time=31.2 ms
64 bytes from 212.58.224.32: icmp_seq=4 ttl=246 time=30.7 ms
64 bytes from 212.58.224.32: icmp_seq=5 ttl=246 time=19.6 ms
64 bytes from 212.58.224.32: icmp_seq=6 ttl=246 time=25.8 ms
64 bytes from 212.58.224.32: icmp_seq=7 ttl=246 time=53.3 ms
64 bytes from 212.58.224.32: icmp_seq=8 ttl=246 time=27.3 ms
64 bytes from 212.58.224.32: icmp_seq=9 ttl=246 time=19.9 ms

--- www.bbc.net.uk ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 19.0/28.5/53.3 ms

Whilst ping gives some indication of the response time of a particular host, traceroute also gives some indication of the response time of each intervening router that the data packets pass through. Tracing to the same host that was pinged before now shows the hosts that are passed through on the way to www.bbc.co.uk. In this instance, the route stops responding after hop 8, at which point traceroute is assumed to be blocked by a firewall or filter. Note that the following example shows traceroute from a Linux command shell; in Windows, the command is shortened to tracert. Both ping and traceroute can be used with either a hostname or an IP address, so if a particular host fails to respond when using the hostname, it is worth trying the IP address instead, if it is known.


[nig@cat nigsnet]$ traceroute www.bbc.co.uk
traceroute to www.bbc.net.uk (212.58.224.32), 30 hops max, 38 byte packets
 1  gatekeeper (192.168.3.254)  0.546 ms  0.925 ms  0.956 ms
 2  172.28.95.254 (172.28.95.254)  12.314 ms  14.353 ms  11.825 ms
 3  ltn-cam2-a-s10.inet.ntl.com (62.252.67.65)  16.818 ms  12.816 ms  27.292 ms
 4  ltn-core-a-pos800.inet.ntl.com (62.252.64.153)  17.648 ms  13.196 ms  12.410 ms
 5  ltn-t2core-a-so-011-0.inet.ntl.com (213.107.47.5)  13.148 ms  19.786 ms  29.437 ms
 6  gfd-bb-a-so-200-0.inet.ntl.com (213.105.172.18)  70.735 ms  21.917 ms  16.331 ms
 7  linx-ic-1-so-100-0.inet.ntl.com (62.253.185.78)  16.498 ms  19.649 ms  23.985 ms
 8  rt-linx-a.thdo.bbc.co.uk (195.66.224.103)  18.275 ms  20.724 ms  17.990 ms
 9  * * *
10  * * *

Both of the above examples show the responses from ping and traceroute that would occur in an almost perfect world. Hop 1, gatekeeper, shows that this traceroute originated from a machine on the 192.168.3.0 network, and that the machine is able to perform a DNS look-up in order to determine that www.bbc.co.uk is really at IP address 212.58.224.32. The fact that the BBC block traceroute at their perimeter should not detract from the fact that, at the very least, the local network is working fine. Unfortunately, this is not always the case, but both Ping and Traceroute can also be used to troubleshoot the internal machines.
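
The summary line that ping printed earlier can be reproduced from the individual replies, which is handy when only part of the output has been captured. The Python sketch below is a rough illustration that assumes the Linux-style "time=NN.N ms" reply format shown in the ping example; other ping versions format their replies differently.

```python
import re

# The ten replies from the ping example above
PING_OUTPUT = """\
64 bytes from 212.58.224.32: icmp_seq=0 ttl=246 time=37.3 ms
64 bytes from 212.58.224.32: icmp_seq=1 ttl=246 time=19.0 ms
64 bytes from 212.58.224.32: icmp_seq=2 ttl=246 time=21.1 ms
64 bytes from 212.58.224.32: icmp_seq=3 ttl=246 time=31.2 ms
64 bytes from 212.58.224.32: icmp_seq=4 ttl=246 time=30.7 ms
64 bytes from 212.58.224.32: icmp_seq=5 ttl=246 time=19.6 ms
64 bytes from 212.58.224.32: icmp_seq=6 ttl=246 time=25.8 ms
64 bytes from 212.58.224.32: icmp_seq=7 ttl=246 time=53.3 ms
64 bytes from 212.58.224.32: icmp_seq=8 ttl=246 time=27.3 ms
64 bytes from 212.58.224.32: icmp_seq=9 ttl=246 time=19.9 ms
"""


def round_trip_summary(output: str) -> tuple[float, float, float]:
    """Return (min, avg, max) of the reply times found in ping output."""
    times = [float(t) for t in re.findall(r"time=([\d.]+) ms", output)]
    if not times:
        raise ValueError("no replies found in ping output")
    return min(times), sum(times) / len(times), max(times)


low, avg, high = round_trip_summary(PING_OUTPUT)
print(f"round-trip min/avg/max = {low:.1f}/{avg:.1f}/{high:.1f} ms")
# round-trip min/avg/max = 19.0/28.5/53.3 ms
```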

Troubleshooting TCP/IP on the Local Area Network

Ordinarily traceroute is not really that much use on a LAN, although it can be used as an alternative type of ping. Ping is probably the more conventional tool to use. You can use it to test for a particular PC, gateway or Router on the local network.

If a particular machine cannot connect to an Internet host, e.g. a web server, an FTP server or a news server, then the local gateway address is a good place to start ping diagnostics. Any router or gateway device is one host on the network that would be expected to be pingable - it's the connection of the LAN to the Internet and it's also likely that other hosts on the network are already using it. If the gateway responds then this would confirm that any issue with the PC is not likely to be due to TCP/IP set-up. If, on the other hand, the gateway does not respond then further investigation around the local PC will be necessary.

If a host on the network does not respond as expected, then one of the first items to check is whether the host that is running the ping request has actually got an address on the correct subnet. Most NICs will be set to use DHCP by default, so one of the most common causes of an incorrectly set address is that a DHCP server could not be found. This will result in the address being set from the default range 169.254.0.0. If this happens then an easy fix is to set the IP address manually so that it conforms to your own network. Be warned, however, that if a PC cannot find a DHCP server, then there may be other issues, such as cabling, that will prevent connectivity, so even setting a manual address may not allow the PC to work.

Ping and traceroute are useful tools but all they really do is determine whether a host is alive, what the response time is likely to be and, in the case of traceroute, the path taken to a particular host. Both can also let you know whether DNS is working (getting a message saying 'host not found' when pinging www.bbc.co.uk is more likely to be a local problem than the BBC being down!) but that is about it. Neither will give any indication that a proxy server is serving correctly or why a connection cannot be made to a mail server, for instance.
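
The order of checks described in this section (gateway first, then a remote IP address, then a hostname) can be sketched as a small Python script. This is one possible arrangement rather than a definitive procedure: the helper names are invented for this example, the -n and -c switches are the usual count options for Windows and Unix ping respectively, and the exit code of ping is not a perfectly reliable indicator on every system.

```python
import platform
import socket
import subprocess


def ping_once(host: str) -> bool:
    """Send one ECHO request with the system ping; True if it reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolves(hostname: str) -> bool:
    """True if the hostname can be turned into an IP address via DNS."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def diagnose(gateway_ok: bool, remote_ip_ok: bool, name_ok: bool) -> str:
    """Turn the three test results into a suggestion of where to look next."""
    if not gateway_ok:
        return "gateway not reachable: investigate the local PC and its cabling"
    if not remote_ip_ok:
        return "gateway reachable: look beyond the PC's TCP/IP set-up"
    if not name_ok:
        return "IP addresses work but names do not: check the DNS settings"
    return "basic TCP/IP connectivity looks fine"


# Example call, using the addresses from the text (needs a live network):
# print(diagnose(ping_once("192.168.0.1"),
#                ping_once("212.58.224.32"),
#                resolves("www.bbc.co.uk")))
print(diagnose(True, True, False))
```

Keeping the decision logic in diagnose separate from the network probes means it can be tried out without a working connection.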

Troubleshooting Microsoft ICS

Microsoft's ICS is probably one of the most infuriating methods of sharing an Internet service. It can be confusing to initially configure and can also cause some issues with particular applications. It can also cause some performance degradation on the ICS machine, especially if it is used as a workstation as well. That said, when it does work, it works well.

Networks using ICS can be treated just like any other network as far as initial diagnostics are concerned. Basically, the ICS machine will form the gateway for the other machines on the network, just as a router would, so a first check could still be to ping the default router as set in the client PC. However, there are two aspects to ICS that can cause it not to work as expected:

1) The wrong NIC is set as the shared connection

When two NICs are installed in a Windows machine an extra tab, called Sharing, is available, where sharing can be enabled and this should only be enabled on the cable facing NIC, and not the LAN facing one. A common mistake is to enable the LAN connected NIC and this can cause an otherwise perfectly good-looking ICS configuration to fail miserably. An easy way of checking that the right NIC is enabled is to use ipconfig /all, and see what IP addresses have been assigned. If correct, the CM connected NIC will show an NTL-assigned address and the LAN NIC will be set to 192.168.0.1.

2) The ICS DHCP service fails to work

ICS provides DHCP services to the LAN by default. This means that any clients on the LAN can just be set to auto configuration for an IP address. Sometimes, and this is probably more common on Win9x versions of ICS, the DHCP service does not work correctly and it may be necessary to allocate fixed IP addresses to the client machines.
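
The first of these two checks, comparing the addresses that ipconfig /all reports for each NIC, can be expressed as a short Python sketch. It assumes that ICS gives 192.168.0.1 to whichever NIC is not the shared one, which matches the correct case described above but is otherwise an assumption; the function name is invented, and 203.0.113.10 is a documentation-range placeholder standing in for an NTL-assigned address.

```python
ICS_LAN_ADDRESS = "192.168.0.1"  # address ICS sets on the LAN NIC, per the text


def check_ics_nics(cable_nic_ip: str, lan_nic_ip: str) -> str:
    """Compare the two addresses reported by ipconfig /all on the ICS machine."""
    if cable_nic_ip == ICS_LAN_ADDRESS:
        # Assumption: the unshared NIC receives 192.168.0.1
        return "sharing may be enabled on the LAN-facing NIC"
    if lan_nic_ip == ICS_LAN_ADDRESS:
        return "sharing looks correct"
    return "LAN NIC is not 192.168.0.1: check that sharing is enabled at all"


# Placeholder cable-side address, correct LAN-side address
print(check_ics_nics("203.0.113.10", "192.168.0.1"))  # sharing looks correct
```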

The following diagram shows how PCs in an ICS environment should normally be set for their IP addresses:

TCP/IP setting for MS ICS

Sometimes, ICS may advertise itself as a DNS server and not actually function as such. Where this happens, sites will load using their IP address but not their name so, for example, http://212.58.224.32 works but http://www.bbc.co.uk does not. If this does happen, set NTL's DNS servers (194.168.4.100 and 194.168.8.100) on the client machine. Ping can be used to verify that the client can reach either NTL DNS server before trying this change.

Troubleshooting Microsoft Networking

Probably the second most infuriating network facility, after ICS, that Microsoft have ever produced. Like ICS, MS Networking can be a pain, but once working correctly it is a very useful tool for sharing files and printers. And again, like ICS, MS Networking has a couple of common issues that can cause it not to work as expected. Basically, these revolve around incorrect hostname or workgroup setup, a lack of shared folders, or TCP/IP simply not being set up correctly. Additionally, when Windows 2000 or XP are introduced into the mix, there are also user permissions to be taken into account. Windows 9x versions had little user security when sharing files, but with 2000 and XP MS made file sharing more secure, and a remote PC connecting has to authenticate in order to get access to shared resources.

Troubleshooting Firewalls

Firewalls can sometimes be a necessary evil. Necessary to prevent unwanted connections to your PC, but evil in the sense that they can block traffic that you actually need to allow. A typical example of this is that some firewalls will block replies from NTL's DHCP servers, meaning that the PC will not get necessary IP lease renewal messages when requested.

Another common issue with firewalls where they are used on a private LAN is that they block wanted traffic between machines, especially MS Networking.


© Nig's Net Written using the Bluefish HTML Editor on RedHat 9.0.

All Copyrights and Trademarks ACK'd. Not to do so would be a SYN!