Solaris Troubleshooting: Data Gathering for Network-Related Issues
This article explains how to gather the information needed to understand and troubleshoot Solaris network-related issues. Resolving the issues found during data analysis is not covered here.
Data Gathering for Networking Related Troubleshooting
Scenario 1: Interface reporting repeated “link up/link down” errors in /var/adm/messages, NOT due to any system or network administrative activity.
Required information:
– interface driver in use and instance (e1000g0, nxge1, hme4, etc.) as part of a problem statement.
– Results of the interface-to-switch connectivity diagnostic “watch-net”, run from outside the Solaris OS at the OpenBoot PROM (SPARC)
The following example shows the procedure which consists of
- Unset auto-boot,
- List network hardware,
- Select interface to test,
- Perform a “test” and a “watch-net”
- Re-set auto-boot, if desired.
Note: the trailing remarks on some command lines below are comments and will not appear on the screen.
{0} ok setenv auto-boot? false (necessary only if it was set to true)
{0} ok reset-all
{0} ok show-nets
a) /pci@1f,4000/pf@5
b) /pci@1f,4000/network@1,1
c) /pci@1f,4000/pci@2/network@0
q) NO SELECTION
Enter Selection, q to quit: c
/pci@1f,4000/pci@2/network@0 has been selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
for creating devalias mydev for /pci@1f,4000/pci@2/network@0
{0} ok test /pci@1f,4000/pci@2/network@0 (you can use ^Y for the path)
Testing /pci@1f,4000/pci@2/network@0
Internal loopback test -- succeeded.
Link is -- up
{0} ok watch-net /pci@1f,4000/pci@2/network@0
Internal loopback test -- succeeded.
Transceiver check -- passed
Looking for Ethernet Packets.
'.' is a Good Packet. 'X' is a Bad Packet.
Type any key to stop.
.......^C
{0} ok setenv auto-boot? true (necessary only if it was set to true before)
{0} ok
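To document how often the link messages from Scenario 1 occur, the log can be searched for the affected interface. The following is a minimal sketch, not an official procedure. The function name count_link_events is hypothetical, and the pattern assumes the driver logs the interface name followed by “link up”, “link down”, “link is up” or “link is down”. The exact wording varies by driver, so adjust the pattern to the messages actually seen.

```shell
#!/bin/sh
# count_link_events INTERFACE [LOGFILE]
# Hypothetical helper: prints how many log lines mention a link state
# change for the given interface instance (for example e1000g0).
# Assumption: the driver message contains the interface name followed by
# "link up", "link down", "link is up" or "link is down".
# On older Solaris releases /usr/bin/grep may lack -E; in that case
# /usr/xpg4/bin/grep can be used instead.
count_link_events() {
    ifname=$1
    logfile=${2:-/var/adm/messages}
    grep -E -i -c "${ifname}.*link (is )?(up|down)" "$logfile"
}

# Example use:
#   count_link_events e1000g0
#   count_link_events nxge1 /var/adm/messages.0
```

Including this count, together with the interface driver and instance, makes the problem statement more precise.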
Scenario 2: No connectivity inbound or outbound (on the same subnet; otherwise this is possibly a routing/subnetting problem, see the subnet checks at the end of this article)
DATA COLLECTION:
- Interface driver in use and instance (e1000g0, nxge1, hme4, etc.) as part of a problem statement.
- Results of the interface-to-switch connectivity diagnostic “watch-net”, run from outside the Solaris OS at the OpenBoot PROM (SPARC). The SPARC “watch-net” procedure is shown under Scenario 1 above.
- Explorer output. If the issue is related to a zone, the “-w localzones” option to explorer should be used.
- Details on how interface is connected to the network (switch or hub? back-to-back connection with another system?)
- Collect this SPECIFIC snoop for 1 minute (NOTE: nothing will be displayed back to the screen during this capture):
# snoop -d {interface instance} -o {interface instance}.snoop -s 128 -q
The interface given with “-d” is e1000g0, bge1 or whichever interface and instance is being tested.
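As a worked example of the command above, the sketch below wraps it in a small shell function so that the interface instance is typed only once and the output file is named after it. The function name capture_headers is hypothetical. The snoop options are the ones given in this article: “-s 128” keeps only the first 128 bytes of each packet and “-q” suppresses the packet count while writing to the file.

```shell
#!/bin/sh
# capture_headers INTERFACE
# Hypothetical wrapper around the snoop command from this article.
# It runs in the foreground; stop it with Control-C after about 1 minute.
capture_headers() {
    if [ -z "$1" ]; then
        echo "usage: capture_headers interface-instance (e.g. e1000g0)" >&2
        return 2
    fi
    # -d  interface instance to capture on
    # -o  output file, named after the interface
    # -s  bytes kept per packet (128)
    # -q  do not display the packet count while capturing to a file
    snoop -d "$1" -o "$1.snoop" -s 128 -q
}

# Example use (as root):
#   capture_headers e1000g0      # writes e1000g0.snoop
```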
Scenario 3: This host can’t ping another host (outbound communication on the same subnet; otherwise this is possibly a routing and/or subnetting problem, see the subnet checks at the end of this article)
DATA COLLECTION:
- Interface driver in use and instance (e1000g0, nxge1, igb3, etc.) as part of a problem statement.
- Results of the interface-to-switch connectivity diagnostic “watch-net”, run from outside the Solaris OS at the OpenBoot PROM (SPARC). The SPARC “watch-net” procedure is shown under Scenario 1 above.
- Explorer output. If the issue is related to a zone, the “-w localzones” option to explorer should be used.
- Details on how interface is connected to the network (switch or hub? back-to-back connection with another system?)
- Collect this SPECIFIC snoop for 1 minute (NOTE: nothing will be displayed back to the screen during this capture):
# snoop -d {interface instance} -o {interface instance}.snoop -s 128 -q
Collect the same snoop from the other system being contacted. If the other system is not running Solaris, an equivalent packet capture can be substituted (tcpdump/libpcap, Wireshark (formerly Ethereal), etc.)
Scenario 4: Another host can’t ping this host (inbound communication on the same subnet; otherwise this is possibly a routing and/or subnetting problem, see the subnet checks at the end of this article)
DATA COLLECTION:
- Interface driver in use and instance (e1000g0, nxge1, hme4, etc.) as part of a problem statement.
- Results of the interface-to-switch connectivity diagnostic “watch-net”, run from outside the Solaris OS at the OpenBoot PROM (SPARC). The SPARC “watch-net” procedure is shown under Scenario 1 above.
- Explorer output. If the issue is related to a zone, the “-w localzones” option to explorer should be used.
- Details on how interface is connected to the network (switch or hub? back-to-back connection with another system?)
- Collect this SPECIFIC snoop for 1 minute (NOTE: nothing will be displayed back to the screen during this capture):
# snoop -d {interface instance} -o {interface instance}.snoop -s 128 -q
Generic Data Analysis to Check Connectivity between Interfaces on Different IP Networks (Subnets)
Why is this important?
What is the difference between interfaces on the same IP network and interfaces on different IP networks? Understanding the difference demonstrates the importance of this concept.
Interfaces on the same IP subnet do not need a default router to communicate among themselves. Interfaces that are NOT on the same IP subnet DO require a router (normally the default router) to communicate. If the default router entry is incorrect or missing, OR if the subnet mask of the interface is incorrect, communication between interfaces on different networks will fail. This can appear to be a HARDWARE failure, but the hardware is not at fault.
For this reason it is necessary to consider the following when qualifying, describing and troubleshooting a connectivity issue between 2 hosts:
Check 1: Are the interfaces involved on the same subnet?
A. Compare the “broadcast” address.
In the following example, the 1st broadcast address is 10.30.21.255 and the 2nd broadcast address is 129.148.171.255
THESE INTERFACES (hme0 and bge0) ARE NOT ON THE SAME SUBNET, because they have different broadcast addresses:
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 10.30.21.33 netmask ffffff00 broadcast 10.30.21.255
        ether 8:0:20:ce:8a:9e
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.148.171.244 netmask ffffff00 broadcast 129.148.171.255
        ether 0:3:ba:e4:81:ca
These 2 interfaces (bge1 and bge2) ARE on the same subnet because their broadcast addresses match:
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.148.171.240 netmask ffffff00 broadcast 129.148.171.255
        ether 0:3:ba:e4:81:ca
bge2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.148.171.241 netmask ffffff00 broadcast 129.148.171.255
        ether 0:3:ba:e4:82:cb
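The broadcast comparison above can be cross-checked by computing the network number directly: AND each “inet” address with its “netmask” and compare the results. The sketch below is one possible way to do this in shell, not part of the original procedure. The function names ip_to_int, network_of and same_subnet are hypothetical, and the netmask is expected in the hexadecimal form that ifconfig prints (for example ffffff00).

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# network_of ADDRESS HEXMASK
# Prints the network number, i.e. ADDRESS bitwise-ANDed with the netmask.
network_of() {
    addr=$(ip_to_int "$1")
    mask=$(( 0x$2 ))     # hex netmask as shown by ifconfig
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# same_subnet ADDR1 MASK1 ADDR2 MASK2
# Succeeds only if both netmasks and both network numbers match.
same_subnet() {
    [ "$2" = "$4" ] &&
    [ "$(network_of "$1" "$2")" = "$(network_of "$3" "$4")" ]
}

# Example use with the addresses shown above:
#   same_subnet 129.148.171.240 ffffff00 129.148.171.241 ffffff00 && echo "same subnet"
#   same_subnet 10.30.21.33 ffffff00 129.148.171.244 ffffff00 || echo "different subnets"
```

Comparing the netmasks as well as the network numbers also catches the case where one interface has an incorrect subnet mask.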
Check 2: If the interfaces are on the same subnet per the comparison above, use ping to see whether any other systems with configured interfaces on that same subnet can be contacted. If this succeeds, the interface is operational and the problem is most likely a network configuration issue outside of the Sun system.
Check 3: If the interfaces are NOT on the same subnet per the comparison above, then see the following “Additional Possible Causes and Corrective Actions”.
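Because interfaces on different subnets depend on the default router entry, it is worth recording that entry when collecting data for Check 3. The sketch below is an illustration under stated assumptions: the function name default_router is hypothetical, and it assumes the Solaris “netstat -rn” layout, in which the default route appears as a line whose first column is “default” and whose second column is the gateway address.

```shell
#!/bin/sh
# default_router
# Hypothetical filter: reads "netstat -rn" output on standard input and
# prints the gateway of every route whose destination is "default".
# Assumption: Solaris-style output (destination in column 1, gateway in
# column 2). Prints nothing if no default route is configured.
default_router() {
    awk '$1 == "default" { print $2 }'
}

# Example use:
#   netstat -rn | default_router
```

An empty result suggests that the default router entry is missing, which is one of the failure causes described earlier in this section.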