
call for network professionals ! dns :(

This thing is driving me nuts. Either the people running dns here are total morons (possible) or they are very clever people doing it on purpose (also possible) or there are two groups equally responsible, the morons and the cunning bastards. Anyhow ...

Code:
urchin 3% nslookup pop.gmail.com
Server:  cisco
Address:  xxx.yyy.zzz.987

Non-authoritative answer:
Name:    gmail-pop.l.google.com
Addresses:  74.125.25.109, 74.125.25.108
Aliases:  pop.gmail.com

urchin 4% ping pop.gmail.com
ping: pop.gmail.com: Non-recoverable failure in name resolution

urchin 5% ping 74.125.25.108
PING gmail-pop.l.google.com (74.125.25.108): 56 data bytes
64 bytes from 74.125.25.108: icmp_seq=1 ttl=40 time=256.759 ms
64 bytes from 74.125.25.108: icmp_seq=3 ttl=40 time=244.324 ms
64 bytes from 74.125.25.108: icmp_seq=6 ttl=40 time=244.460 ms
64 bytes from 74.125.25.108: icmp_seq=9 ttl=40 time=252.992 ms
64 bytes from 74.125.25.108: icmp_seq=10 ttl=40 time=257.041 ms
64 bytes from 74.125.25.108: icmp_seq=11 ttl=40 time=261.515 ms

----gmail-pop.l.google.com PING Statistics----
12 packets transmitted, 6 packets received, 50.0% packet loss
round-trip min/avg/max = 244.324/252.849/261.515 ms

urchin 6% traceroute pop.gmail.com
traceroute to pop.gmail.com (74.125.25.109), 30 hops max, 60 byte packets
1  cisco (xxx.yyy.zzz.654)  0 ms  0 ms  0 ms
2  gateway (lll.mmm.nnn.oo)  2 ms  1 ms  1 ms
3  210.22.66.93  6 ms

urchin 7% traceroute pop.gmail.com
traceroute: pop.gmail.com: Non-recoverable failure in name resolution

urchin 9% traceroute 74.125.25.109
traceroute to 74.125.25.109 (74.125.25.109), 30 hops max, 60 byte packets
1  cisco (xxx.yyy.zzz.321)  1 ms  0 ms  0 ms
2  gateway (lll.mmm.nnn.ooo)  3 ms  2 ms  2 ms
3  * * *
4  * * 112.64.243.170  9 ms
5  * 112.64.243.101  4 ms  3 ms
6  219.158.4.97  27 ms  28 ms  29 ms
7  219.158.101.54  36 ms  35 ms  36 ms
8  219.158.101.74  27 ms  26 ms  26 ms
9  12.126.40.57  208 ms  209 ms  210 ms
10  12.122.136.58  212 ms  212 ms  215 ms
11  cr2.sffca.ip.att.net (12.123.15.249)  217 ms  217 ms  216 ms
12  12.122.136.181  211 ms  229 ms  229 ms
13  12.250.31.10  183 ms  183 ms  182 ms
14  216.239.49.168  181 ms  213 ms  192 ms
15  209.85.250.64  191 ms 209.85.250.60  187 ms 209.85.250.64  183 ms
16  72.14.232.63  274 ms  275 ms  274 ms
17  72.14.233.200  273 ms 72.14.233.202  281 ms 72.14.233.140  273 ms
18  64.233.174.99  275 ms 64.233.174.125  277 ms  281 ms
19  * * *
20  74.125.25.109  277 ms  274 ms  276 ms

The timestamps returned from everything past the gateway in the traceroute are a stone lie - you can sit and watch it take minutes to return on some of those hops.

There is a rumour that enforcing TCP-only queries to some well-known external DNS servers will alleviate this problem. I know that just pointing one's DNS settings at known-good servers does not help. The IPs are getting poisoned somewhere. Somewhere north that starts with a B and ends with a g ... Even worse, the poisoned entries eventually screw up the good ones and you have to clear all the cached IPs in the local DNS server.

Using the real IP also does not help much. If you know the correct IP, that occasionally works, but not usually. Most servers now host several sites on one IP and use the requested hostname to decide which website to return, so if you just use the IP you don't get the site you want.
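The mechanism behind "just use the IP" failing is name-based virtual hosting: for plain HTTP the browser names the site in the `Host:` request header, and that header, not the IP, picks which site comes back. A minimal Python sketch, assuming plain HTTP/1.1 on port 80 (for HTTPS the hostname also travels in the TLS SNI extension); `build_request` is a hypothetical helper, not part of any library.

```python
def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for a name-based virtual host.

    The TCP connection may go to any IP you like (for example, one you know
    is correct); the server looks at the Host header, not at that IP, to
    decide which of the sites it hosts to return.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",  # selects the site among those sharing the IP
        "Connection: close",
        "",  # empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Sending it is one call once you have a socket to the known-good IP,
    # e.g. socket.create_connection((ip, 80)).sendall(build_request(name)).
    print(build_request("www.example.com").decode("ascii"))
```

curl can do the same trick from the command line: `curl --resolve www.example.com:80:IP http://www.example.com/` connects to the given IP while still sending the right name.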

But I'd like to know what is really happening. Too bad I'm not smart enough to figure this out on my own ... but Cisco was behind this and they're smarter than me. Thanks, assholes. Anything for a buck, eh ?


Anyway, the first big thing I do not understand: if you do an nslookup, the IP is returned right away. Often it's even the correct one. However, a ping immediately afterwards can come back with "domain name not found." Umm, how is that done (and how do I get around it :D) ?

_________________
waiting for flight 1203 ...
first of all, is "urchin" unix?
foetz wrote:
first of all, is "urchin" unix?

Woof ! Woof ! Irix Dog Food is the most nutritious for your growing hound !
Code:
urchin 1% uname -aR
IRIX64 urchin 6.5 6.5.30m 07202013 IP35


There are also Windows, Solaris and Mackletosh ("It's Yeewwwnix !") machines on the network available for testing if need be.

_________________
waiting for flight 1203 ...
hamei wrote:
This thing is driving me nuts. Either the people running dns here are total morons (possible) or they are very clever people doing it on purpose (also possible) or there are two groups equally responsible, the morons and the cunning bastards. Anyhow ...

Code:
urchin 3% nslookup pop.gmail.com
Server:  cisco
Address:  xxx.yyy.zzz.987

Non-authoritative answer:
Name:    gmail-pop.l.google.com
Addresses:  74.125.25.109, 74.125.25.108
Aliases:  pop.gmail.com

OK, you have configured a working DNS server in /etc/resolv.conf. Name resolution using DNS works...
Quote:
Code:
urchin 4% ping pop.gmail.com
ping: pop.gmail.com: Non-recoverable failure in name resolution


... but are you using DNS as a source for host name resolution? Other sources (LDAP, NIS, files, ...) exist too.

Your sources are configured in /etc/nsswitch.conf. This is how mine looks:
Code:
#
# This is the SGI default nsswitch.conf file.  This file determines
# the maps that will be maintained by nsd, which methods will be
# used to lookup information for a map, and what order the methods
# are called in.
#
# For details on this file see the nsswitch.conf(4) manual page.
#
# After editing this file the nsd daemon must be sent a SIGHUP signal for
# it to notice.  Do a "killall -HUP nsd".
#
automount(dynamic):     nis(nis_enumerate_key)
#bootparams:            files nis
capability:             files nis
clearance:              files nis
ethers:                 files nis
group:                  files nis
hosts:                  nis dns files
ipnodes:                files
mac:                    files nis
mail(null_extend_key):  ndbm(file=/etc/aliases) nis
netgroup:               nis
#netid.byname:          nis
networks:               files nis
passwd:                 files(compat) [notfound=return] nis
protocols:              nis [success=return] files
rpc:                    files nis
services:               files nis
shadow(mode=0700, nis_secure=1): files
#ypservers:             nis
jlimits:                mdbm nis

The crucial bit here is:
Code:
hosts:                  nis dns files

Host name resolution tries NIS first, then DNS, then files (/etc/hosts).
Most people can probably eliminate 'nis' here, and/or swap the order of the 'files' and 'dns' entries.
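The lookup order on that `hosts:` line can be modelled in a few lines of Python. This is a toy sketch of the default behaviour only, assuming that a source either answers or says "not found"; real implementations also distinguish "unavailable" and "try again" results and honour `[status=action]` clauses, which are ignored here. The function names are made up for illustration.

```python
from typing import Callable, Dict, List, Optional

# A lookup function returns an address, or None for "not found here".
Lookup = Callable[[str], Optional[str]]


def parse_hosts_line(line: str) -> List[str]:
    """Return the ordered source names from an nsswitch.conf 'hosts:' line."""
    key, sep, rest = line.partition(":")
    if not sep or key.strip() != "hosts":
        raise ValueError("not a hosts: line")
    # Bracketed action clauses such as [notfound=return] are skipped: this
    # sketch models only the default "not found, try the next source" rule.
    return [tok for tok in rest.split() if not tok.startswith("[")]


def resolve(name: str, order: List[str], sources: Dict[str, Lookup]) -> Optional[str]:
    """Try each source in order; the first one with an answer wins."""
    for source in order:
        lookup = sources.get(source)
        if lookup is None:
            continue  # source not configured on this machine
        answer = lookup(name)
        if answer is not None:
            return answer
    return None
```

With `hosts: nis dns files`, a name that NIS does not know falls through to DNS, and only then to /etc/hosts; swapping the entries changes who gets asked first.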

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: :Onyx2: (2x) :O3x02L:
In the museum : almost every MIPS/IRIX system.
Wanted : GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)
jan-jaap wrote:
OK, you have configured a working DNS server in /etc/resolv.conf. Name resolution using DNS works...

Check
Quote:
... but are you using DNS as a source for host name resolution? Other sources (LDAP, NIS, files, ...) exist too.

files, dns

As little as possible in files: if the router/DNS server does most of it, I don't have to mess with hosts files all over the place.
Quote:
Your sources are configured in /etc/nsswitch.conf. This is how mine looks:

Almost exactly the same except :
Code:
hosts:                  nis dns files

for me is
Code:
hosts:                  files dns

Quote:
The crucial bit here is:
Code:
hosts:                  nis dns files

Host name resolution tries NIS first, then DNS, then files (/etc/hosts).

Check
Quote:
Most people can probably eliminate 'nis' here, and/or swap the order of the 'files' and 'dns' entries.

double check

Also ...
Code:
ip dns server

--More--

ip name-server 8.8.4.4
ip name-server 176.34.53.14
ip name-server 4.2.2.2

--More--

ip host cisco xxx.yyy.zzz.001
ip host gateway 222.333.444.99
ip host host_one xxx.yyy.zzz.002
ip host host_two xxx.yyy.zzz.003
ip host platform.twitter.com 127.0.0.1
ip host printer_one xxx.yyy.zzz.201
ip host printer_two xxx.yyy.zzz.202


Everything works lovely, then the whole dns thing falls into the pit, then an hour later it all works lovely, then it will quit again. No change whatever from me (although sometimes I'll get antsy and clear the cached host entries on the dns server, which seems to help. But that might be a coincidence.) Local and first-stop dns is done on the router (cisco). Lots of poisoned dns entries even tho I am not (supposedly) using their dns servers.
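Why flushing the router's cache seems to help: a caching resolver keeps whatever answer it received, good or forged, until the record's TTL runs out. A toy model in Python, assuming the cisco behaves like an ordinary TTL cache (not verified here); `TTLCache` is a hypothetical class, and the clock is injectable so it can be tested without waiting an hour.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy model of a caching-only nameserver's answer cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: float) -> None:
        # Whatever answer arrives is stored, good or forged, until it expires.
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # expired: the next query goes upstream
            return None
        return address

    def flush(self) -> None:
        # The equivalent of clearing the cached host entries by hand.
        self._entries.clear()
```

A bogus answer cached with a one-hour TTL keeps being served for that hour no matter how good the upstream servers are, which would fit the "broken for an hour, then fine again" pattern.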

If I could figure out what the heck they are doing I could either get around it or give up. But just having it screw with me pisses me off and wastes a lot of time, too.

_________________
waiting for flight 1203 ...
Does this happen with other hosts, or just pop.gmail.com?

_________________
:Octane2: :Octane: :1600SW: (less is more?)
pierocks wrote:
Does this happen with other hosts, or just pop.gmail.com?

Many others but gmail is the only one I care about. I can live without youtube and facebook. In fact ....

I wouldn't care about gmail either but have too many old mails, addresses, people who have my info ... not going to change that just for the harmonica society.

I can't get pissed at them for messing with google; google is essentially shit and should be messed with. It just happens to inconvenience me :P

Admittedly, having all the dynamic IPs poisoned can be a pita sometimes too.

_________________
waiting for flight 1203 ...
Well, my only thought was that you were hitting different DNS servers with subsequent queries and one happened to be broken.

_________________
:Octane2: :Octane: :1600SW: (less is more?)
hamei wrote:
Everything works lovely, then the whole dns thing falls into the pit, then an hour later it all works lovely, then it will quit again. No change whatever from me

that sounds pretty much like an external issue. how about adding more servers to your resolv.conf?
"cisco" routes your subnet to "gateway" and runs a caching-only nameserver? What nameservers does cisco query when you ask it something that's not in its cache?

_________________
Project:
Movin' on up, toooo the east side
Plan:
World domination! Or something...
vishnu wrote:
"cisco" routes your subnet to "gateway" and runs a caching-only nameserver? What nameservers does cisco query when you ask it something that's not in its cache?

Exactly ! But I don't think that cisco is getting good IPs even though it is supposed to be querying safe upstream nameservers. I have tried several: currently one is a google server (contributing to the Evil Empire, I should be ashamed of myself), one is maybe OpenDNS ? and the last is 4.2.2.2, the Universal DNS Server.

I can and have changed them around tho, with no real improvement. How can I tell whence the internet addresses are really originating ? One thing that throws me off is the < nslookup >, <ping hostname : "can't find hostname"> sequence. If I do an nslookup and the cisco returns an ip, why cannot Mr O350 find that ip five seconds later for a ping ?

There is a rumor that enforcing tcp queries rather than allowing udp requests fixes this problem but I am skeptical .... any thoughts on that ?

Yes, personal VPNs have been useful in the past but they are problematic at this time. Nor do I think IBM will give me a corporate VPN account so I can get my mail easily :)

_________________
waiting for flight 1203 ...
UDP is connectionless ("fire and forget") whereas TCP is connection-oriented and retransmits lost segments, so there might be something to that. Presumably the google nameserver that you're querying is in the PRC, and thus might have been tinkered with by the Central Committee? I thought google bailed from the PRC after they got all that bad publicity for caving to the CC's demand for censoring. What version of BIND are you using? I run a caching-only nameserver on my firewall with bind-9.9.1_P3 which queries OpenDNS upstream and it works perfectly, but then I'm in the heart of the good ol' USA...
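The DNS question itself is identical either way; only the transport changes. Per RFC 1035 (section 4.2.2), a DNS message sent over TCP is simply prefixed with a two-byte length and travels inside a connection, so a client reading that connection will not pick up a stray spoofed UDP datagram. That does not stop an on-path box from forging or resetting the TCP connection itself. A minimal Python sketch of the two framings; the helper names are hypothetical and only A-record queries are built.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as DNS labels: length byte + bytes, ending in 0."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """A standard query (qtype 1 = A record, class 1 = IN) with RD set."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0/0/0 RRs.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Over TCP each DNS message is preceded by a two-byte length field."""
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    query = build_query("pop.gmail.com", 0x1234)
    # Over UDP, `query` goes out as-is in one datagram; over TCP, send
    # frame_for_tcp(query) on a connection to port 53 and read the two-byte
    # length before reading the reply itself.
    print(len(query), frame_for_tcp(query)[:2].hex())
```

With BIND's dig, `dig +tcp @4.2.2.2 pop.gmail.com` asks the same question over TCP, which makes a side-by-side comparison with the plain UDP query easy.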

_________________
Project:
Movin' on up, toooo the east side
Plan:
World domination! Or something...
Silly test you can do: remove all but one nameserver and test. If it works fine, replace it with the next nameserver and test again. Repeat for all the nameservers you use. Pinpointing the misbehaving one is what I'd attempt first.
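The one-server-at-a-time test can also be run in one go, asking each server the same names. A sketch in Python, assuming you supply one function per nameserver that returns the A records it gives (how that function talks to the server is left open); `compare_servers` is a hypothetical helper. Big sites hand out different, equally valid addresses to different resolvers, so a disagreement is a lead, not proof of poisoning.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical signature: a function that asks ONE nameserver for the A
# records of a name (for example by wrapping "dig +short @server name").
Query = Callable[[str], Set[str]]


def compare_servers(names: Iterable[str],
                    servers: Dict[str, Query]) -> Dict[str, Dict[str, List[str]]]:
    """Ask every server about every name; return only the names they disagree on."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for name in names:
        answers: Dict[str, List[str]] = {}
        for server, query in servers.items():
            try:
                answers[server] = sorted(query(name))
            except Exception as exc:  # timeouts, refusals, SERVFAIL ...
                answers[server] = [f"error: {exc}"]
        # More than one distinct answer set means the servers disagree.
        if len({tuple(a) for a in answers.values()}) > 1:
            report[name] = answers
    return report
```

By hand, `nslookup pop.gmail.com 4.2.2.2` or `dig @4.2.2.2 pop.gmail.com` does the same for a single server.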

Unless multiple ones are misbehaving, and then you're in for a fight. :D

_________________
while (!asleep()) sheep++;
What does 8.8.8.8 have to say?

Code:
air:~ $ nslookup 8.8.8.8
Server:      8.8.8.8
Address:   8.8.8.8#53

Non-authoritative answer:
8.8.8.8.in-addr.arpa   name = google-public-dns-a.google.com.

Authoritative answers can be found from:
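Handing nslookup an address instead of a name makes it do a reverse (PTR) lookup: the octets are reversed and looked up under in-addr.arpa, which is why the answer above is labelled `8.8.8.8.in-addr.arpa` (8.8.8.8 just happens to read the same backwards). Python's standard `ipaddress` module builds that name; a sketch, with `ptr_name` as a hypothetical wrapper. Reverse-looking-up a suspicious answer is a cheap sanity check, though PTR records are set by whoever owns the address block and can be missing or unhelpful.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """The in-addr.arpa (or ip6.arpa) name a reverse lookup of `address` asks for."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    for addr in ("8.8.8.8", "74.125.25.109"):
        print(addr, "->", ptr_name(addr))
```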

_________________
:Indy: :Indigo2IMP: :Octane: :Indy: 4xRS6K 2xHP9K 6xSUN 1xDEC 14xMAC 7xPC 2xPS2
Sorry to hear the dragon of misfortune has visited you, hamei.

Personally I would not test with an IRIX system, nsd is a beast of black magic that previously has interfered with my happiness. Is some foul demon eating your udp packets, though? Looking at the man scroll, it seems that IRIX' traceroute uses ICMP packets instead of the traditional UDP, so working replies might not be an indication that it's limited to DNS queries.

Look, now you made me write like you do :-P

_________________
:Octane: halo , oct ane
N.B.: I tend to talk out of my ass. Do not take it too seriously.
duck wrote:
it seems that IRIX' traceroute uses ICMP packets instead of the traditional UDP


Since when has anything but ICMP been "traditional" for traceroute?!

_________________
:OnyxR: :IRIS3130: :IRIS2400: :Onyx: :ChallengeL: :4D220VGX: :Indigo: :Octane: :Cube: :Indigo2IMP: :Indigo2: :Indy:
Van Jacobson's 1988 traceroute used UDP probes.
http://www.kohala.com/start/papers.othe ... 9feb08.txt

_________________
:PI: :O2: :Indigo2IMP: :Indigo2IMP:
you're right, I got confused about where the TTL field that traceroute needs to work actually comes from.
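Whichever probe type is used, the trick is the same: the IP TTL field. Each router decrements it, and the one that takes it to zero sends back an ICMP Time Exceeded message, which names that hop. A toy Python simulation of the idea over a made-up path (function names are hypothetical; timing, retries and the usual three probes per hop are left out). Hops that never answer show up as `*`, as in the traces earlier in the thread.

```python
from typing import FrozenSet, List, Optional, Tuple


def send_probe(path: List[str], ttl: int,
               silent: FrozenSet[str]) -> Tuple[Optional[str], bool]:
    """Follow one probe along `path`; return (who answered, reached destination).

    `path` lists the routers in order and ends with the destination host.
    Routers in `silent` drop expired probes without answering.
    """
    *routers, destination = path
    for router in routers:
        ttl -= 1  # every forwarding router decrements the TTL
        if ttl == 0:  # expired here: router sends ICMP Time Exceeded
            return (None if router in silent else router), False
    # The probe outlived every router, so the destination itself replies
    # (Port Unreachable for a UDP probe, Echo Reply for an ICMP probe).
    return destination, True


def traceroute(path: List[str], silent: FrozenSet[str] = frozenset(),
               max_hops: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... and list who answered each one."""
    hops: List[str] = []
    for ttl in range(1, max_hops + 1):
        who, done = send_probe(path, ttl, silent)
        hops.append(who if who is not None else "*")
        if done:
            break
    return hops
```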

_________________
:OnyxR: :IRIS3130: :IRIS2400: :Onyx: :ChallengeL: :4D220VGX: :Indigo: :Octane: :Cube: :Indigo2IMP: :Indigo2: :Indy:
duck wrote:
Sorry to hear the dragon of misfortune has visited you, hamei.

More like the dragon of exasperation :) They do this on porpoise and the solutions are worse than the problem.

Google I don't care about that much. Yes, it's an okay search engine but way too much baggage and the searches themselves are becoming less and less relevant.

But the mail is annoying. If you run a mail server from China, most of your mail gets bounced: "It's from Chiiina, must be spam !!" Pita.

So you give up and use gmail. What happens then ? "You use google, we're going to fuck with you, hrr hrr hrr."

Can't win for losing.

Quote:
Personally I would not test with an IRIX system, nsd is a beast of black magic that previously has interfered with my happiness.

It is strange that a Windows computer on the network acts differently. Not correct, just differently bad.

While an iPad, going through the same router, had no problems. That day, at least.

The intent is Microsoftian : make it so miserable to use that no one will. And I can understand that, but it's an annoyance.

Quote:
Is some foul demon eating your udp packets, though? Looking at the man scroll, it seems that IRIX' traceroute uses ICMP packets instead of the traditional UDP, so working replies might not be an indication that it's limited to DNS queries.

After some frustrating research, it looks like this is maybe what is happening :

I can use the dns server I want. Normally that works correctly. This is designed to fool us proletarian suckers. If you grab the Spinrite tools (yes kids ! We will be testing our DNS today ! DNS, as you know, is one of the main building blocks of the Internet ! ... Steve Gibson hasn't changed one iota in thirty years, bless his pointy little head), they seem to work correctly, and so do the Google name server tools. The google tool reports all the poisoned dns, though ... how can that happen ? They are getting answers from their own name servers ! How ?

Maybe Cisco IDS. If you look at a traceroute there's one huge jump right before it leaves the country (and travels halfway around the world for fun) :
Code:
urchin 48% traceroute hobbes.nmsu.edu
traceroute to hobbes.nmsu.edu (128.123.34.6), 30 hops max, 60 byte packets
1  ceesco (xxx.yyy.zzz.333)  1 ms  0 ms  0 ms
2  gateway (254.143.032.555)  2 ms  1 ms  1 ms
3  210.22.66.93  6 ms  3 ms  3 ms
4  112.64.243.170  5 ms *  34 ms
5  139.226.193.53  100 ms  100 ms  102 ms
6  219.158.21.241  100 ms  98 ms  97 ms
7  219.158.5.126  45 ms  48 ms  45 ms
8  219.158.97.62  35 ms  41 ms  42 ms
9  219.158.29.218  334 ms  335 ms  334 ms
10  sjp-brdr-04.inet.qwest.net (63.146.27.85)  328 ms  324 ms  328 ms
11  abq-edge-06.inet.qwest.net (205.171.151.74)  370 ms * *
12  63.225.11.106  367 ms  366 ms  363 ms
13  206.206.145.49  363 ms  371 ms  381 ms
14  192.65.78.198  362 ms  366 ms  369 ms
15  dc-gate-vlan100.nmsu.edu (128.123.100.35)  374 ms  372 ms  370 ms
16  hobbes.nmsu.edu (128.123.34.6)  379 ms  377 ms  375 ms

If you use the Chinese name servers, they are close by but pre-poisoned. So you go for one in the Free World. The theory is, you think you are using the name server that you ask for. And you are. But when you hit a keyword, the friendly neighborhood Cisco Intrusion Detection System somewhere around 219.158.29.218 bounces back a bad address to your requesting server, even as it lets the query continue happily down the road. Or ditches it into the drink, I know not which. The requesting host accepts the first piece of junk that comes back, naturally, and off we go to the wrong place.

Correct me if I'm wrong, but if you force TCP rather than UDP, would the requesting host wait for the correct address instead of assuming that the first thing that came back via UDP was the right answer ? Are there any functional, secure public DNS servers that encrypt or otherwise protect the receiver from poisoned DNS ?
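If the injection theory is right, the race looks like this: a box that sees the query answers it from much closer than the real server, and since it can copy the 16-bit query ID off the wire, the ID check in the stub resolver does not help. A toy Python model, assuming made-up arrival times and a documentation-range bogus address; `first_match` and `answers_within` are hypothetical names, and the windowed check is a sketch of one possible countermeasure, not a description of any particular resolver.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Reply:
    arrival_ms: float  # when the UDP datagram reached the asking host
    query_id: int      # 16-bit ID copied from the query
    address: str       # the A record it carries


def first_match(replies: List[Reply], query_id: int) -> Optional[str]:
    """Typical UDP stub behaviour: take the earliest reply whose ID matches."""
    for reply in sorted(replies, key=lambda r: r.arrival_ms):
        if reply.query_id == query_id:
            return reply.address
    return None


def answers_within(replies: List[Reply], query_id: int, window_ms: float) -> Set[str]:
    """Collect every matching answer arriving within window_ms of the first one.

    More than one distinct address for the same question is a red flag.
    """
    matching = sorted((r for r in replies if r.query_id == query_id),
                      key=lambda r: r.arrival_ms)
    if not matching:
        return set()
    cutoff = matching[0].arrival_ms + window_ms
    return {r.address for r in matching if r.arrival_ms <= cutoff}
```

As for TCP: the reply comes back inside the established connection, so a spoofed UDP answer is simply never read; an injector would have to forge TCP segments or reset the connection instead.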

There's a software project for someone - an encrypted dns server running on a non-standard secret port ... hmm.

What was confusing was that it seemed like I was using a good name server instead of the crap China Telecom tries to feed you. And I was ... but they can bounce back wrong answers even if you are using a different dns server. Without it being obvious, either.

That still doesn't answer another mystery tho: when this happens (I believe it occurs when they are messing with their poisoning system and things get out of whack), I can (sometimes) do an < nslookup hostname > and get a cached result back instantly. But an immediately following < ping hostname > returns "Host not found" or something to that effect. Happens from both Irix and Windows. WTF is the story here ?

The Internet in China is a never-ending adventure :(

Quote:
Look, now you made me write like you do :-P

I'll tell my Mom she scored another victory :D

_________________
waiting for flight 1203 ...
just guessing, but nslookup uses its own internal resolver, not the system resolver library. so it's sometimes the case that lookups happen differently (and not surprising that lookups done by nslookup don't enter the system's address cache)
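Building on that guess: programs like ping resolve names through the system resolver library (on IRIX via nsd and nsswitch.conf), whereas nslookup speaks DNS directly to the server named in resolv.conf, so the two can disagree when something between them is off, e.g. a cached nsd entry or a different source order. A small Python sketch for comparing them, assuming Python 3 is available on the machine; the function names are hypothetical, and `socket.getaddrinfo` stands in for ping's lookup since both go through the C library.

```python
import socket
from typing import Iterable, List


def system_lookup(name: str) -> List[str]:
    """Resolve `name` the way ping does: through the C library resolver,
    which on most Unix systems follows nsswitch.conf (files, NIS, DNS...)."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET)
    # Each entry's sockaddr is (address, port); deduplicate the addresses.
    return sorted({info[4][0] for info in infos})


def compare_with_nslookup(name: str, nslookup_answers: Iterable[str]) -> str:
    """Compare addresses printed by nslookup with the system resolver's view."""
    expected = sorted(set(nslookup_answers))
    try:
        got = system_lookup(name)
    except socket.gaierror as exc:
        return f"system resolver failed ({exc}); nslookup said {expected}"
    if got == expected:
        return "agree"
    return f"differ: system={got} nslookup={expected}"
```

For example, `compare_with_nslookup("pop.gmail.com", ["74.125.25.109", "74.125.25.108"])` checks the addresses from the first post against what ping would get.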

_________________
:PI: :O2: :Indigo2IMP: :Indigo2IMP: