Oct 22 2013
 

I had a Bacula job fail today:

22-Oct 20:08 nyi-fd JobId 147614: Warning: bsock.c:128 Could not connect to Storage daemon on crey.example.org:9103. ERR=Operation timed out
Retrying ...
22-Oct 20:35 nyi-fd JobId 147614: Fatal error: bsock.c:134 Unable to connect to Storage daemon on crey.example.org:9103. ERR=Interrupted system call
22-Oct 20:35 nyi-fd JobId 147614: Fatal error: Failed to connect to Storage daemon: crey.example.org:9103
22-Oct 20:35 bacula-dir JobId 147614: Fatal error: Bad response to Storage command: wanted 2000 OK storage, got 2902 Bad storage

Is bacula-sd running on crey? Yes it is. Can I telnet to port 9103 on crey? Yes, I can:

$ telnet 10.5.0.20 9103
Trying 10.5.0.20...
Connected to crey.example.org.
Escape character is '^]'.

What about from the nyi-fd server? Can I telnet from there?

$ telnet 10.5.0.20 9103
Trying 10.5.0.20...
telnet: connect to address 10.5.0.20: Operation timed out
telnet: Unable to connect to remote host

I started tcpdump on the gateway, and on the 10.5.0.20 host. Things just weren’t getting through. On the host which could not be reached, I saw:

20:53:23.777116 ARP, Request who-has 10.4.2.20 tell 10.5.0.20, length 46
20:53:24.778458 IP 10.4.2.20 > 10.5.0.10: ICMP echo request, id 43677, seq 99, length 64
20:53:24.778582 ARP, Request who-has 10.8.1.20 tell 10.5.0.20, length 46
20:53:25.779665 IP 10.4.2.20 > 10.5.0.10: ICMP echo request, id 43677, seq 100, length 64
20:53:25.779777 ARP, Request who-has 10.8.1.20 tell 10.5.0.20, length 46

I had no idea…

Hmm. Well. It took me a while, but I finally remembered adding an alias to the NIC on the jail host for crey. It looked like this:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
        ether 00:25:90:82:21:5a
        inet 10.5.0.74 netmask 0xffffff00 broadcast 10.5.0.255
        inet6 fe80::225:90ff:fe82:215a%em0 prefixlen 64 scopeid 0x1
        inet 10.5.0.10 netmask 0xffffffff broadcast 10.5.0.10
        inet 10.5.0.102 netmask 0xffffffff broadcast 10.5.0.102
        inet 10.5.0.111 netmask 0xffffffff broadcast 10.5.0.111
        inet 10.5.0.104 netmask 0xffffffff broadcast 10.5.0.104
        inet 10.5.0.105 netmask 0xffffffff broadcast 10.5.0.105
        inet 10.5.0.110 netmask 0xffffffff broadcast 10.5.0.110
        inet 10.5.0.114 netmask 0xffffffff broadcast 10.5.0.114
        inet 10.5.0.20 netmask 0xffffffff broadcast 10.5.0.20
        inet 10.5.0.140 netmask 0xffffffff broadcast 10.5.0.140
        inet 10.5.0.112 netmask 0xffffffff broadcast 10.5.0.112
        inet 10.5.0.13 netmask 0xffffffff broadcast 10.5.0.13
        inet 10.5.0.14 netmask 0xffffffff broadcast 10.5.0.14
        inet 10.5.0.15 netmask 0xffffffff broadcast 10.5.0.15
        inet 10.5.0.106 netmask 0xffffffff broadcast 10.5.0.106
        inet 10.5.0.127 netmask 0xff000000 broadcast 255.255.255.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

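Look at line 20 of that output, the 10.5.0.127 entry. See how its netmask (0xff000000) differs from the other aliases (0xffffffff)? That’s the cause. Without the netmask keyword, ifconfig appears to have taken 255.255.255.255 as the broadcast address (see the broadcast field on that line) and fallen back to the classful default netmask for a 10.x address. That made the host treat all of 10.0.0.0/8 as directly attached, which would explain why the tcpdump above shows it sending ARP requests for 10.4.2.20 and 10.8.1.20 instead of replying via the gateway. I had added the alias with this command: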

ifconfig em0 alias 10.5.0.127 255.255.255.255

When I should have used this command:

ifconfig em0 alias 10.5.0.127 netmask 255.255.255.255

I removed the faulty alias with this command:

ifconfig em0 delete 10.5.0.127

And then issued the correct command. Everything ran fine then.
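
To convince myself why one bad netmask on one alias could break traffic to other subnets, here is a rough sketch of the on-link test in plain sh arithmetic. The function names ip_to_int and same_network are made up for this illustration; this is not how the kernel does its route lookup, only the same masking idea.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# (Hypothetical helper; assumes octets without leading zeros.)
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# Succeed if address $1 and address $2 fall in the same network
# under netmask $3, given in hex as ifconfig prints it.
same_network() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( $3 ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# With the accidental 0xff000000 mask, a host in another subnet looks local:
same_network 10.4.2.20 10.5.0.127 0xff000000 && echo "on-link: ARP for it directly"

# With the intended 0xffffffff mask, it does not:
same_network 10.4.2.20 10.5.0.127 0xffffffff || echo "off-link: send via the gateway"
```

Both echo lines fire, which matches what tcpdump showed: with the wide mask in place, replies to 10.4.2.20 were never handed to the gateway.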

I am positive I encountered this problem before…
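
Since I will probably do this again, something like the following could catch it. It is a minimal sketch, assuming FreeBSD-style ifconfig output like the listing above, and assuming the first inet line of each interface is the primary address (so only later aliases are expected to be 0xffffffff). The function name odd_alias_netmasks is my own invention.

```shell
# Read ifconfig output on stdin and print "address netmask" for every
# IPv4 alias whose netmask is not 0xffffffff. The first inet line of each
# interface is skipped, on the assumption that it is the primary address.
odd_alias_netmasks() {
    awk '
        /^[^ \t]/ { seen = 0 }    # unindented line: a new interface starts
        $1 == "inet" && $3 == "netmask" {
            seen++
            if (seen > 1 && $4 != "0xffffffff")
                print $2, $4
        }
    '
}

# Usage: ifconfig em0 | odd_alias_netmasks
```

Run against the em0 output above, the only line this should report is the 10.5.0.127 alias with its 0xff000000 netmask.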
