PXE Boot -> NFS hangs [solved]

Questions related to network booting via PXE
ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

PXE Boot -> NFS hangs [solved]

Postby ConiKost » 13 Oct 2012, 18:20

Hi!
I've set up SystemRescueCd v3.0.0 for PXE boot via dnsmasq.
The boot command line is: docache dodhcp setkmap=de nfsboot=192.168.23.1:/srv/systemrescuecd rootpass=1234

The last output is:

>> Mounting the NFS filesystem from nfs://192.168.23.1:/srv/systemrescuecd
>> Successfully mounted the NFS filesystem

That's it :( After this, it just hangs and nothing happens anymore. The NFS export seems to be correct, as mounting it with an ordinary Ubuntu client works fine. Any ideas what could be wrong?

gernot
Posts: 1127
Joined: 07 Apr 2010, 16:19

Re: PXE Boot -> NFS hangs

Postby gernot » 13 Oct 2012, 18:46

The next message after
"Successfully mounted the NFS filesystem"
is
"Successfully checked md5 sum of ${BOOTPATH}/${LOOPDAT}"

Check whether the md5sum calculation of sysrcd.dat is still in progress.
With slow connections and docache, booting via HTTP is the better option.

Gernot
P.S. You don't need "dodhcp".

ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

Re: PXE Boot -> NFS hangs

Postby ConiKost » 13 Oct 2012, 19:06

The connection should not be slow, as it's my LAN and it's gigabit.

gernot
Posts: 1127
Joined: 07 Apr 2010, 16:19

Re: PXE Boot -> NFS hangs

Postby gernot » 13 Oct 2012, 19:25

Can you check the progress of the communication?
Maybe with Wireshark or tcpdump.

I think there is something blocking.

Gernot

ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

Re: PXE Boot -> NFS hangs

Postby ConiKost » 14 Oct 2012, 01:20

Which things should I monitor with tcpdump?

gernot
Posts: 1127
Joined: 07 Apr 2010, 16:19

Re: PXE Boot -> NFS hangs

Postby gernot » 14 Oct 2012, 07:05

Check that the client opens sysrcd.dat.
Then check that the data packets are sent to the client and received by it.
If the same packet is repeated for a long time without a response, something is blocked.

sysrcd uses UDP to load files via NFS. It takes 4 minutes on my system with a direct connection.

Gernot
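
One way to make the capture suggested above is sketched here. The interface name eth0 and the client address 192.168.23.50 are assumptions for illustration, not values from this thread. The filter keeps NFS and portmapper traffic for the client and also any IP fragments, because non-first fragments carry no UDP port and would otherwise be missed.

```shell
# Assumed values: adjust to your own server interface and PXE client address.
IFACE=eth0
CLIENT=192.168.23.50

# NFS (2049) and portmapper (111) traffic for the client, plus IP fragments:
# ip[6:2] & 0x3fff is non-zero when the "more fragments" bit or a fragment offset is set.
FILTER="host $CLIENT and (port 2049 or port 111 or (ip[6:2] & 0x3fff != 0))"

# Run on the NFS server as root while the client boots; stop with Ctrl-C.
capture_nfs() {
    tcpdump -n -i "$IFACE" -w pxe-nfs.pcap "$FILTER"
}
```

Calling capture_nfs writes pxe-nfs.pcap, which can then be opened in Wireshark to look for repeated READ calls without matching replies.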

ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

Re: PXE Boot -> NFS hangs

Postby ConiKost » 14 Oct 2012, 15:17

Hi!
OK, I've captured the whole boot from PXE up to where it hangs.

At the end I can see a lot of:

31961   47.262403   Galactica   Erasmus   IPv4   9210   Fragmented IP protocol (proto=UDP 17, off=0, ID=9568) [Reassembled in #31962]
31956   43.052187   Galactica   Erasmus   NFS   7378   [RPC duplicate of #31941]V3 READ Reply (Call In 31939) Len:16384
31951   42.352152   Erasmus   Galactica   NFS   162   [RPC retransmission of #31939]V3 READ Call (Reply In 31941), FH:0xbf5347f4 Offset:0 Len:16384


I don't know why :( Do you want the complete tcpdump capture to look into?

ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

Re: PXE Boot -> NFS hangs

Postby ConiKost » 14 Oct 2012, 15:24

I found something.
When I set the MTU on my PXE server back to 1500, it works fine.
When it's > 1500 (I'm using 9200), my problem occurs. Why? :(

gernot
Posts: 1127
Joined: 07 Apr 2010, 16:19

Re: PXE Boot -> NFS hangs

Postby gernot » 15 Oct 2012, 04:22

Standard Ethernet allows only a maximum MTU of 1500; jumbo frames work only if every device on the path supports them.
It looks like your jumbo packets get fragmented and are not reassembled.

Gernot
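
A possible way to test whether full-size jumbo frames actually reach the other end is a ping with fragmentation forbidden. This is only a sketch: the helper name ping_payload_for_mtu is made up here, and the interface and address in the usage comment are placeholders. The payload is the MTU minus 28 bytes (20 bytes of IPv4 header without options plus 8 bytes of ICMP header).

```shell
# ICMP echo payload that exactly fills one IPv4 packet of the given MTU.
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical usage on the PXE server (Linux iputils ping):
#   ip link show eth0                     # confirm the configured MTU
#   ping -M do -c 3 -s "$(ping_payload_for_mtu 9200)" <client-ip>
# -M do sets "don't fragment", so the ping gets no reply if the
# receiver or any device in between cannot handle the full frame.
```

If the ping succeeds with the payload for MTU 1500 but not with the one for MTU 9200, the client side is probably not accepting jumbo frames at that point of the boot.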

ConiKost
Posts: 9
Joined: 14 Jan 2011, 18:00

Re: PXE Boot -> NFS hangs

Postby ConiKost » 15 Oct 2012, 06:49

Well, is there anything I can do?
I mean, everything works fine with jumbo frames. I can mount NFS via SystemRescueCd and other Linux distros fine. The only scenario where this does not work is the PXE boot of SystemRescueCd :/

gernot
Posts: 1127
Joined: 07 Apr 2010, 16:19

Re: PXE Boot -> NFS hangs

Postby gernot » 15 Oct 2012, 17:20

No idea, but check how working systems boot via jumbo frames.

Gernot

helamonster
Posts: 4
Joined: 13 Aug 2008, 21:55
Contact:

Re: PXE Boot -> NFS hangs

Postby helamonster » 24 Jan 2013, 18:35

I recently encountered this same problem (it hangs on the md5sum check, after the "Successfully mounted the NFS filesystem" message).
However, both my NFS server and the client machine are using an MTU of 1500. They are both on the same LAN. I can't tell if they are on the same switch or not, but there would only be one other switch between them and they shouldn't be doing anything funny (unmanaged).

After looking at the network traffic, it appears the NFS session stalls after a second and then starts receiving very slowly. I am not yet sure why this happens. In another network with similar server/client hardware, everything works fine with the default 32768. It could be due to the network switch(es) between them doing something funny. I will have to investigate that further. For now, I just wanted to get it working...

After more looking, it seems that the default NFS wsize and rsize used when mounting NFS from sysrcd is 32768. A little research indicated that this high value is known to cause problems in some environments. From the NFS docs:
Using an rsize or wsize larger than your network's MTU (often set to 1500, in many networks) will cause IP packet fragmentation when using NFS over UDP. IP packet fragmentation and reassembly require a significant amount of CPU resource at both ends of a network connection. In addition, packet fragmentation also exposes your network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason. Any increase of RPC retransmissions, along with the possibility of increased timeouts, are the single worst impediment to performance for NFS over UDP.
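
To give a rough feel for the fragmentation described in that quote, the following sketch estimates how many IP fragments one NFS read of a given rsize needs over UDP. The function name udp_fragments is hypothetical, and the estimate is a lower bound: it assumes a 20-byte IPv4 header without options and ignores the RPC/NFS header bytes that sit in front of the data.

```shell
# Estimate the number of IPv4 fragments for one UDP datagram.
#   $1 = UDP payload in bytes (here: the NFS rsize)
#   $2 = MTU in bytes
udp_fragments() {
    payload=$1
    mtu=$2
    # Each fragment carries (MTU - 20) bytes, rounded down to a multiple of 8.
    per_fragment=$(( (mtu - 20) / 8 * 8 ))
    # The 8-byte UDP header is part of the data that gets fragmented.
    datagram=$(( payload + 8 ))
    # Ceiling division.
    echo $(( (datagram + per_fragment - 1) / per_fragment ))
}

for size in 1024 4096 8192 32768; do
    echo "rsize=$size -> $(udp_fragments "$size" 1500) fragment(s)"
done
```

Losing any single fragment forces the whole RPC to be retransmitted, so the more fragments a read needs, the more likely it is to stall.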


After reducing these values to 1024, I was able to successfully mount the NFS folder, perform the md5sum check, and fully boot sysrcd via PXE/NFS without any problem.

For those curious: I did this by modifying the "init" file in initram.igz to mount with a wsize and rsize of 1024.
Extract the initram.igz file:

mkdir temp
cd temp
cat ../initram.igz | xz -d | cpio -id

Within the sysresccd_stage1_nfs function of the "init" file, change:

cmd="mount -t nfs -o intr,nolock ${nfsurl} ${BOOTPATH}"

to:

cmd="mount -t nfs -o intr,nolock,rsize=1024,wsize=1024 ${nfsurl} ${BOOTPATH}"
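
The same change could also be scripted instead of edited by hand. This is a minimal sketch, assuming GNU sed (for -i) and assuming the mount line in "init" looks exactly like the one quoted above; the function name patch_nfs_opts is made up for illustration.

```shell
# Add rsize/wsize to the NFS mount options in an extracted "init" script.
#   $1 = path to the extracted init file
#   $2 = rsize/wsize value, e.g. 1024 or 4096
patch_nfs_opts() {
    init_file=$1
    size=$2
    # \${nfsurl} keeps the literal text ${nfsurl} so only the NFS mount line matches.
    sed -i "s/-o intr,nolock \${nfsurl}/-o intr,nolock,rsize=${size},wsize=${size} \${nfsurl}/" "$init_file"
}
```

Run from inside the temp directory, it would be called as patch_nfs_opts init 1024. It is worth checking the result with grep nolock init before repacking, since a different sysrcd version may use a different mount line.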

Recreate the initram.igz file:

find . | cpio -o -H newc| lzma > ../newinitram.igz


Afterwards, I found that a wsize/rsize value of 4096 works as well (and is a little faster). A value of 8192 or higher reintroduced the original problem.

Perhaps this default should be reduced for maximum compatibility in sysrcd PXE/NFS booting?

admin
Site Admin
Posts: 2715
Joined: 17 Jul 2003, 09:44

Re: PXE Boot -> NFS hangs

Postby admin » 24 Jan 2013, 20:47

Thanks for your contribution, this will be in SystemRescueCd-3.3.0-beta017.

