[SOLVED] Unusual Urouter error on RedHat Enterprise 6.5
Author: gianni.sandigliano@unifacesolutions.com (gianni)
A Uappl was ported from RedHat 4.2 32-bit to RedHat 6.5 64-bit.
Oldest Uversion was: 9.2
Newest Uversion is: 9.6.04.X402
This Uappl delivers Uniface Services through SOAP as web services; the frontend application is Java-based, and its server is configured on the same node where Uniface is installed.
Normally everything works; here is an example of a working session:
61:32.222.34 t=1823463168: accepted new connection on TCP:+13001
61:32.222.48 t=1031780096: From Client:chn=6;len=135: CLTCON;
61:32.222.53 t=1031780096: clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
61:32.222.55 t=1031780096: log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
61:32.222.57 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
61:32.264.11 t=1031780096: To Client:chn=6;len=2: CONANS; continue:
From time to time, usually in the middle of the morning when more users are using those services, an error -21 (it should be a <NetworkLoginError>) is generated at urouter level.
This is what the urouter log file reports:
61:34.954.38 t=1823463168: accepted new connection on TCP:+13001
61:34.954.52 t=1031780096: From Client:chn=6;len=135: CLTCON;
61:34.954.56 t=1031780096: clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
61:34.954.58 t=1031780096: log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
61:34.954.62 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
61:36.967.82 t=1031780096: [Mon Nov 3 14:09:34 2014] err=-21: cretpsv: Authentication of user/password failed for user usys
61:36.967.87 t=1031780096: To Client:chn=6;len=70: CONANS; Error=-21:
From then on, every client gets the same answer:
61:39.632.95 t=1823463168: accepted new connection on TCP:+13001
61:43.795.92 t=1823463168: accepted new connection on TCP:+13001
61:46.884.77 t=1823463168: accepted new connection on TCP:+13001
61:58.710.17 t=1823463168: accepted new connection on TCP:+13001
62:02.138.85 t=1823463168: accepted new connection on TCP:+13001
62:20.956.70 t=1823463168: accepted new connection on TCP:+13001
62:33.163.03 t=1823463168: accepted new connection on TCP:+13001
63:18.532.76 t=1823463168: accepted new connection on TCP:+13001
63:18.564.72 t=1823463168: accepted new connection on TCP:+13001
63:19.880.47 t=1823463168: accepted new connection on TCP:+13001
63:25.753.78 t=1823463168: accepted new connection on TCP:+13001
63:25.782.29 t=1823463168: accepted new connection on TCP:+13001
63:28.425.74 t=1823463168: accepted new connection on TCP:+13001
63:35.509.25 t=1823463168: accepted new connection on TCP:+13001
64:19.685.94 t=1823463168: accepted new connection on TCP:+13001
This continues until all active servers are restarted (in this case because they were idle, in other cases through a manual restart by a biped):
64:34.607.88 t=1823463168: accepted new connection on TCP:+13001
64:34.608.05 t=1031780096: From Client:chn=6;len=135: CLTCON;
64:34.608.09 t=1031780096: clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
64:34.608.11 t=1031780096: log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
64:34.608.13 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
64:34.608.23 t=1031780096: [Mon Nov 3 14:12:32 2014] err=-21: cretpsv: Authentication of user/password failed for user usys
64:34.608.25 t=1031780096: To Client:chn=6;len=70: CONANS; Error=-21:
65:18.051.44 t=1835534112: Stopping server sid=20; shut=1 mode=normal
65:18.051.50 t=1835534112: Reason for stop: Server timed out after 300 seconds (max=300)
65:18.051.52 t=1835534112: To Server:chn=1000;len=6: SRVSHUT;
It seems urouter is no longer able either to talk to the currently alive uservers or to start new ones until the last userver is closed; after that point everything works again...
While we are applying the latest 9.6.05 patches (up to X505) and collecting more info, does anyone have suggestions, hints or any clue about the reasons for this BAD behaviour?
Thanks for any answers!
6 Comments
Local Administrator
I reported the wrong Uversion for the new platform! The correct one is 9.6.04.X401.
I've just found these two bugs solved in Service Pack mx04, which updates 9.6.04 to 9.6.05:
BUG 30524: Urouter creating a second urouter process
BUG 30560: Urouter needs to have a dedicated thread for username/password validations.
The first fix is already installed (part of X401), while the second one is NOT! We'll give it a try...
Author: gianni (gianni.sandigliano@unifacesolutions.com)
Local Administrator
I am not sure what exactly your issue is, but I don't believe either of these two fixes would solve it. What I see here is an authentication failure for one userver, and another userver timing out after 300 seconds. None of this is an error.
Cheers, Chris
Author: Chris Breemer (chris.breemer@uniface.com)
Local Administrator
Chris, thanks for your reply! Unfortunately the new machine went down because of other software and hardware problems, and the customer was switched back to the older server. When the machine is back and fully updated, we will continue to dig...
Author: gianni (gianni.sandigliano@unifacesolutions.com)
Local Administrator
Back again to this issue... now on the latest patch of 9605...
External behaviour: from time to time the Uniface urouter/userver delivering web services does not answer external calls (from a Java program) for a short amount of time; after a while everything works again.
I dug into the log files for a while and came out with 3 potential problems:
1) TCP error [24]: Too many open files
2) Failed to start (len=191) /home/usys/uniface9604/common/bin/userver -srvid=27 -dnp=TCP:+13001||06F416A6-E1D0-11E4-B2F7-D60C3E25F06A| -drv=ANY -ust=lisrv -dir=/home/usys/uniface9604/gest -asn=/opt/labinf/asn96/lisrv.asn
3) err=-21: Authenticate: Authentication of user/password failed for user usys
...and 1 hint:
A) When the last userver goes down on timeout, everything starts working properly again
My current gut feeling is that error #3 "probably" depends on errors #1 and #2.
For error #1 I've (probably) found a solution by changing some kernel parameters (more info on this later...)
Any clue / hint / suggestion / help on error #2 ???
Thanks in advance for any help!
Ciao, Gianni
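On Linux, error 24 is EMFILE, which is raised when a single process hits its own open-file limit. A minimal sketch for checking how close a process is to that limit, assuming a Linux /proc filesystem; the helper name fd_usage is made up for illustration and is not part of Uniface:

```shell
# Sketch: compare a process's open file descriptors with its soft limit.
# fd_usage is a hypothetical helper name; the body runs in a subshell
# so its variables do not leak into the caller.
fd_usage() (
    pid="$1"
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/${pid}/fd" 2>/dev/null | wc -l)
    # The "Max open files" row of /proc/<pid>/limits lists soft, then hard limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/${pid}/limits" 2>/dev/null)
    echo "pid=${pid} open=${open} soft_limit=${limit}"
)

# Example: inspect the current shell; pass the urouter PID instead of $$.
fd_usage "$$"
```

Reading /proc/<pid>/fd of a process owned by another user normally requires root, so this would have to run as the urouter owner or via sudo.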
Author: gianni (gianni.sandigliano@unifacesolutions.com)
Local Administrator
Hi all,
Fingers crossed: two hours after changing some kernel params I no longer see error #1... hope it is gone for good!
Still looking for a solution for error #2; at the moment I am focusing on kernel and glibc versions, because strange behaviours are reported with different combinations of these two very core Linux packages... it seems the only solution is to migrate to RedHat Enterprise V7.1 including glibc >= 2.18...
Any clue / hint / suggestion / help on error #2 would be appreciated... a lot!
Ciao, Gianni
Author: gianni (gianni.sandigliano@unifacesolutions.com)
Local Administrator
[SOLVED] Solutions:
Issue #1: TCP error [24]: Too many open files
It is caused by a maximum number of open files that is set too low at kernel level; taking a simplified approach, there are 2 parameters to change:
Step 1 - maximum number of open files at system level:
$ sudo vi /etc/sysctl.conf
add:
net.core.somaxconn=131072
fs.file-max=131072
Run the following command to make the kernel apply the changes:
$ sudo sysctl -p
Step 2 - maximum number of open files at user session level:
$ sudo vi /etc/security/limits.conf
add:
* soft nofile 65535
* hard nofile 65535
Step 3 - maximum number of open files as a constant at kernel compile level:
$ sudo vi /usr/include/linux/limits.h
change:
NR_OPEN = 65536
Steps #1 and #2 are mandatory; step #3 is optional.
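To confirm that steps 1 and 2 took effect, the current values can be read back. This is a minimal sketch, assuming a Linux /proc filesystem; show_limits is a made-up helper name. Entries in limits.conf are applied at login, so the check is best run from a fresh session of the user that starts urouter:

```shell
# Sketch: print the kernel settings and session limits changed in steps 1 and 2.
# show_limits is a hypothetical helper name; "unknown" is printed when a
# /proc entry cannot be read.
show_limits() (
    echo "fs.file-max=$(cat /proc/sys/fs/file-max 2>/dev/null || echo unknown)"
    echo "net.core.somaxconn=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo unknown)"
    # Soft and hard per-process open-file limits of the current session.
    echo "nofile_soft=$(ulimit -Sn)"
    echo "nofile_hard=$(ulimit -Hn)"
)

show_limits
```

If the nofile values still show the old numbers after logging in again, the urouter may be started in a way that bypasses the login session limits; that is an assumption to verify on the machine, not something the thread confirms.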
Issue #2 - Failed to start (len=191) /home/usys/uniface9604/common/bin/userver -srvid=27 -dnp=TCP:+13001||06F416A6-E1D0-11E4-B2F7-D60C3E25F06A| -drv=ANY -ust=lisrv -dir=/home/usys/uniface9604/gest -asn=/opt/labinf/asn96/lisrv.asn
It is caused by a bug in glibc; upgrade your glibc to version >= 2.18, or to the latest available for a currently supported RedHat Enterprise release or derivative.
Note that I didn't change ANYTHING on Uniface!!!
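A quick way to see which glibc a machine is running and compare it with 2.18 is sketched here. It assumes GNU sort (for the -V option) and a glibc-based system; version_ge is a made-up helper name:

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on the version ordering of GNU "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On glibc systems getconf prints a line of the form "glibc <version>".
glibc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')

if [ -z "$glibc_version" ]; then
    echo "could not determine the glibc version"
elif version_ge "$glibc_version" "2.18"; then
    echo "glibc ${glibc_version}: at or above 2.18"
else
    echo "glibc ${glibc_version}: older than 2.18"
fi
```

A plain numeric comparison would get this wrong (2.5 would look newer than 2.18), which is why the sketch uses version sorting.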
Gianni
Author: gianni (gianni.sandigliano@unifacesolutions.com)