[SOLVED] Unusual Urouter error on RedHat Enterprise 6.5

Author: gianni.sandigliano@unifacesolutions.com (gianni)

A Uappl was ported from RedHat 4.2 32-bit to RedHat 6.5 64-bit. The old Uversion was 9.2; the new Uversion is 9.6.04.X402. This Uappl delivers Uniface Services through SOAP as WebServices; the frontend application is Java based and its server is configured on the same node where Uniface is installed.

Normally everything works. Here is an example of a working session:

    61:32.222.34 t=1823463168: accepted new connection on TCP:+13001
    61:32.222.48 t=1031780096: From Client:chn=6;len=135: CLTCON;
    61:32.222.53 t=1031780096:     clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
    61:32.222.55 t=1031780096:     log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
    61:32.222.57 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
    61:32.264.11 t=1031780096: To Client:chn=6;len=2: CONANS; continue:

From time to time, usually in the middle of the morning when more users are using those services, an error -21 (it should be a <NetworkLoginError>) is generated at urouter level. This is what the urouter log file reports:

    61:34.954.38 t=1823463168: accepted new connection on TCP:+13001
    61:34.954.52 t=1031780096: From Client:chn=6;len=135: CLTCON;
    61:34.954.56 t=1031780096:     clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
    61:34.954.58 t=1031780096:     log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
    61:34.954.62 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
    61:36.967.82 t=1031780096: [Mon Nov  3 14:09:34 2014] err=-21: cretpsv: Authentication of user/password failed for user usys
    61:36.967.87 t=1031780096: To Client:chn=6;len=70: CONANS; Error=-21:

From then on every client gets the same answer:

    61:39.632.95 t=1823463168: accepted new connection on TCP:+13001
    61:43.795.92 t=1823463168: accepted new connection on TCP:+13001
    61:46.884.77 t=1823463168: accepted new connection on TCP:+13001
    61:58.710.17 t=1823463168: accepted new connection on TCP:+13001
    62:02.138.85 t=1823463168: accepted new connection on TCP:+13001
    62:20.956.70 t=1823463168: accepted new connection on TCP:+13001
    62:33.163.03 t=1823463168: accepted new connection on TCP:+13001
    63:18.532.76 t=1823463168: accepted new connection on TCP:+13001
    63:18.564.72 t=1823463168: accepted new connection on TCP:+13001
    63:19.880.47 t=1823463168: accepted new connection on TCP:+13001
    63:25.753.78 t=1823463168: accepted new connection on TCP:+13001
    63:25.782.29 t=1823463168: accepted new connection on TCP:+13001
    63:28.425.74 t=1823463168: accepted new connection on TCP:+13001
    63:35.509.25 t=1823463168: accepted new connection on TCP:+13001
    64:19.685.94 t=1823463168: accepted new connection on TCP:+13001

This goes on until all active servers are restarted (in this case because they were idle, in other cases because of a manual restart by a human):

    64:34.607.88 t=1823463168: accepted new connection on TCP:+13001
    64:34.608.05 t=1031780096: From Client:chn=6;len=135: CLTCON;
    64:34.608.09 t=1031780096:     clt=(hst=172.16.3.134,trt-lnx-app03.interna.regio.it;pid=0;tid=0;sid=0;usr=nobody;ust=)
    64:34.608.11 t=1031780096:     log=(hst=TCP:trt-lnx-app03.interna.regio.it+13001;usr=usys;ust=LISRV)
    64:34.608.13 t=1031780096: reguser: nid=172.16.3.134, node=trt-lnx-app03.interna.regio.it, pid=0, ust=
    64:34.608.23 t=1031780096: [Mon Nov  3 14:12:32 2014] err=-21: cretpsv: Authentication of user/password failed for user usys
    64:34.608.25 t=1031780096: To Client:chn=6;len=70: CONANS; Error=-21:
    65:18.051.44 t=1835534112: Stopping server sid=20; shut=1 mode=normal
    65:18.051.50 t=1835534112: Reason for stop: Server timed out after 300 seconds (max=300)
    65:18.051.52 t=1835534112: To Server:chn=1000;len=6: SRVSHUT;

It seems urouter is no longer able either to talk to the currently alive uservers or to start new ones until the last userver is closed; after that point everything works again...

While we are applying the latest 9.6.05 patches (up to X505) and collecting more information, does anyone have suggestions, hints or any clue about the reasons for this BAD behaviour? Thanks for any answers!
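For anyone who wants to spot such episodes in a urouter log, here is a minimal Python sketch. It assumes only the two textual patterns visible in the excerpts of this post (`accepted new connection` on accept lines and an `err=<number>:` token on failure lines); the helper name `summarize_urouter_log` is hypothetical and not part of Uniface.

```python
import re
from collections import Counter

# Matches the "err=-21:" token seen in the urouter log excerpts of this post.
ERR_RE = re.compile(r"\berr=(-?\d+):")


def summarize_urouter_log(lines):
    """Count accepted connections and error codes in urouter log lines.

    Hypothetical helper: it relies only on patterns visible in the
    excerpts, not on any documented Uniface log format.
    """
    accepted = 0
    errors = Counter()
    for line in lines:
        if "accepted new connection" in line:
            accepted += 1
        match = ERR_RE.search(line)
        if match:
            errors[int(match.group(1))] += 1
    return accepted, errors


if __name__ == "__main__":
    # Three lines copied from the excerpts in this post.
    sample = [
        "61:34.954.38 t=1823463168: accepted new connection on TCP:+13001",
        "61:36.967.82 t=1031780096: [Mon Nov  3 14:09:34 2014] err=-21: "
        "cretpsv: Authentication of user/password failed for user usys",
        "61:39.632.95 t=1823463168: accepted new connection on TCP:+13001",
    ]
    accepted, errors = summarize_urouter_log(sample)
    print(f"accepted={accepted} errors={dict(errors)}")
```

A long run of accepted connections with no matching client exchange, bracketed by `err=-21` lines, would correspond to the stall described in this post.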

6 Comments

  1. I reported the wrong Uversion for the new platform! The correct one is 9.6.04.X401. I've just found these two bugs, solved in service pack mx04 updating from 9.6.04 to 9.6.05:

    BUG 30524: Urouter creating a second urouter process
    BUG 30560: Urouter needs to have a dedicated thread for username/password validations.

    The first fix is already installed (it is part of X401) while the second one is not! We'll give it a try...


    Author: gianni (gianni.sandigliano@unifacesolutions.com)
  2. I am not sure what exactly your issue is, but I don't believe either of these two fixes would solve it. What I see here is an authentication failure for one userver, and another userver timing out after 300 seconds. Neither of these is an error. Cheers, Chris


    Author: Chris Breemer (chris.breemer@uniface.com)
  3. Chris, thanks for your reply! The new machine unfortunately went down because of other software and hardware problems; the customer was switched back to the older server. When the machine is back and fully updated we will continue to dig...


    Author: gianni (gianni.sandigliano@unifacesolutions.com)
  4. Back again to this issue... now on the latest patch of 9605...

    External behaviour: from time to time the Uniface urouter/userver delivering webservices does not answer external calls (from a Java program) for a short amount of time; after a while everything works again.

    I dug into the log files for a while and came out with 3 potential problems:

    1) TCP error [24]: Too many open files
    2) Failed to start (len=191) /home/usys/uniface9604/common/bin/userver -srvid=27 -dnp=TCP:+13001||06F416A6-E1D0-11E4-B2F7-D60C3E25F06A| -drv=ANY -ust=lisrv -dir=/home/usys/uniface9604/gest -asn=/opt/labinf/asn96/lisrv.asn
    3) err=-21: Authenticate: Authentication of user/password failed for user usys

    ...and 1 hint:

    A) When the last userver goes down on timeout, everything starts working properly again.

    My current gut feeling is that error #3 "probably" depends on errors #1 and #2. For error #1 I've (probably) found a solution by changing some kernel parameters (more info on this later...).

    Any clue / hint / suggestion / help on error #2???

    Thanks in advance for any help!

    Ciao, Gianni


    Author: gianni (gianni.sandigliano@unifacesolutions.com)
  5. Hi all,

    Fingers crossed: two hours after changing some kernel params I no longer see error #1... I hope it is gone for good!

    I am still looking for a solution for error #2. At the moment I am focusing on the kernel and glibc versions, because strange behaviours are reported with different combinations of these two very core Linux packages... It seems the only solution is to migrate to RedHat Enterprise V7.1 including glibc >= 2.18...

    Any clue / hint / suggestion / help on error #2 would be appreciated... a lot!

    Ciao, Gianni


    Author: gianni (gianni.sandigliano@unifacesolutions.com)
  6. [SOLVED] Solutions:

    Issue #1 - TCP error [24]: Too many open files

    It is caused by a too-low maximum number of open files at kernel level. Taking a simplified approach, there are 2 mandatory changes plus an optional one:

    Step 1 - maximum number of open files at system level:

        $ sudo vi /etc/sysctl.conf

    add:

        net.core.somaxconn=131072
        fs.file-max=131072

    (fs.file-max is the system-wide limit on open files; net.core.somaxconn is the limit on the socket listen backlog.)

    Execute the following command to have your kernel pick up the modifications:

        $ sudo sysctl -p

    Step 2 - maximum number of open files at user session level:

        $ sudo vi /etc/security/limits.conf

    add:

        * soft nofile 65535
        * hard nofile 65535

    Step 3 - maximum number of open files as a constant at kernel compile level:

        $ sudo vi /usr/include/linux/limits.h

    change:

        NR_OPEN = 65536

    Steps #1 and #2 are mandatory; step #3 is optional.
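    To check whether the new limits are actually in effect, a small Python sketch like the one below can help. It is only an illustration: the function names are hypothetical, the limits it reports are those of the process running the script (so run it as the same user, in the same kind of session, as urouter), and counting descriptors through `/proc/<pid>/fd` works on Linux only and needs permission to read that directory.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide fs.file-max value, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def count_open_fds(pid):
    """Count the descriptors a process holds, via /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(open_fds, soft_limit):
    """Descriptors still available before the soft limit; None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - open_fds, 0)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"nofile soft={soft} hard={hard}")
    print(f"fs.file-max={system_file_max()}")
```

    For a running urouter, `count_open_fds(<urouter pid>)` combined with `fd_headroom(...)` gives a rough idea of how close the process is to "Too many open files"; the limits of an already running process can also be read from `/proc/<pid>/limits`.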

    Issue #2 - Failed to start (len=191) /home/usys/uniface9604/common/bin/userver -srvid=27 -dnp=TCP:+13001||06F416A6-E1D0-11E4-B2F7-D60C3E25F06A| -drv=ANY -ust=lisrv -dir=/home/usys/uniface9604/gest -asn=/opt/labinf/asn96/lisrv.asn

    It is caused by a bug in glibc; upgrade your glibc to version >= 2.18, or to the latest available for a currently supported RedHat Enterprise release or derivative.
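    A quick way to see which glibc a machine is running is sketched below in Python. The 2.18 threshold is the one quoted in this post; the helper names are hypothetical, and `os.confstr("CS_GNU_LIBC_VERSION")` is only available on glibc-based systems, so the sketch falls back to a message elsewhere.

```python
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.12' into a tuple such as (2, 12)."""
    version = text.split()[-1]
    return tuple(int(part) for part in version.split("."))


def glibc_at_least(text, minimum=(2, 18)):
    """True if the reported glibc version is at least `minimum`.

    The default of 2.18 is the threshold quoted in this post, not a value
    taken from any glibc or Uniface documentation.
    """
    return parse_glibc_version(text) >= minimum


if __name__ == "__main__":
    try:
        reported = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        reported = None
    if reported:
        print(f"{reported}: at least 2.18? {glibc_at_least(reported)}")
    else:
        print("glibc version not available on this system")
```

    Comparing version tuples rather than strings matters here, because as plain text "2.9" would sort after "2.18".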

    Note that I didn't change ANYTHING on Uniface!!!

    Gianni


    Author: gianni (gianni.sandigliano@unifacesolutions.com)