participate


Java Virtual Machine (JVM) - 15 Minute Garbage Collection (promotion failed)
This question is not answered.

<<   Back to Forum  |   Give us Feedback
This topic has 4 replies on 1 page.
Rob.Chansler
Posts:1
Registered: 9/11/08
15 Minute Garbage Collection (promotion failed)   
Sep 11, 2008 1:47 PM
 
 
I was charmed by our experience with the concurrent garbage collector on a large server (a Hadoop name node) until this happened. From examining application performance measures, the suspicion is that this occurred when a large number of long-lived objects were freed during a short interval. An event like this is rare—it was observed only once in several weeks of operation across several servers.

Is it possible to anticipate such an event by monitoring GC statistics? And of course, is it possible to evade such events by a more thoughtful choice of GC parameters?

We can afford multiple concurrent threads, and easily tolerate modest (1-10 second) interruptions of the application. We run Sun’s Java 6 JVM on Red Hat Linux.

I would welcome any clues you might share!

Rob Chansler
HDFS Development


GC params:
/grid/0/java/jdk/bin/java -Xmx14000m -server -XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC
-XX:ErrorFile=/grid/0/hadoop/var/hs_err_pid%p.log -XX:NewSize=1G
-XX:MaxNewSize=1G -Xloggc:/grid/0/hadoop/var/log/gc.log -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps


The bad news:
2008-08-26T10:44:34.377+0000: 3133772.612: [GC 3133772.612: [ParNew (promotion failed): 846226K->843806K(943744K), 0.9896180 secs]3133773.602: [CMS: 9440626K->6752921K(13287424K), 875.4940150 secs] 10285742K->6752921K(14231168K), 876.5044790 secs] [Times: user=68.80 sys=11.75, real=876.57 secs]


Previous big collection:

2008-07-21T04:15:53.676+0000: 51.911: [GC 51.911: [ParNew: 943744K->104832K(943744K), 0.3818170 secs] 2445397K->1679701K(4089472K), 0.3819150 secs] [Times: user=1.04 sys=0.11, real=0.38 secs]
2008-07-21T04:15:54.058+0000: 52.293: [GC [1 CMS-initial-mark: 1574869K(3145728K)] 1696485K(4089472K), 0.0650750 secs] [Times: user=0.06 sys=0.01, real=0.07 secs]
2008-07-21T04:15:54.124+0000: 52.358: [CMS-concurrent-mark-start]
2008-07-21T04:15:56.194+0000: 54.428: [GC 54.428: [ParNew: 943744K->104832K(943744K), 0.4020780 secs] 2518613K->1744091K(4089472K), 0.4021790 secs] [Times: user=1.27 sys=0.10, real=0.40 secs]
2008-07-21T04:15:58.660+0000: 56.894: [GC 56.894: [ParNew: 943744K->102925K(943744K), 0.4112150 secs] 2583003K->1811261K(4089472K), 0.4113150 secs] [Times: user=1.30 sys=0.11, real=0.41 secs]
2008-07-21T04:16:00.598+0000: 58.832: [CMS-concurrent-mark: 5.660/6.474 secs] [Times: user=19.27 sys=0.43, real=6.47 secs]
2008-07-21T04:16:00.598+0000: 58.832: [CMS-concurrent-preclean-start]
2008-07-21T04:16:00.907+0000: 59.141: [GC 59.141: [ParNew: 941837K->104425K(943744K), 0.4080750 secs] 2650173K->1882155K(4089472K), 0.4081890 secs] [Times: user=1.31 sys=0.12, real=0.41 secs]
2008-07-21T04:16:02.886+0000: 61.120: [CMS-concurrent-preclean: 1.731/2.288 secs] [Times: user=4.79 sys=0.39, real=2.29 secs]
2008-07-21T04:16:02.886+0000: 61.120: [CMS-concurrent-abortable-preclean-start]
2008-07-21T04:16:03.100+0000: 61.334: [GC 61.334: [ParNew: 943337K->103938K(943744K), 0.3826300 secs] 2721067K->1950840K(4089472K), 0.3827300 secs] [Times: user=1.20 sys=0.10, real=0.38 secs]
2008-07-21T04:16:03.483+0000: 61.717: [CMS-concurrent-abortable-preclean: 0.208/0.597 secs] [Times: user=1.61 sys=0.12, real=0.60 secs]
2008-07-21T04:16:03.483+0000: 61.717: [GC[YG occupancy: 120722 K (943744 K)]61.717: [Rescan (parallel) , 0.1772610 secs]61.895: [weak refs processing, 0.0006690 secs] [1 CMS-remark: 1846902K(3145728K)] 1967624K(4089472K), 0.1780550 secs] [Times: user=0.69 sys=0.01, real=0.18 secs]
2008-07-21T04:16:03.663+0000: 61.897: [CMS-concurrent-sweep-start]
2008-07-21T04:16:05.277+0000: 63.511: [CMS-concurrent-sweep: 1.614/1.614 secs] [Times: user=3.21 sys=0.01, real=1.61 secs]
2008-07-21T04:16:05.277+0000: 63.511: [CMS-concurrent-reset-start]
2008-07-21T04:16:05.467+0000: 63.702: [CMS-concurrent-reset: 0.190/0.190 secs] [Times: user=0.27 sys=0.11, real=0.19 secs]
2008-07-21T04:16:05.552+0000: 63.786: [GC 63.786: [ParNew: 942850K->104832K(943744K), 0.3284880 secs] 2740583K->1961645K(4089472K), 0.3285860 secs] [Times: user=1.14 sys=0.02, real=0.33 secs]



More context before the big event:

2008-08-26T10:42:16.522+0000: 3133634.756: [GC 3133634.756: [ParNew: 842104K->1828K(943744K), 0.0630230 secs] 10270596K->9430323K(14231168K), 0.0631670 secs] [Times: user=0.24 sys=0.00, real=0.06 secs]
2008-08-26T10:42:18.525+0000: 3133636.759: [GC 3133636.759: [ParNew: 840740K->982K(943744K), 0.0559390 secs] 10269235K->9429482K(14231168K), 0.0560870 secs] [Times: user=0.20 sys=0.00, real=0.06 secs]
2008-08-26T10:42:20.715+0000: 3133638.949: [GC 3133638.949: [ParNew: 839894K->887K(943744K), 0.0755280 secs] 10268394K->9429396K(14231168K), 0.0756720 secs] [Times: user=0.28 sys=0.00, real=0.08 secs]
2008-08-26T10:42:23.026+0000: 3133641.260: [GC 3133641.260: [ParNew: 839799K->4790K(943744K), 0.0591790 secs] 10268308K->9433300K(14231168K), 0.0593340 secs] [Times: user=0.21 sys=0.01, real=0.06 secs]
2008-08-26T10:42:25.240+0000: 3133643.474: [GC 3133643.474: [ParNew: 843702K->3526K(943744K), 0.2852200 secs] 10272212K->9432714K(14231168K), 0.2853640 secs] [Times: user=0.24 sys=0.07, real=0.28 secs]
2008-08-26T10:42:27.559+0000: 3133645.793: [GC 3133645.793: [ParNew: 842438K->23766K(943744K), 0.0990300 secs] 10271626K->9452958K(14231168K), 0.0991760 secs] [Times: user=0.37 sys=0.01, real=0.10 secs]
2008-08-26T10:42:29.758+0000: 3133647.992: [GC 3133647.992: [ParNew: 862678K->73519K(943744K), 0.1520710 secs] 10291870K->9502715K(14231168K), 0.1522260 secs] [Times: user=0.53 sys=0.01, real=0.15 secs]
2008-08-26T10:42:32.029+0000: 3133650.263: [GC 3133650.263: [ParNew: 912431K->52066K(943744K), 0.2556350 secs] 10341627K->9482335K(14231168K), 0.2557910 secs] [Times: user=0.28 sys=0.01, real=0.25 secs]
2008-08-26T10:42:34.276+0000: 3133652.510: [GC 3133652.511: [ParNew: 890978K->14685K(943744K), 0.0807430 secs] 10321247K->9444954K(14231168K), 0.0808890 secs] [Times: user=0.28 sys=0.00, real=0.08 secs]
2008-08-26T10:42:36.455+0000: 3133654.690: [GC 3133654.690: [ParNew: 853597K->4615K(943744K), 0.0739620 secs] 10283866K->9434884K(14231168K), 0.0741070 secs] [Times: user=0.25 sys=0.00, real=0.08 secs]
2008-08-26T10:42:38.620+0000: 3133656.854: [GC 3133656.854: [ParNew: 843527K->1808K(943744K), 0.0538370 secs] 10273796K->9432082K(14231168K), 0.0539820 secs] [Times: user=0.20 sys=0.00, real=0.05 secs]
2008-08-26T10:42:40.741+0000: 3133658.976: [GC 3133658.976: [ParNew: 840720K->1260K(943744K), 0.0616640 secs] 10270994K->9431540K(14231168K), 0.0618110 secs] [Times: user=0.22 sys=0.01, real=0.06 secs]
2008-08-26T10:42:42.800+0000: 3133661.034: [GC 3133661.034: [ParNew: 840172K->733K(943744K), 0.0628710 secs] 10270452K->9431016K(14231168K), 0.0630200 secs] [Times: user=0.20 sys=0.01, real=0.06 secs]
2008-08-26T10:42:44.691+0000: 3133662.925: [GC 3133662.925: [ParNew: 839645K->1306K(943744K), 0.0556130 secs] 10269928K->9431595K(14231168K), 0.0557570 secs] [Times: user=0.21 sys=0.01, real=0.06 secs]
2008-08-26T10:42:46.622+0000: 3133664.856: [GC 3133664.857: [ParNew: 840218K->603K(943744K), 0.0868630 secs] 10270507K->9430895K(14231168K), 0.0870050 secs] [Times: user=0.27 sys=0.01, real=0.09 secs]
2008-08-26T10:42:48.559+0000: 3133666.794: [GC 3133666.794: [ParNew: 839515K->597K(943744K), 0.0526450 secs] 10269807K->9430892K(14231168K), 0.0527890 secs] [Times: user=0.20 sys=0.00, real=0.05 secs]
2008-08-26T10:42:50.434+0000: 3133668.668: [GC 3133668.668: [ParNew: 839509K->270K(943744K), 0.0622910 secs] 10269804K->9430569K(14231168K), 0.0624350 secs] [Times: user=0.22 sys=0.01, real=0.07 secs]
2008-08-26T10:42:52.102+0000: 3133670.337: [GC 3133670.337: [ParNew: 839182K->420K(943744K), 0.0685080 secs] 10269481K->9430721K(14231168K), 0.0686470 secs] [Times: user=0.26 sys=0.01, real=0.07 secs]
2008-08-26T10:42:54.221+0000: 3133672.455: [GC 3133672.455: [ParNew: 839332K->1171K(943744K), 0.0680490 secs] 10269633K->9431478K(14231168K), 0.0681910 secs] [Times: user=0.26 sys=0.00, real=0.07 secs]
2008-08-26T10:42:56.246+0000: 3133674.481: [GC 3133674.481: [ParNew: 831218K->1526K(943744K), 0.0683390 secs] 10261525K->9431840K(14231168K), 0.0684800 secs] [Times: user=0.24 sys=0.01, real=0.07 secs]
2008-08-26T10:42:58.233+0000: 3133676.467: [GC 3133676.468: [ParNew: 840438K->2752K(943744K), 0.0631640 secs] 10270752K->9433082K(14231168K), 0.0633000 secs] [Times: user=0.21 sys=0.01, real=0.07 secs]
2008-08-26T10:43:00.121+0000: 3133678.355: [GC 3133678.356: [ParNew: 841664K->46642K(943744K), 0.1124360 secs] 10271994K->9476986K(14231168K), 0.1125890 secs] [Times: user=0.42 sys=0.00, real=0.11 secs]
2008-08-26T10:43:02.170+0000: 3133680.405: [GC 3133680.405: [ParNew: 885554K->29817K(943744K), 0.0562090 secs] 10315898K->9460198K(14231168K), 0.0563620 secs] [Times: user=0.20 sys=0.00, real=0.06 secs]
2008-08-26T10:43:04.244+0000: 3133682.478: [GC 3133682.478: [ParNew: 868729K->9613K(943744K), 0.0663680 secs] 10299110K->9440009K(14231168K), 0.0665290 secs] [Times: user=0.24 sys=0.01, real=0.06 secs]
2008-08-26T10:43:06.301+0000: 3133684.535: [GC 3133684.535: [ParNew: 848525K->6436K(943744K), 0.0757490 secs] 10278921K->9436834K(14231168K), 0.0758960 secs] [Times: user=0.29 sys=0.00, real=0.08 secs]
2008-08-26T10:43:08.237+0000: 3133686.472: [GC 3133686.472: [ParNew: 845348K->3761K(943744K), 0.0586470 secs] 10275746K->9434167K(14231168K), 0.0587930 secs] [Times: user=0.20 sys=0.00, real=0.06 secs]
2008-08-26T10:43:10.444+0000: 3133688.679: [GC 3133688.679: [ParNew: 842673K->1600K(943744K), 0.0695720 secs] 10273079K->9432021K(14231168K), 0.0697230 secs] [Times: user=0.25 sys=0.01, real=0.07 secs]
2008-08-26T10:43:12.695+0000: 3133690.929: [GC 3133690.929: [ParNew: 840512K->330K(943744K), 0.1088840 secs] 10270933K->9430868K(14231168K), 0.1090320 secs] [Times: user=0.25 sys=0.01, real=0.11 secs]
2008-08-26T10:43:14.824+0000: 3133693.059: [GC 3133693.059: [ParNew: 839242K->975K(943744K), 0.0572740 secs] 10269780K->9431517K(14231168K), 0.0574170 secs] [Times: user=0.21 sys=0.01, real=0.05 secs]
2008-08-26T10:43:16.927+0000: 3133695.161: [GC 3133695.161: [ParNew: 839887K->1133K(943744K), 0.0575420 secs] 10270429K->9431678K(14231168K), 0.0576880 secs] [Times: user=0.22 sys=0.01, real=0.06 secs]
2008-08-26T10:43:19.095+0000: 3133697.329: [GC 3133697.329: [ParNew: 840045K->763K(943744K), 0.0573000 secs] 10270590K->9431311K(14231168K), 0.0574390 secs] [Times: user=0.21 sys=0.01, real=0.05 secs]
2008-08-26T10:43:21.096+0000: 3133699.331: [GC 3133699.331: [ParNew: 839675K->13956K(943744K), 0.0928380 secs] 10270223K->9444507K(14231168K), 0.0929930 secs] [Times: user=0.32 sys=0.01, real=0.09 secs]
2008-08-26T10:43:23.022+0000: 3133701.256: [GC 3133701.257: [ParNew: 852868K->50935K(943744K), 0.1159200 secs] 10283419K->9481501K(14231168K), 0.1160710 secs] [Times: user=0.44 sys=0.00, real=0.11 secs]
2008-08-26T10:43:25.204+0000: 3133703.438: [GC 3133703.438: [ParNew: 889847K->37299K(943744K), 0.0763920 secs] 10320413K->9467867K(14231168K), 0.0765390 secs] [Times: user=0.28 sys=0.00, real=0.07 secs]
2008-08-26T10:43:27.458+0000: 3133705.692: [GC 3133705.693: [ParNew: 876211K->28042K(943744K), 0.0856480 secs] 10306779K->9458612K(14231168K), 0.0857940 secs] [Times: user=0.31 sys=0.01, real=0.09 secs]
2008-08-26T10:43:29.821+0000: 3133708.055: [GC 3133708.055: [ParNew: 866954K->26572K(943744K), 0.0796810 secs] 10297524K->9457146K(14231168K), 0.0798360 secs] [Times: user=0.29 sys=0.01, real=0.08 secs]
2008-08-26T10:43:31.898+0000: 3133710.133: [GC 3133710.133: [ParNew: 865484K->9612K(943744K), 0.0759190 secs] 10296058K->9440189K(14231168K), 0.0761010 secs] [Times: user=0.28 sys=0.01, real=0.08 secs]
2008-08-26T10:43:33.877+0000: 3133712.111: [GC 3133712.111: [ParNew: 848524K->57083K(943744K), 0.1264440 secs] 10279101K->9487664K(14231168K), 0.1265930 secs] [Times: user=0.47 sys=0.01, real=0.13 secs]
2008-08-26T10:43:36.105+0000: 3133714.340: [GC 3133714.340: [ParNew: 895995K->93978K(943744K), 0.1617870 secs] 10326576K->9524602K(14231168K), 0.1619380 secs] [Times: user=0.48 sys=0.01, real=0.16 secs]
2008-08-26T10:43:38.336+0000: 3133716.570: [GC 3133716.570: [ParNew: 932890K->62254K(943744K), 0.0824270 secs] 10363514K->9492886K(14231168K), 0.0825700 secs] [Times: user=0.31 sys=0.00, real=0.08 secs]
2008-08-26T10:43:40.566+0000: 3133718.801: [GC 3133718.801: [ParNew: 901166K->20715K(943744K), 0.0593320 secs] 10331798K->9451347K(14231168K), 0.0594680 secs] [Times: user=0.21 sys=0.00, real=0.06 secs]
2008-08-26T10:43:42.683+0000: 3133720.917: [GC 3133720.917: [ParNew: 859627K->7139K(943744K), 0.0652910 secs] 10290259K->9437772K(14231168K), 0.0654460 secs] [Times: user=0.22 sys=0.00, real=0.06 secs]
2008-08-26T10:43:44.905+0000: 3133723.139: [GC 3133723.140: [ParNew: 846051K->6879K(943744K), 0.0603340 secs] 10276684K->9437512K(14231168K), 0.0604880 secs] [Times: user=0.22 sys=0.00, real=0.06 secs]
2008-08-26T10:43:47.290+0000: 3133725.524: [GC 3133725.524: [ParNew: 845791K->69927K(943744K), 0.7552410 secs] 10276424K->9502175K(14231168K), 0.7553990 secs] [Times: user=0.47 sys=0.03, real=0.76 secs]
2008-08-26T10:43:50.223+0000: 3133728.457: [GC 3133728.457: [ParNew: 908839K->50757K(943744K), 0.3770860 secs] 10341087K->9484702K(14231168K), 0.3772430 secs] [Times: user=0.29 sys=0.02, real=0.37 secs]
2008-08-26T10:43:53.665+0000: 3133731.899: [GC 3133731.899: [ParNew: 889669K->17570K(943744K), 0.1151750 secs] 10323614K->9451515K(14231168K), 0.1153260 secs] [Times: user=0.23 sys=0.00, real=0.12 secs]
2008-08-26T10:43:55.800+0000: 3133734.034: [GC 3133734.034: [ParNew: 856482K->52245K(943744K), 0.1297590 secs] 10290427K->9486190K(14231168K), 0.1299040 secs] [Times: user=0.47 sys=0.02, real=0.13 secs]
2008-08-26T10:43:57.894+0000: 3133736.128: [GC 3133736.128: [ParNew: 891157K->66220K(943744K), 0.1084660 secs] 10325102K->9500165K(14231168K), 0.1086110 secs] [Times: user=0.43 sys=0.00, real=0.11 secs]
2008-08-26T10:43:59.970+0000: 3133738.205: [GC 3133738.205: [ParNew: 905132K->87376K(943744K), 0.1282660 secs] 10339077K->9521327K(14231168K), 0.1284180 secs] [Times: user=0.43 sys=0.00, real=0.12 secs]
2008-08-26T10:44:02.140+0000: 3133740.374: [GC 3133740.374: [ParNew: 926288K->56830K(943744K), 0.2168710 secs] 10360239K->9491670K(14231168K), 0.2170190 secs] [Times: user=0.36 sys=0.00, real=0.21 secs]
2008-08-26T10:44:04.351+0000: 3133742.586: [GC 3133742.586: [ParNew: 895742K->34998K(943744K), 0.1215650 secs] 10330582K->9470402K(14231168K), 0.1217300 secs] [Times: user=0.23 sys=0.00, real=0.12 secs]
2008-08-26T10:44:06.703+0000: 3133744.937: [GC 3133744.937: [ParNew: 873910K->10916K(943744K), 0.0731220 secs] 10309314K->9446321K(14231168K), 0.0732850 secs] [Times: user=0.27 sys=0.00, real=0.07 secs]
2008-08-26T10:44:08.988+0000: 3133747.222: [GC 3133747.222: [ParNew: 849828K->15970K(943744K), 0.1863390 secs] 10285233K->9452538K(14231168K), 0.1864860 secs] [Times: user=0.28 sys=0.00, real=0.19 secs]
2008-08-26T10:44:11.230+0000: 3133749.464: [GC 3133749.464: [ParNew: 854882K->11414K(943744K), 0.0752990 secs] 10291450K->9448045K(14231168K), 0.0754470 secs] [Times: user=0.23 sys=0.00, real=0.07 secs]
2008-08-26T10:44:13.539+0000: 3133751.773: [GC 3133751.773: [ParNew: 850326K->4165K(943744K), 0.1450020 secs] 10286957K->9442249K(14231168K), 0.1451500 secs] [Times: user=0.24 sys=0.01, real=0.14 secs]
2008-08-26T10:44:15.777+0000: 3133754.011: [GC 3133754.011: [ParNew: 843077K->3117K(943744K), 0.1793890 secs] 10281161K->9442598K(14231168K), 0.1795460 secs] [Times: user=0.22 sys=0.01, real=0.18 secs]
2008-08-26T10:44:18.099+0000: 3133756.333: [GC 3133756.333: [ParNew: 842029K->2595K(943744K), 0.0573670 secs] 10281510K->9442091K(14231168K), 0.0575450 secs] [Times: user=0.21 sys=0.00, real=0.05 secs]
2008-08-26T10:44:20.133+0000: 3133758.367: [GC 3133758.367: [ParNew: 841507K->23463K(943744K), 0.0958970 secs] 10281003K->9462964K(14231168K), 0.0960530 secs] [Times: user=0.35 sys=0.00, real=0.10 secs]
2008-08-26T10:44:22.107+0000: 3133760.341: [GC 3133760.341: [ParNew: 862375K->15068K(943744K), 0.0551030 secs] 10301876K->9454572K(14231168K), 0.0552540 secs] [Times: user=0.20 sys=0.00, real=0.06 secs]
2008-08-26T10:44:24.113+0000: 3133762.348: [GC 3133762.348: [ParNew: 853980K->4335K(943744K), 0.0533730 secs] 10293484K->9443841K(14231168K), 0.0535190 secs] [Times: user=0.20 sys=0.00, real=0.05 secs]
2008-08-26T10:44:26.140+0000: 3133764.374: [GC 3133764.374: [ParNew: 843247K->41257K(943744K), 0.1019790 secs] 10282753K->9480765K(14231168K), 0.1021250 secs] [Times: user=0.40 sys=0.00, real=0.10 secs]
2008-08-26T10:44:28.218+0000: 3133766.452: [GC 3133766.452: [ParNew: 880169K->30235K(943744K), 0.0622620 secs] 10319677K->9469746K(14231168K), 0.0624020 secs] [Times: user=0.23 sys=0.00, real=0.06 secs]
2008-08-26T10:44:30.273+0000: 3133768.508: [GC 3133768.508: [ParNew: 869147K->9638K(943744K), 0.0649140 secs] 10308658K->9449152K(14231168K), 0.0650640 secs] [Times: user=0.25 sys=0.00, real=0.07 secs]
2008-08-26T10:44:32.339+0000: 3133770.573: [GC 3133770.573: [ParNew: 848550K->7314K(943744K), 0.0606370 secs] 10288064K->9446830K(14231168K), 0.0607820 secs] [Times: user=0.22 sys=0.01, real=0.06 secs]
2008-08-26T10:44:34.377+0000: 3133772.612: [GC 3133772.612: [ParNew (promotion failed): 846226K->843806K(943744K), 0.9896180 secs]3133773.602: [CMS: 9440626K->6752921K(13287424K), 875.4940150 secs] 10285742K->6752921K(14231168K), 876.5044790 secs] [Times: user=68.80 sys=11.75, real=876.57 secs]
2008-08-26T10:59:12.885+0000: 3134651.120: [GC 3134651.120: [ParNew: 838912K->23797K(943744K), 0.0948810 secs] 7591833K->6776719K(14231168K), 0.0950400 secs] [Times: user=0.33 sys=0.01, real=0.09 secs]
2008-08-26T10:59:17.536+0000: 3134655.771: [GC 3134655.771: [ParNew: 861121K->59563K(943744K), 0.1132570 secs] 7614043K->6812485K(14231168K), 0.1134080 secs] [Times: user=0.42 sys=0.02, real=0.11 secs]
2008-08-26T10:59:20.495+0000: 3134658.729: [GC 3134658.729: [ParNew: 898475K->51081K(943744K), 0.3042860 secs] 7651397K->6811886K(14231168K), 0.3044420 secs] [Times: user=1.05 sys=0.06, real=0.30 secs]
2008-08-26T10:59:23.153+0000: 3134661.387: [GC 3134661.387: [ParNew: 889993K->49014K(943744K), 0.2169420 secs] 7650798K->6809820K(14231168K), 0.2170890 secs] [Times: user=0.75 sys=0.05, real=0.22 secs]
2008-08-26T10:59:24.483+0000: 3134662.717: [GC 3134662.717: [ParNew: 887926K->64394K(943744K), 0.1089200 secs] 7648732K->6825199K(14231168K), 0.1090620 secs] [Times: user=0.42 sys=0.02, real=0.11 secs]
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: 15 Minute Garbage Collection (promotion failed)   
Sep 11, 2008 4:20 PM (reply 1 of 4)  (In reply to original post )
 
 
Hello Rob --

Is it true that you had but a single CMS cycle event in an entire month?
(That would indeed square up with the observation that the promotion rate
into your gigantic old generation is a mere trickle.)

It appears as though you may be on a relatively recent version of Java? Perhaps
JDK 6uXX? In that case, (and my colleagues will beat me up for suggesting
this) you might consider running a CMS cycle explicitly
perhaps once a week. Add the option -XX:+ExplicitGCInvokesConcurrent
and have a thread that runs an explicit gc (System.gc()) may be once
a week or so. (Of course this is not a general solution if you expect
variable and bursty load on your server so that the "week or so"
interval will not work.

You could estimate the rough footprint you expect your application to
have, and if it is "X", and your old gen size is "S" (here 13G) perhaps
run with the option -XX:CMSinitiatingOccupancyFraction=Y
where Y = (X+S)* 100/(2S)

That basically says that even if the "adaptive"/"ergonomic" CMS triggering
policy says to start a CMS collection cycle pretty late, do one anyway
when the occupancy of the old gen reaches mid-way between the
steady-state footprint and completely full.

You might find Jon Masamitsu's tuning tips helpful:-

blogs.sun.com/jonthecollector

As well as the GC tuning guide, plus other resources hanging off of:-

http://java.sun.com/javase/technologies/hotspot/gc/index.jsp

I am meanwhile filing a performance bug on CMS/GC to improve its
"ergonomic" trigger so it does not go for Rip-Van-Winkle naps, as happened
in this case.

-- ramki
 
raghu1111
Posts:1
Registered: 9/15/08
Re: 15 Minute Garbage Collection (promotion failed)   
Sep 15, 2008 5:28 PM (reply 2 of 4)  (In reply to #1 )
 
 
Thanks for reply and multiple options Ramki. We are using jdk1.6.0_i586.

Could you briefly explain what was wrong at the time 15-min GC started? If you could elaborate a bit more about the following, we could understand more about avoiding this problem with our load.

ramki_at_jdc wrote:
I am meanwhile filing a performance bug on CMS/GC to improve its
"ergonomic" trigger so it does not go for Rip-Van-Winkle naps, as happened
in this case.
 
djeezus
Posts:5
Registered: 11/14/07
Re: 15 Minute Garbage Collection (promotion failed)   
Sep 16, 2008 1:04 AM (reply 3 of 4)  (In reply to #2 )
 
 
'lo all,

I agree with Ramki's proposal to do an ExplicitGC, and with jvm1.6 at least you can have CMS collector do this, so with low pause time.

However, you could also try using "MaxTenuringThreshold=30" in combination with proper sizing of SurvivorSpaces via "SurvivorRatio=2", which will create large survivorspaces, which leaves plenty of time for young objects to die before they reach "30" references and hence get promoted to oldgen...

I have used this technique with jvm1.5.0_11; I think CMS is default collector in 1.6, but I think the SurvivorRatio and MaxTenuringThreshold are still somewhat conservative. Must check docs on that.

What Ramki meant with "improving CMS/GC ergonomics so it won't go to sleep for 15min." is that CMS was brought to live just to do that, namely lower the pause times by doing GC work (mostly) concurrent with the application. If it pauses the app 15min. it might be a bug in jvm GC CMS code, or you could try the above I suggested, see if that works around the need for old-gen cleanup

grtz,
gert
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: 15 Minute Garbage Collection (promotion failed)   
Sep 17, 2008 9:20 AM (reply 4 of 4)  (In reply to #2 )
 
 
Thanks for reply and multiple options Ramki. We are using jdk1.6.0_i586.

Yes that supports +ExplicitGCInvokesConcurrent.


Could you briefly explain what was wrong at the time 15-min GC started? ...

Well, basically, because a major collection cycle had not happened for a month,
your (old gen) heap contained a lot of dead objects that should have been reclaimed
long ago. Because there was a lot of room in the old generation, and because the
automatic ("ergonomic" above) triggering of concurrent CMS cycles is not
sufficiently sophisticated, there was no major collection cycle in the interim.
The 15 minute event was a stop-world compacting full gc, which would likely
have been prevented had the concurrent collector run periodically
in the previous month.

If you could elaborate a bit more about the following, we could understand more about avoiding this problem with our load.

ramki_at_jdc wrote:
I am meanwhile filing a performance bug on CMS/GC to improve its
"ergonomic" trigger so it does not go for Rip-Van-Winkle naps, as happened
in this case.

The Rip-Van-Winkle reference was to CMS not running for an entire month.
The 15-minute full gc is likely to be expected under the circumstances
(a very large heap, much of which may not even have been resident
when the full gc event happened). The solution is to avoid that full gc
by doing periodic concurrent collections. In an ideal world, the concurrent
collector would automatically figure out when and how frequently to
run despite the low promotion rates. What I suggested you do above was
a workaround to avoid the problem.

I will post a reference to the CR I will file to address this in the JVM.
Meanwhile, please try my suggestions. Feel free to email
hotspot \dash gc \dash use \at openjdk \dot java \dot net for
a higher bandwidth communication channel.

Complete logs would help us as well, if you can share them
(please contact me on the above alias).

-- ramki
 
This topic has 4 replies on 1 page.
Back to Forum
 
Read the Developer Forums Code of Conduct

Click to email this message Email this Topic

Edit this Topic
  
 
 
Forums Statistics
    Users Online : 28
  • Guests : 129

About Sun forums
  • Oracle Forums is a large collection of user generated discussions. It is here to help you ask questions, find answers, and participate in discussions.

    Check out our guide on Getting started with Oracle Forums for a full walkthrough of how to best leverage the benefits of this community.

Powered by Jive Forums