participate


Java Virtual Machine (JVM) - Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM
<<   Back to Forum  |   Give us Feedback
5 Duke Stars available
This topic has 13 replies on 1 page.
gojomo
Posts:34
Registered: 7/8/97
Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 7, 2005 6:34 PM

 
Our application has a stress test that runs fine for hours with a 512m heap on a 32bit 1.5.0_05 JVM -- but the exact same test triggers an OutOfMemoryError ("Java heap space") within 10 minutes using the 64bit amd64 1.5.0_05 JVM. Trying a larger heap doesn't buy any more time: same error in about the same number of minutes with a 1g heap.

I thus suspect a GC bug in the 64bit JVM.

Are there known issues/workarounds?

System is a dual-opteron w/4g ram running SUSE linux:

# cat /proc/version
Linux version 2.6.11.4-21.9-smp (geeko@buildhost) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #1 SMP Fri Aug 19 11:58:59 UTC 2005

Any ideas appreciated.

- Gordon @ IA
 
Shaik786
Posts:8
Registered: 10/6/05
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 10, 2005 10:10 PM (reply 1 of 13)  (In reply to original post )

 
Hi,

why dont u try -Xms and -Xmx commands?

ex:

-Xms128MB -Xmx256MB
 
tschodt
Posts:4,417
Registered: 4/24/98
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 11, 2005 3:00 AM (reply 2 of 13)  (In reply to original post )

 
What JVM options do you run with?
Can you get a GC log of this?
 
nhaustov
Posts:15
Registered: 8/11/05
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 11, 2005 4:00 AM (reply 3 of 13)  (In reply to original post )

 
Can you give some more information about your stress test? What does it do? Anything that can help to track down the problem.
 
gojomo
Posts:34
Registered: 7/8/97
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 11, 2005 1:01 PM (reply 4 of 13)  (In reply to #3 )

 
We use -Xms and -Xmx, with the same value for our tests and production runs.

The JVM options used on our most recent reproduction of this are:

-Xms512m -Xmx512m -Xloggc:gc.log -XX:-UseGCTimeLimit

We've also previously tried -XX:+UseSerialGC without any improvement.

The gc.log is pasted below; there's excessive Full GC activity starting second 163.

Our stress test is of our heavily multithreaded web crawler against other dummy web servers offering up endless mock content. The dummy servers are running on the same machine.

Again, the exact same test runs for many hours, completing much more work, when using the 1.5.0_05 32bit JVM on the exact same machine/OS.

The OOM occurs with a real workload too (a workload that with the 32bit JVM runs indefinitely); the stress test is just good at triggering the spurious OOM quickly.

Thanks for any ideas.

- Gordon @ IA

=== gc.log===
cat gc.log
0.000: [GC 131072K->9734K(502464K), 0.0171170 secs]
11.379: [GC 140806K->20622K(502464K), 0.0290280 secs]
14.586: [GC 151694K->22551K(502464K), 0.0310990 secs]
16.471: [GC 153623K->24780K(502464K), 0.0273150 secs]
18.624: [GC 155852K->27426K(500736K), 0.0285100 secs]
20.499: [GC 156770K->29785K(481600K), 0.0306480 secs]
22.213: [GC 139993K->32378K(484224K), 0.0310670 secs]
24.098: [GC 142586K->34249K(490304K), 0.0321960 secs]
25.620: [GC 142537K->36281K(486208K), 0.0355840 secs]
26.960: [GC 144569K->38394K(486400K), 0.0378570 secs]
28.567: [GC 140666K->40328K(488064K), 0.0448030 secs]
29.835: [GC 142600K->42760K(487104K), 0.0323980 secs]
31.081: [GC 145352K->44404K(487744K), 0.0299550 secs]
32.401: [GC 146996K->46468K(488384K), 0.0278770 secs]
33.701: [GC 149764K->48580K(463104K), 0.0257660 secs]
34.961: [GC 151876K->51028K(487808K), 0.0860810 secs]
36.228: [GC 153300K->52772K(458176K), 0.0194730 secs]
37.370: [GC 155044K->54700K(487104K), 0.0206750 secs]
38.302: [GC 155692K->56620K(452672K), 0.0137150 secs]
39.272: [GC 157612K->58644K(487296K), 0.0104980 secs]
40.304: [GC 159252K->60876K(452480K), 0.0114250 secs]
41.315: [GC 161484K->62740K(487360K), 0.0108680 secs]
42.503: [GC 164052K->64828K(486656K), 0.0110980 secs]
43.681: [GC 166140K->66932K(487424K), 0.0114790 secs]
44.619: [GC 170484K->68812K(486912K), 0.0106730 secs]
45.629: [GC 172364K->71356K(489152K), 0.0122510 secs]
46.832: [GC 179324K->73188K(488832K), 0.0116810 secs]
47.746: [GC 181156K->75532K(491648K), 0.0125570 secs]
48.661: [GC 188684K->77788K(491328K), 0.0128220 secs]
49.737: [GC 190940K->79924K(494272K), 0.0125230 secs]
50.696: [GC 198516K->82340K(493952K), 0.0128840 secs]
51.808: [GC 200932K->84636K(497216K), 0.0130020 secs]
52.771: [GC 208732K->87340K(496832K), 0.0144920 secs]
54.058: [GC 211436K->89460K(499968K), 0.0427670 secs]
55.170: [GC 218612K->92324K(499392K), 0.0144250 secs]
56.264: [GC 221476K->95300K(502400K), 0.0156980 secs]
57.233: [GC 229316K->97356K(501760K), 0.0144990 secs]
58.291: [GC 231372K->100108K(505152K), 0.0153870 secs]
59.394: [GC 238796K->103316K(504384K), 0.0163420 secs]
60.500: [GC 242004K->105548K(507200K), 0.0152320 secs]
61.705: [GC 248076K->108708K(506368K), 0.0171280 secs]
62.791: [GC 251236K->111660K(509120K), 0.0183790 secs]
64.078: [GC 257836K->114076K(508288K), 0.0159550 secs]
65.304: [GC 260252K->117308K(510912K), 0.0184850 secs]
66.551: [GC 266812K->120068K(510208K), 0.0175000 secs]
67.926: [GC 269572K->123388K(512512K), 0.0178030 secs]
69.166: [GC 275772K->126404K(511872K), 0.0176150 secs]
70.411: [GC 278788K->129412K(513728K), 0.0240760 secs]
71.821: [GC 284228K->132084K(513152K), 0.0921570 secs]
73.441: [GC 286900K->135124K(514944K), 0.0173320 secs]
75.000: [GC 292116K->138524K(514560K), 0.0345050 secs]
76.549: [GC 295516K->142028K(515712K), 0.0193610 secs]
77.766: [GC 300620K->144652K(515456K), 0.0177930 secs]
78.931: [GC 303244K->148012K(516672K), 0.0191800 secs]
80.267: [GC 308204K->151668K(516416K), 0.0197100 secs]
81.538: [GC 311860K->155660K(517056K), 0.0242030 secs]
82.730: [GC 316684K->157540K(517184K), 0.0191990 secs]
84.013: [GC 318564K->160748K(517312K), 0.0191010 secs]
85.289: [GC 322540K->164612K(517440K), 0.0205260 secs]
86.633: [GC 326404K->168276K(518080K), 0.0217100 secs]
88.264: [GC 330708K->170388K(515584K), 0.0193590 secs]
89.665: [GC 332820K->173980K(518208K), 0.0205610 secs]
90.885: [GC 336668K->177276K(518016K), 0.0234800 secs]
92.354: [GC 339964K->181012K(518080K), 0.0211850 secs]
93.669: [GC 343956K->183540K(518144K), 0.0202830 secs]
95.111: [GC 346484K->186660K(518464K), 0.0202090 secs]
96.312: [GC 350244K->191292K(518528K), 0.0225580 secs]
97.747: [GC 354876K->193492K(518784K), 0.0212600 secs]
99.051: [GC 357204K->197492K(518720K), 0.0225200 secs]
100.363: [GC 361204K->200988K(518464K), 0.0269040 secs]
101.698: [GC 364444K->203916K(517696K), 0.0228630 secs]
103.198: [GC 367372K->206116K(518656K), 0.0382270 secs]
104.527: [GC 369508K->209524K(516480K), 0.0212330 secs]
105.828: [GC 372916K->213908K(518464K), 0.0230210 secs]
107.357: [GC 377236K->216428K(518528K), 0.0222090 secs]
108.750: [GC 379756K->219596K(518464K), 0.0219630 secs]
110.000: [GC 383180K->223404K(518528K), 0.0265310 secs]
111.497: [GC 386988K->226844K(518848K), 0.0233830 secs]
112.874: [GC 390812K->230564K(518784K), 0.0259740 secs]
114.241: [GC 394532K->233836K(518016K), 0.0248930 secs]
115.482: [GC 396780K->237740K(518336K), 0.0261450 secs]
116.870: [GC 400684K->239660K(517952K), 0.0233640 secs]
118.294: [GC 402092K->243444K(516992K), 0.0241060 secs]
119.593: [GC 405876K->245380K(518144K), 0.0267050 secs]
120.919: [GC 407748K->249804K(518080K), 0.0247860 secs]
122.249: [GC 412172K->252612K(518080K), 0.0244200 secs]
123.587: [GC 415044K->255340K(518080K), 0.0241980 secs]
124.888: [GC 417772K->259652K(518016K), 0.0253500 secs]
126.274: [GC 422148K->263196K(518080K), 0.0256720 secs]
127.594: [GC 425692K->266452K(517760K), 0.0258910 secs]
128.882: [GC 428500K->269756K(516928K), 0.0313120 secs]
130.156: [GC 431804K->272388K(518016K), 0.0252550 secs]
131.424: [GC 434436K->274532K(517888K), 0.0252650 secs]
132.733: [GC 436580K->277954K(517952K), 0.0235040 secs]
134.033: [GC 440066K->281210K(517888K), 0.0243370 secs]
135.316: [GC 443322K->284594K(518016K), 0.0248240 secs]
136.710: [GC 446898K->288154K(517952K), 0.0244030 secs]
138.022: [GC 450458K->291002K(518080K), 0.0238050 secs]
139.275: [GC 453562K->295794K(518016K), 0.0329510 secs]
140.604: [GC 458354K->298258K(518208K), 0.0254150 secs]
141.858: [GC 460818K->301530K(517248K), 0.0258400 secs]
143.130: [GC 464090K->303938K(518208K), 0.0255380 secs]
144.459: [GC 466434K->308786K(517568K), 0.0281720 secs]
145.840: [GC 471282K->312266K(517504K), 0.0277930 secs]
147.081: [GC 474057K->314361K(517760K), 0.0259440 secs]
148.653: [GC 476025K->318505K(517632K), 0.0266730 secs]
149.909: [GC 480169K->319977K(517760K), 0.0307430 secs]
151.175: [GC 481641K->325353K(517120K), 0.0280780 secs]
152.520: [GC 486377K->328497K(516352K), 0.0284770 secs]
153.863: [GC 489521K->331225K(517376K), 0.0273290 secs]
155.274: [GC 492057K->333145K(514560K), 0.0260700 secs]
156.505: [GC 493977K->336833K(517376K), 0.0261620 secs]
157.769: [GC 497793K->341753K(517248K), 0.0288820 secs]
159.143: [GC 502713K->344105K(517376K), 0.0346540 secs]
160.452: [GC 504937K->345761K(514432K), 0.0260100 secs]
161.803: [GC 506593K->348697K(517248K), 0.0248720 secs]
163.112: [GC 509529K->353777K(517248K), 0.0281470 secs]
163.140: [Full GC 353777K->353714K(517248K), 0.8743650 secs]
165.183: [Full GC 510399K->364189K(517248K), 0.9832890 secs]
167.384: [Full GC 510399K->367728K(517248K), 0.9913350 secs]
169.628: [Full GC 510399K->345497K(517248K), 1.3735170 secs]
172.314: [GC 506329K->352593K(516096K), 0.0339260 secs]
172.348: [Full GC 352593K->351683K(516096K), 0.8887860 secs]
174.370: [Full GC 509183K->361506K(516096K), 1.0165850 secs]
176.551: [Full GC 509183K->363812K(516096K), 0.9744650 secs]
178.669: [Full GC 509183K->356598K(516096K), 1.3040300 secs]
181.215: [Full GC 509183K->367475K(516096K), 1.0324420 secs]
183.321: [Full GC 509183K->371710K(516096K), 1.0454780 secs]
185.399: [Full GC 509183K->374235K(516096K), 1.0553360 secs]
187.592: [Full GC 509183K->369305K(516096K), 1.3696960 secs]
189.967: [Full GC 509183K->380065K(516096K), 1.0540160 secs]
191.991: [Full GC 509183K->380156K(516096K), 1.0433660 secs]
194.110: [Full GC 509183K->382892K(516096K), 1.0378660 secs]
196.520: [Full GC 509183K->370364K(516096K), 1.4272560 secs]
199.230: [Full GC 509183K->382299K(516096K), 1.0653660 secs]
201.354: [Full GC 509183K->383715K(516096K), 1.0404710 secs]
203.401: [Full GC 509183K->386794K(516096K), 1.0909490 secs]
205.448: [Full GC 509183K->372601K(516096K), 1.4715710 secs]
207.966: [Full GC 509183K->383726K(516096K), 1.1186900 secs]
210.173: [Full GC 509183K->384806K(516096K), 1.0756330 secs]
212.216: [Full GC 509183K->386530K(516096K), 1.0562950 secs]
214.127: [Full GC 509183K->374007K(516096K), 1.4921180 secs]
216.709: [Full GC 509183K->384476K(516096K), 1.1003210 secs]
218.756: [Full GC 509183K->389208K(516096K), 1.1266390 secs]
220.824: [Full GC 509183K->391178K(516096K), 1.1103640 secs]
222.917: [Full GC 509183K->376252K(516096K), 1.5572060 secs]
225.487: [Full GC 509184K->386269K(516096K), 1.1161850 secs]
227.523: [Full GC 509184K->386927K(516096K), 1.1501740 secs]
229.524: [Full GC 509184K->391321K(516096K), 1.1755640 secs]
231.611: [Full GC 509184K->376627K(516096K), 1.5773680 secs]
234.205: [Full GC 509184K->387778K(516096K), 1.1731380 secs]
236.261: [Full GC 509184K->386454K(516096K), 1.1259650 secs]
238.407: [Full GC 509184K->389030K(516096K), 1.1318580 secs]
240.507: [Full GC 509184K->375883K(516096K), 1.5678630 secs]
243.086: [Full GC 509183K->388317K(516096K), 1.1591290 secs]
245.286: [Full GC 509183K->390181K(516096K), 1.1503720 secs]
247.338: [Full GC 509183K->393652K(516096K), 1.1592330 secs]
249.365: [Full GC 509183K->374658K(516096K), 1.5552400 secs]
251.865: [Full GC 509183K->386315K(516096K), 1.1557070 secs]
253.957: [Full GC 509183K->390355K(516096K), 1.2030850 secs]
256.034: [Full GC 509183K->392298K(516096K), 1.2052920 secs]
258.205: [Full GC 509183K->377112K(516096K), 1.5773380 secs]
260.705: [Full GC 509183K->386617K(516096K), 1.2136430 secs]
262.876: [Full GC 509183K->389272K(516096K), 1.1778590 secs]
264.987: [Full GC 509183K->391560K(516096K), 1.1827200 secs]
267.081: [Full GC 509183K->376372K(516096K), 1.5812970 secs]
269.651: [Full GC 509183K->388230K(516096K), 1.2361510 secs]
271.805: [Full GC 509183K->389186K(516096K), 1.1816150 secs]
273.978: [Full GC 509183K->391897K(516096K), 1.2276850 secs]
276.104: [Full GC 509183K->378765K(516096K), 1.6700570 secs]
278.765: [Full GC 509183K->388503K(516096K), 1.2686390 secs]
280.992: [Full GC 509183K->393272K(516096K), 1.2734340 secs]
283.227: [Full GC 509183K->391551K(516096K), 1.2236950 secs]
285.284: [Full GC 509183K->376745K(516096K), 1.6195260 secs]
287.987: [Full GC 509183K->386318K(516096K), 1.2304310 secs]
290.135: [Full GC 509183K->391214K(516096K), 1.2520620 secs]
292.324: [Full GC 509183K->393559K(516096K), 1.2966160 secs]
294.528: [Full GC 509183K->379214K(516096K), 1.6974410 secs]
297.179: [Full GC 509183K->389877K(516096K), 1.2679060 secs]
299.416: [Full GC 509183K->392775K(516096K), 1.2698340 secs]
301.565: [Full GC 509183K->394602K(516096K), 1.2695340 secs]
303.685: [Full GC 509183K->380827K(516096K), 1.7415940 secs]
306.414: [Full GC 509183K->391204K(516096K), 1.3363030 secs]
308.745: [Full GC 509183K->394165K(516096K), 1.3384920 secs]
311.040: [Full GC 509183K->396009K(516096K), 1.3207300 secs]
313.259: [Full GC 509183K->381572K(516096K), 1.6983180 secs]
315.997: [Full GC 509183K->391104K(516096K), 1.3478780 secs]
318.257: [Full GC 509183K->392612K(516096K), 1.3423490 secs]
320.531: [Full GC 509183K->395986K(516096K), 1.3149160 secs]
322.718: [Full GC 509183K->382205K(516096K), 1.7162420 secs]
325.440: [Full GC 509183K->393063K(516096K), 1.3127220 secs]
327.670: [Full GC 509183K->395949K(516096K), 1.3528610 secs]
329.912: [Full GC 509183K->398638K(516096K), 1.3677230 secs]
332.150: [Full GC 509183K->381064K(516096K), 1.7550890 secs]
334.872: [Full GC 509183K->388843K(516096K), 1.3482270 secs]
337.291: [Full GC 509183K->391036K(516096K), 1.3493960 secs]
339.524: [Full GC 509183K->392954K(516096K), 1.3142390 secs]
341.677: [Full GC 509183K->380690K(516096K), 1.7369540 secs]
344.393: [Full GC 509183K->392898K(516096K), 1.3624490 secs]
346.616: [Full GC 509183K->396245K(516096K), 1.3863390 secs]
348.920: [Full GC 509183K->396817K(516096K), 1.3731110 secs]
351.262: [Full GC 509183K->381531K(516096K), 1.7824690 secs]
353.991: [Full GC 509183K->391907K(516096K), 1.3671210 secs]
356.233: [Full GC 509183K->395693K(516096K), 1.3722690 secs]
358.488: [Full GC 509183K->398409K(516096K), 1.3634450 secs]
360.790: [Full GC 509183K->382518K(516096K), 1.8140490 secs]
363.599: [Full GC 509183K->392863K(516096K), 1.4062480 secs]
366.105: [Full GC 509183K->392140K(516096K), 1.3443720 secs]
368.278: [Full GC 509183K->395305K(516096K), 1.3932590 secs]
370.540: [Full GC 509183K->382850K(516096K), 1.8546990 secs]
373.361: [Full GC 509183K->393750K(516096K), 1.4490570 secs]
375.731: [Full GC 509183K->396727K(516096K), 1.4453550 secs]
378.033: [Full GC 509183K->398892K(516096K), 1.3986780 secs]
380.345: [Full GC 509183K->384709K(516096K), 1.8209790 secs]
383.130: [Full GC 509183K->395680K(516096K), 1.4695580 secs]
385.553: [Full GC 509183K->397800K(516096K), 1.4305000 secs]
387.890: [Full GC 509183K->399804K(516096K), 1.4391880 secs]
390.159: [Full GC 509183K->385701K(516096K), 1.8693400 secs]
393.006: [Full GC 509183K->397344K(516096K), 1.4956900 secs]
395.395: [Full GC 509183K->399156K(516096K), 1.4463620 secs]
397.696: [Full GC 509183K->400749K(516096K), 1.4721360 secs]
400.014: [Full GC 509183K->385910K(516096K), 1.8883460 secs]
402.851: [Full GC 509183K->395781K(516096K), 1.4831520 secs]
405.232: [Full GC 509183K->397892K(516096K), 1.4995180 secs]
407.583: [Full GC 509183K->400187K(516096K), 1.4685140 secs]
409.056: [Full GC 402795K->386475K(516096K), 1.7622760 secs]
411.762: [Full GC 509183K->397019K(516096K), 1.4558810 secs]
414.138: [Full GC 509183K->398223K(516096K), 1.4938910 secs]
416.496: [Full GC 509183K->399782K(516096K), 1.4498060 secs]
418.841: [Full GC 509183K->385966K(516096K), 1.8652060 secs]
421.660: [Full GC 509183K->395391K(516096K), 1.4961290 secs]
424.023: [Full GC 509183K->398553K(516096K), 1.4993220 secs]
426.359: [Full GC 509183K->400742K(516096K), 1.4831500 secs]
428.834: [Full GC 509183K->386180K(516096K), 1.9114430 secs]
431.698: [Full GC 509183K->397700K(516096K), 1.5054190 secs]
434.065: [Full GC 509183K->398857K(516096K), 1.4707000 secs]
436.386: [Full GC 509183K->401235K(516096K), 1.4788970 secs]
438.687: [Full GC 509183K->387313K(516096K), 1.9363560 secs]
440.627: [Full GC 387866K->387866K(516096K), 1.3949730 secs]
442.994: [Full GC 509183K->396731K(516096K), 1.5102370 secs]
445.301: [Full GC 509183K->401646K(516096K), 1.5487810 secs]
447.747: [Full GC 509183K->386929K(516096K), 1.9297980 secs]
450.654: [Full GC 509183K->396473K(516096K), 1.4842530 secs]
453.029: [Full GC 509183K->401132K(516096K), 1.5529700 secs]
455.451: [Full GC 509183K->402216K(516096K), 1.5402450 secs]
457.759: [Full GC 509183K->389535K(516096K), 1.9486470 secs]
460.654: [Full GC 509183K->400050K(516096K), 1.5613130 secs]
463.096: [Full GC 509183K->403148K(516096K), 1.5568830 secs]
465.522: [Full GC 509183K->404263K(516096K), 1.5389980 secs]
467.879: [Full GC 509183K->390543K(516096K), 1.9502270 secs]
470.780: [Full GC 509183K->400812K(516096K), 1.5240920 secs]
473.211: [Full GC 509183K->402821K(516096K), 1.5630580 secs]
475.609: [Full GC 509183K->405855K(516096K), 1.5762670 secs]
477.994: [Full GC 509183K->390241K(516096K), 1.9757570 secs]
480.962: [Full GC 509183K->398140K(516096K), 1.5523580 secs]
483.356: [Full GC 509183K->402141K(516096K), 1.5437280 secs]
485.714: [Full GC 509183K->403879K(516096K), 1.5717020 secs]
488.130: [Full GC 509183K->390585K(516096K), 2.0231860 secs]
491.142: [Full GC 509183K->401515K(516096K), 1.6003600 secs]
493.585: [Full GC 509183K->403342K(516096K), 1.6063540 secs]
495.982: [Full GC 509183K->404542K(516096K), 1.5813290 secs]
498.372: [Full GC 509183K->390390K(516096K), 2.0139460 secs]
501.320: [Full GC 509183K->401142K(516096K), 1.5596510 secs]
503.779: [Full GC 509183K->406581K(516096K), 1.6079740 secs]
506.104: [Full GC 509183K->407176K(516096K), 1.6042450 secs]
508.544: [Full GC 509183K->391787K(516096K), 2.0266960 secs]
511.447: [Full GC 509183K->402085K(516096K), 1.6135150 secs]
513.796: [Full GC 509183K->402444K(516096K), 1.5701240 secs]
516.355: [Full GC 509183K->400553K(516096K), 1.5770940 secs]
518.745: [Full GC 509183K->393103K(516096K), 2.0610350 secs]
521.725: [Full GC 509183K->401320K(516096K), 1.5753710 secs]
524.057: [Full GC 509183K->404430K(516096K), 1.6316090 secs]
526.576: [Full GC 509183K->406210K(516096K), 1.6312170 secs]
529.046: [Full GC 509183K->392083K(516096K), 2.0759390 secs]
45428.446: [Full GC 509183K->396871K(516096K), 1.6912050 secs]
 
gojomo
Posts:34
Registered: 7/8/97
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 11, 2005 1:21 PM (reply 5 of 13)  (In reply to #4 )

 
More info:

Checking the 32bit JVM gc.log, there's no analogous rapid Full GC / memory-use yoyo activity. So perhaps it's not the GC that's awry in the 64bit JVM, but something else that causes runaway memory use.

Other notes about our crawling application: it makes heavy use of BerkeleyDB Java Edition; it uses a number of memory-sensitive SoftReference-based caches.

- Gordon @ IA
 
nhaustov
Posts:15
Registered: 8/11/05
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 12, 2005 1:14 AM (reply 6 of 13)  (In reply to #4 )

 
It looks like something is allocating very fast, at rate 100M/several seconds, but these objects become garbage almost immediately. And these objects end up in tenured gen. This can happen due to soft references.
You might want to run with -XX:+PrintClassHistogram and use Ctrl-\ / Ctrl-Break (on Win) to get table with number of objects. This can give you some idea which objects occupy the heap.
 
gojomo
Posts:34
Registered: 7/8/97
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Oct 12, 2005 5:30 PM (reply 7 of 13)  (In reply to #6 )

 
Yes, that's what it looks like to me as well, but (1) even with the soft caches, there's nothing that under normal/foreseeable operation would allocate objects so fast; and (2) the 32bit JVM test, identical in all respects except for JVM, doesn't ever show such wild-allocation activity.

So it may not be primarily a GC/memory-management problem -- but still appears to be something which behaves wildly different under the 64bit JVM, perhaps related to SoftReferences.

Are there any known 64bit-specific bugs or implementation anomalies relating to SoftReferences?

I'll work to get a series of class histograms during the Full GC spinning before the OOM, but all my 64bit OS test machines are currently indisposed, so it may be a few days. (I recall being frustrated trying to use -XX:+PrintClassHistogram for past issues -- I think it effectively does a Full GC before it prints, so it can't catch what that X hundred MB of flash garbage was... )

Thanks, still trying to get to the bottom of what's going on.

- Gordon @ IA
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Dec 2, 2005 4:28 PM (reply 8 of 13)  (In reply to #7 )

 
Remember that 512 MB is 512 MB of space whether it's
32 bit or 64 bit. On 64-bit, Java references (pointers) are
twice as large. So pointer-rich objects would be larger,
and your memory footprint would be larger. A rough rule of
thumb we have found useful over a small universe of
Java applications is that a 40% increase in heap size
usually suffices to deal with this object size dilation.
In other words, -Xmx512M in 32-bit becomes roughly
-Xmx720M for 64-bit and so on.

Are there any known 64bit-specific bugs or
implementation anomalies relating to SoftReferences?

No. But remember that the policy to clear soft references is
implementation dependent (i.e. subject to change) and is often
related to the availability of memory. If memory is tight we may
clear soft references more eagerly, and your application may react
negatively (in terms of producing more allocation activity to
recompute the values that got cleared out of your caches
as a result).


I'll work to get a series of class histograms during
the Full GC spinning before the OOM, but all my 64bit
OS test machines are currently indisposed, so it may
be a few days. (I recall being frustrated trying to
use -XX:+PrintClassHistogram for past issues -- I
think it effectively does a Full GC before it prints,
so it can't catch what that X hundred MB of flash
garbage was... )

Thanks, still trying to get to the bottom of what's
going on.

I'd recommend running with a larger heap (even one twice
as big) to see if the anomaly goes away and then to tune downward.

If you can post a comparative set of of logs, one from the 32-bit
runs and one from the 64-bit runs, it would be interesting to
watch for differences and try and get to the root of the performance
pathology you are faced with. But remember to normalize the heap
sizes first, in some manner similar to the one mentioned above.

- Gordon @ IA
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Dec 2, 2005 4:30 PM (reply 9 of 13)  (In reply to #8 )

 
Forgot to mention this: use -XX:+PrintGCDetails for more visibility into
the shape of the heap, which can affect GC behaviour quite
profoundly. I suspect, from a quick look at your logs above, that
you are getting into repeated full gc cycles because of failing scavenges.

Which version of the JDK are you running?
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM      
Dec 2, 2005 4:37 PM (reply 10 of 13)  (In reply to #9 )

 
Sorry for the multiple messages. I just noticed that you menton
not just the full gc's but also out of memory error's with the 64-bit jvm
when running the exact same application.

Here's what i suggest. Use -Xmx1g (as you did before),
but in addition, use -XX:MaxPermSize=128M. The latter
increases the size of the "permanent generation".
(See GC documentation on java.sun.com/docs/hotspot for
what that is.) This will ensure you do not run out of space
there.

The smoking gun would be to run with -XX:+PrintHeapAtGC,
and then look at the perm gen when the out of memory occurs
to see if we ran out of space there.

That is a likely possibility, for example, if your application allocates
lots of interned strings or many classes.

If increasing the MaxPermSize works, let us know. You may also want to
set PermSize to the size that perm space stabilizes at, a size you can
get by running with PrintHeapAtGC.
 
gojomo
Posts:34
Registered: 7/8/97
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Dec 6, 2005 4:49 PM (reply 11 of 13)  (In reply to #10 )

 
Thanks for your detailed replies.

The expansion of object size under a 64-bit JVM is definitely a big part of the problem -- in particular, we've found that BerkeleyDB-Java Edition's cache self-sizing (as of 2.0.90) underestimates the heap size of its objects, and so will overgrow its intended size on 64bit JVMs. (The default 60%-of-heap target can become much more, making an OOME almost inevitable.)

I've cut that cache size by half as a workaround, and achieved a vast improvement -- hours of runtime before an OOME instead of minutes.

I'm still using an artificially small heap -- 300MB -- to better understand the failure mode and heap-use leading up to it. When I do hit an OOME, some of the readouts are a little counter to my understanding; perhaps you can help me understand. In particular, at the time of the OOME (in a handler that catches the Error), Runtime.getRuntime().freeMemory() is returning about 35MB -- which would seem enough space to proceed at some rate rather than failing with an OOME.

I ran with the new options suggested, and at about the time of the OOME, a heap summary from the gc.log is:

{Heap before GC invocations=11179 (full 7355):
PSYoungGen total 68160K, used 34176K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 34176K, 100% used [0x00002aaac0eb0000,0x00002aaac3010000,0x00002aaac3010000)
from space 33984K, 0% used [0x00002aaac3010000,0x00002aaac3010000,0x00002aaac5140000)
to space 34112K, 0% used [0x00002aaac5160000,0x00002aaac5160000,0x00002aaac72b0000)
PSOldGen total 204800K, used 204799K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 99% used [0x00002aaab46b0000,0x00002aaac0eaffd8,0x00002aaac0eb0000)
PSPermGen total 25728K, used 25673K [0x00002aaaaf4b0000, 0x00002aaab0dd0000, 0x00002aaab46b0000)
object space 25728K, 99% used [0x00002aaaaf4b0000,0x00002aaab0dc2538,0x00002aaab0dd0000)
9810.100: [Full GC [PSYoungGen: 34176K->32241K(68160K)] [PSOldGen: 204799K->204799K(204800K)] 238975K->237041K(272960K) [PSPermGen: 25673K->25673K(25728K)] GC time is exceeding GCTimeLimit of 98%
, 1.1456510 secs]
{Heap after GC invocations=11179 (full 7355):
PSYoungGen total 68160K, used 32241K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 34176K, 94% used [0x00002aaac0eb0000,0x00002aaac2e2c798,0x00002aaac3010000)
from space 33984K, 0% used [0x00002aaac3010000,0x00002aaac3010000,0x00002aaac5140000)
to space 34112K, 0% used [0x00002aaac5160000,0x00002aaac5160000,0x00002aaac72b0000)
PSOldGen total 204800K, used 204799K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 99% used [0x00002aaab46b0000,0x00002aaac0eaffd8,0x00002aaac0eb0000)
PSPermGen total 25728K, used 25673K [0x00002aaaaf4b0000, 0x00002aaab0dd0000, 0x00002aaab46b0000)
object space 25728K, 99% used [0x00002aaaaf4b0000,0x00002aaab0dc2538,0x00002aaab0dd0000)
}

Is the "PSPermGen" value of "99% used" the "smoking gun" you'd mentioned?

Earlier in the same run this figure also hit 99% -- but after that the PSPermGen size grew (and then later shrank), making it seem like 99% utilization at ~22MB total isn't necessarily fatal. For example, here's an excerpt from gc.log early in the run (just 20 seconds in -- already 99% of 21606K PSPermGen used):

{Heap before GC invocations=6 (full 0):
PSYoungGen total 88576K, used 88562K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 75776K, 100% used [0x00002aaac0eb0000,0x00002aaac58b0000,0x00002aaac58b0000)
from space 12800K, 99% used [0x00002aaac59b0000,0x00002aaac662cac8,0x00002aaac6630000)
to space 12800K, 0% used [0x00002aaac6630000,0x00002aaac6630000,0x00002aaac72b0000)
PSOldGen total 204800K, used 22570K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 11% used [0x00002aaab46b0000,0x00002aaab5cbab98,0x00002aaac0eb0000)
PSPermGen total 21632K, used 21606K [0x00002aaaaf4b0000, 0x00002aaab09d0000, 0x00002aaab46b0000)
object space 21632K, 99% used [0x00002aaaaf4b0000,0x00002aaab09c9ac8,0x00002aaab09d0000)
20.572: [GC [PSYoungGen: 88562K->12796K(60800K)] 111133K->38669K(265600K), 0.0348600 secs]
{Heap after GC invocations=6 (full 0):
PSYoungGen total 60800K, used 12796K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 48000K, 0% used [0x00002aaac0eb0000,0x00002aaac0eb0000,0x00002aaac3d90000)
from space 12800K, 99% used [0x00002aaac6630000,0x00002aaac72af3e0,0x00002aaac72b0000)
to space 27200K, 0% used [0x00002aaac3d90000,0x00002aaac3d90000,0x00002aaac5820000)
PSOldGen total 204800K, used 25872K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 12% used [0x00002aaab46b0000,0x00002aaab5ff4048,0x00002aaac0eb0000)
PSPermGen total 21632K, used 21606K [0x00002aaaaf4b0000, 0x00002aaab09d0000, 0x00002aaab46b0000)
object space 21632K, 99% used [0x00002aaaaf4b0000,0x00002aaab09c9ac8,0x00002aaab09d0000)
}

...but then not much later, another excerpt, showing PSPermGen total size has temporarily grown to 52480K:

{Heap before GC invocations=80 (full 2):
PSYoungGen total 88000K, used 87968K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 87808K, 100% used [0x00002aaac0eb0000,0x00002aaac6470000,0x00002aaac6470000)
from space 192K, 83% used [0x00002aaac6470000,0x00002aaac6498000,0x00002aaac64a0000)
to space 7680K, 0% used [0x00002aaac6b30000,0x00002aaac6b30000,0x00002aaac72b0000)
PSOldGen total 204800K, used 113449K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 55% used [0x00002aaab46b0000,0x00002aaabb57a5f8,0x00002aaac0eb0000)
PSPermGen total 52480K, used 23375K [0x00002aaaaf4b0000, 0x00002aaab27f0000, 0x00002aaab46b0000)
object space 52480K, 44% used [0x00002aaaaf4b0000,0x00002aaab0b83e20,0x00002aaab27f0000)
66.277: [GC [PSYoungGen: 87968K->7185K(93312K)] 201417K->120739K(298112K), 0.0170500 secs]
{Heap after GC invocations=80 (full 2):
PSYoungGen total 93312K, used 7185K [0x00002aaac0eb0000, 0x00002aaac72b0000, 0x00002aaac72b0000)
eden space 85632K, 0% used [0x00002aaac0eb0000,0x00002aaac0eb0000,0x00002aaac6250000)
from space 7680K, 93% used [0x00002aaac6b30000,0x00002aaac7234660,0x00002aaac72b0000)
to space 8384K, 0% used [0x00002aaac6250000,0x00002aaac6250000,0x00002aaac6a80000)
PSOldGen total 204800K, used 113553K [0x00002aaab46b0000, 0x00002aaac0eb0000, 0x00002aaac0eb0000)
object space 204800K, 55% used [0x00002aaab46b0000,0x00002aaabb5945f8,0x00002aaac0eb0000)
PSPermGen total 52480K, used 23375K [0x00002aaaaf4b0000, 0x00002aaab27f0000, 0x00002aaab46b0000)
object space 52480K, 44% used [0x00002aaaaf4b0000,0x00002aaab0b83e20,0x00002aaab27f0000)
}

But then by the time of the OOME, PSPermGen had shrunk back to 25728K -- making it seem it would have plenty of space to grow again under the 64MB default max -- if in fact available perm space was a problem.

I did notice the "GC time is exceeding GCTimeLimit of 98%" message in the first excerpt. Indeed, there are many of these, and the similar "GC time would exceed GCTimeLimit of 98%" message, for about 90 seconds around the time of the OOME. Is the appearance of a single "is exceeding" warning an indicator that an 'ergonomic' too-much-time-in-GC OOME is inevitable? Or must the too-much-time condition persist for a 'while' before an OOME occurs?

(I've suggested elsewhere -- http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6287811 -- that any such 'time-limit' OOMEs should not report as 'java heap space', and more documentation of what exactly causes this threshold to be exceeded over what sampling period would be very helpful.)

If in fact what I've seen on this most recent run is a 'too-much-time-in-GC' OOME, is there anything else in the shape of the gc.log excerpts provided that suggests GC-tuning that might help? (Besides, of course, moving to a more spacious heap.)

(I'm now rerunning my test with -XX:GCTimeLimit=100 -XX:GCHeapFreeLimit=0 to see if, despite the constant GC, it runs longer before hitting the OOME.)

At this point, the knowledge about the object size dilation and BDB cache sizing issue has helped a lot, and I'll be running more realistic tests under more generously-sized heaps soon. But I'm still curious about the specific stresses that are causing the OOME in this constrained-heap case.

Thanks for all your pointers,

- Gordon @ IA
 
ramki_at_jdc
Posts:229
Registered: 6/24/03
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Dec 8, 2005 11:20 AM (reply 12 of 13)  (In reply to #11 )

 

I ran with the new options suggested, and at about
the time of the OOME, a heap summary from the gc.log
is:

{Heap before GC invocations=11179 (full 7355):
PSYoungGen total 68160K, used 34176K
K [0x00002aaac0eb0000, 0x00002aaac72b0000,
0x00002aaac72b0000)
eden space 34176K, 100% used
ed
[0x00002aaac0eb0000,0x00002aaac3010000,0x00002aaac3010
000)
from space 33984K, 0% used
ed
[0x00002aaac3010000,0x00002aaac3010000,0x00002aaac5140
000)
to space 34112K, 0% used
ed
[0x00002aaac5160000,0x00002aaac5160000,0x00002aaac72b0
000)
PSOldGen total 204800K, used 204799K
K [0x00002aaab46b0000, 0x00002aaac0eb0000,
0x00002aaac0eb0000)
object space 204800K, 99% used
ed
[0x00002aaab46b0000,0x00002aaac0eaffd8,0x00002aaac0eb0
000)
PSPermGen total 25728K, used 25673K
K [0x00002aaaaf4b0000, 0x00002aaab0dd0000,
0x00002aaab46b0000)
object space 25728K, 99% used
ed
[0x00002aaaaf4b0000,0x00002aaab0dc2538,0x00002aaab0dd0
000)
9810.100: [Full GC [PSYoungGen:
34176K->32241K(68160K)] [PSOldGen:
204799K->204799K(204800K)] 238975K->237041K(272960K)
[PSPermGen: 25673K->25673K(25728K)] GC time is
exceeding GCTimeLimit of 98%
, 1.1456510 secs]
{Heap after GC invocations=11179 (full 7355):
PSYoungGen total 68160K, used 32241K
K [0x00002aaac0eb0000, 0x00002aaac72b0000,
0x00002aaac72b0000)
eden space 34176K, 94% used
ed
[0x00002aaac0eb0000,0x00002aaac2e2c798,0x00002aaac3010
000)
from space 33984K, 0% used
ed
[0x00002aaac3010000,0x00002aaac3010000,0x00002aaac5140
000)
to space 34112K, 0% used
ed
[0x00002aaac5160000,0x00002aaac5160000,0x00002aaac72b0
000)
PSOldGen total 204800K, used 204799K
K [0x00002aaab46b0000, 0x00002aaac0eb0000,
0x00002aaac0eb0000)
object space 204800K, 99% used
ed
[0x00002aaab46b0000,0x00002aaac0eaffd8,0x00002aaac0eb0
000)
PSPermGen total 25728K, used 25673K
K [0x00002aaaaf4b0000, 0x00002aaab0dd0000,
0x00002aaab46b0000)
object space 25728K, 99% used
ed
[0x00002aaaaf4b0000,0x00002aaab0dc2538,0x00002aaab0dd0
000)
}

Is the "PSPermGen" value of "99% used" the "smoking
gun" you'd mentioned?

On the contrary, this shows ample available space in the permanent
generation, but that we are running out of space in the regular java heap.
Note that the Old Space is full and so (almost) is Eden space. There is
roughly 2 MB left in Eden, and there's some space that's available
in the so-called Survivor spaces.

It is possible that if the JVM were able to use this space, we might
go on a while longer and fill up all available space before giving
up the ghost.

Indeed, if you use +UseSerialGC that would be the behaviour.

However, the fact that you are completely out of space in the
Java object heap and that you are spending so much time in
GC while feering so little space for subsequent allocation
indicates that your program cannot make much useful progress
with such a constrained heap.

I'd reiterate my suggestion of sizing the heap (and perm heap)
to twice the size that you used in the 64-bit case, note how big
the occupancy gets and then size it appropriately downward.

You should continue to turn off the UseGCTimeLimit flag
(-XX:-UseGCTimeLimit) so as to avoid the ergonomic OOM
I am fairly confident that in your experiment below of running
with that turned off, but with the heap still small, you will almost
certainly still see the OOM; only it will take much longer as your
application makes ever-decreasing progress to the eventual OOM
condition, as GC overhead climbs rapidly as you approach the
limit of the heap.

As regards returning a different detail message for the ergonomic
OOM, you are absolutely right that that would be diagnostic and
we are looking into how best to return this. For various reasons,
associated with how the relevant interfaces are structured and where
this condition is detected, there is some restructuring of code necessary
to return such detail to the OOM exception, and so this is taking somewhat
longer to fix than we had hoped (especially given our current priorities).

...
I did notice the "GC time is exceeding GCTimeLimit of
98%" message in the first excerpt. Indeed, there are
many of these, and the similar "GC time would exceed
GCTimeLimit of 98%" message, for about 90 seconds
around the time of the OOME. Is the appearance of a
single "is exceeding" warning an indicator that an
'ergonomic' too-much-time-in-GC OOME is inevitable?
Or must the too-much-time condition persist for a
'while' before an OOME occurs?

Yes, the condition is tested over a period of "time" (as in
some number of full gc epochs) during all of
which time it must continue to hold for such an OOME to fire.


(I've suggested elsewhere --
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=628
7811 -- that any such 'time-limit' OOMEs should not
report as 'java heap space', and more documentation
of what exactly causes this threshold to be exceeded
over what sampling period would be very helpful.)

see above.

If in fact what I've seen on this most recent run is
a 'too-much-time-in-GC' OOME, is there anything else
in the shape of the gc.log excerpts provided that
suggests GC-tuning that might help? (Besides, of
course, moving to a more spacious heap.)

see suggestion above.

(I'm now rerunning my test with -XX:GCTimeLimit=100
-XX:GCHeapFreeLimit=0 to see if, despite the constant
GC, it runs longer before hitting the OOME.)

You can use -XX:-UseGCTimeLimit instead of setting the two
limits above.
Also see my conjecture above as to how that experiment might
fare.

At this point, the knowledge about the object size
dilation and BDB cache sizing issue has helped a lot,
and I'll be running more realistic tests under more
generously-sized heaps soon. But I'm still curious
about the specific stresses that are causing the OOME
in this constrained-heap case.

Please let us know what you learn, and if we can help you any further in
this investigation.

Thanks for all your pointers,

- Gordon @ IA
 
darat55
Posts:22
Registered: 7/2/01
Re: Spurious OutOfMemoryError on amd64 Linux JVM -- fine on 32bit JVM   
Dec 12, 2005 2:39 AM (reply 13 of 13)  (In reply to #4 )

 
Hi, your GC is working really bad:
- in the starting phase you produce abou 100 MB of garbage per second
- the volume of live objects grows very fast in this phase
- once the heap is almost full your JVM has a 60% standstill quota due to GC
Either there is a memory leak or you simply need more memory.
Try the following:
-Xms512m -Xmx1500m -XX:NewSize=200m -XX:MaxNewSize=200m -XX:SurvivorRatio=8 .....
Then, please, post the GC-Log here again.
You should not bother too much about differences of 32 and 64bit JVMs but about the memory management of application and JVM in the first place.
 
This topic has 13 replies on 1 page.
Back to Forum
 
Read the Developer Forums Code of Conduct

Click to email this message Email this Topic

Edit this Topic
  
 
 
Forums Statistics
    Users Online : 22
  • Guests : 132

About Sun forums
  • Sun Forums is a large collection of user generated discussions. It is here to help you ask questions, find answers, and participate in discussions.

    Check out our guide on Getting started with Sun Forums for a full walkthrough of how to best leverage the benefits of this community.

Powered by Jive Forums