I’m running Weka 3.5.3 on a server with 4 GB memory with sampled-data set as large as 24.000 rows with 19 fields. When i run a simple J4.8 classification, the JVM showed-up an out-of-memory exception.
As a workaround, i fiddle RunWeka.ini and and create my own windows-based short-cut as follows (i needed the LibSVM, but if you dont require libSVM then you can simply remove libsvm.jar from the line):
C:WINDOWSsystem32javaw.exe -Xmx1400m -classpath “%CLASSPATH%;weka.jar;libsvm.jar” weka.gui.GUIChooser
but before that i make sure that i have set the correct CLASSPATH by pasting this command line below into windows prompt :
set CLASSPATH=C:Program FilesWeka-3-5weka.jar;%CLASSPATH%
Now I have succesfully increased the Java Virtual memory but i wasn’t able to run J4.8 because another out-of-memory exception was thrown again. I have tried to increase the Xmx option to 1600M but JVM said that it cant allocate the required memory.
After googling around i have found out 2 interesting things/facts that i have learn the hard way:
1. Windows 32-bit have a memory limit of 4GB because of max memory addressing issue on a 32-bit system, that’s minus the shared memory you use for VGA display. So basically you will only be able to access 3.8GB at max. If you have larger memory than 4 GB (let say 10GB), you would need to add /PAE to allow your windows to utilize this memory (convert it into 36-bit).
2. Windows 32-bit default setting allocates 2GB user-space and 2GB kernel-space. User-space is the amount of memory that can be allocated to 1 process (javaw.exe). However windows setting allows you to change the user-space allocation to 3GB. This can be done by changing windows boot option to /PAE /3GB, then reboot the box. This option is accessible by left-clicking on My Computer -> Properties -> Advanced -> Startup and Recovery. You will see the box like below :
(click edit and notepad should appear, add the parameters, save, verify, reboot, and you should go fine..)
3. Although i have activated point 2 settings above, the JRE doesn’t seem to be able to allocate VM more than 1.6 GB. This is the major disadvantage of building data mining application under JAVA and also a typical JRE problem on 32-bit machine but this wasn’t the case for Sun-based O/S.
To break this memory barrier, i use Bea JRockIt 5 (30MB) :
http://download2.bea.com/pub/jrockit/50/jrockit-R27.5.0-jre1.5.0_14-windows-ia32.exe
Please note that the trial version would run only for 1 hours.
After i installed it on my machine, i change the shortcut on point 1 (-Xmx) to 2800M (i think this is the max number for 32-Bit), double click on it, then voila.. It works!
I have also test this with Weka 3.5.7 and it runs smoothly.
So basically if you want to run Weka for more that 1.6 GB memory you need to:
1. Activate the -Xmx settings
2. Activate /3GB on your windows
3. Install JRock IT
Now i was able to run the J4.8 classification without OOM warning. However for larger data-set i still dont have any clue on how to run it. Perhaps in that case, running Weka on SPARC box or running parallel-Weka or compiling weka source code to windows DLL ( using JET Excelsior ) might save you.
The result of memory utilization (> 2GB limit) can be viewed by clicking on the picture below :
You might be interested in Debellor (www.debellor.org). It is an open source framework for scalable data mining – processing large volumes of data without OutOfMemory exceptions – this is possible thanks to stream-oriented architecture, where algorithms may be connected in a pipeline and process data samples on-the-fly, without buffering in memory. A part of Weka algorithms are integrated with Debellor, so you can use them in connection with your own implementations.
Thank you mwojnars,
I have read the presentation and the paper and so far it looks promising… 🙂
I’ll give a closer look at your work (Debellor) and hopefully I can test & review it using my 1M-rows dataset,…
PS: Debellor w/ source-code would be better because more people can test and improve your work…. 🙂
Good work!
Hi!
You’ll find Debellor’s source code in the same JAR as byte code – debellor.jar – if you open it as a compressed file you’ll find .java and .class files together. Another way is to view or download sources directly from SVN (password not needed):
https://debellor.svn.sourceforge.net/svnroot/debellor
Hello
Thanks for your help. I managed to change the -Xmx value only by installing JRockIT (step 3). I was running Weka on Windows7 32-bit, and I couldn’t make the value of -Xmx larger then 1066m. After I installed JRockIT, I modified this value (in RunWeka.ini) to larger extents and the machine started just fine. Hope this will help 🙂
how big is you physical memory… ?
I had 4GB back then (waste of MB on 32 bit system).. and the memory level stayed consistently at 2800MB in my case..
I have 3GB of memory.
It seems that after I installed JRockIT and it had overridden the public Java Runtime Environment (JRE) I am able to start Weka with the desired amount of memory.
The problem is that now the classifier isn’t working properly, it doesn’t do anything. I let it run several hours and I had no answer.