Java Programming - problem in the package or classpath
This topic has 3 replies on 1 page.
daniel-jeem
Posts:10
Registered: 10/13/08
problem in the package or classpath   
Nov 13, 2008 1:19 AM

 
Hello everybody,

I want to add a module to "jhove", which is open source: http://hul.harvard.edu/jhove/index.html
When I launch jhove like this: ./jhove -k file.pdf, it shows me more information about the PDF.
I want it to recognize another type of file as well (the WARC format),
so I have to write a new module.

I am trying to do it by following this guide: http://hul.harvard.edu/jhove/writingamodule.html
I added my Java code in: /home/jhove/classes/edu/harvard/hul/ois/jhove/module/Project/
The compilation works fine.

I added: EXTRA_JARS=/jhove/classes/edu/harvard/hul/ois/jhove/module/Project/WarcModule.jar
to the file jhove.tmpl.

I also added this code to jhove.conf, as mentioned in the documentation:
<module>
<class>WarcModule</class>
</module>

I also tried this variant:
<module>
<class>edu.harvard.hul.ois.jhove.module.Project.MonModule</class>
</module>

But I get this error:
edu.harvard.hul.ois.jhove.JhoveException: cannot instantiate module: WarcModule
at edu.harvard.hul.ois.jhove.JhoveBase.init(Unknown Source)
at Jhove.main(Unknown Source)

I think it is a classpath problem.
Here is my code:

package edu.harvard.hul.ois.jhove.module.Project;
import java.io.File;  
import java.util.*;
import java.io.*;
import edu.harvard.hul.ois.jhove.*;
import edu.harvard.hul.ois.jhove.ModuleBase;
import edu.harvard.hul.ois.jhove.RepInfo;
import org.archive.io.warc.*;
 
 
public class MonModule extends ModuleBase 
{
	    private static final String NAME = "Warc-hul";
	    private static final String RELEASE = "1.7";
	    private static final int [] DATE = {2008, 9, 23};
	    private static final String [] FORMAT = { "warc"};
	    private static final String COVERAGE = 
	        "PDF 1.0-1.6; PDF/X-1 (ISO 15930-1:2001), X-1a (ISO 15930-4:2003), " +
		"X-2 (ISO 15930-5:2003), and X-3 (ISO 15930-6:2003); Tagged PDF; " +
		"Linearized PDF; PDF/A (ISO/CD 19005-1)";
	    private static final String [] MIMETYPE = {"application/warc"};
	    private static final String WELLFORMED = "A PDF file is " +
	        "well-formed if it meets the criteria defined in Chapter " +
	        "3 of the PDF Reference 1.6 (5th edition, 2004)";
	    private static final String VALIDITY = null;
	    private static final String REPINFO = null;
	    private static final String NOTE = "This module does *not* validate " +
		"data within content streams (including operators) or encrypted data";
	    private static final String RIGHTS = "Copyright 2003-2007 by JSTOR and " +
		"the President and Fellows of Harvard College. " +
		"Released under the GNU Lesser General Public License.";
	    private static final String ENCRYPTED = "<May be encrypted>";
   
	    
	   
	    
	    
protected MonModule(String NAME, String RELEASE, int[] DATE, String[] FORMAT, String COVERAGE, String[] MIMETYPE, String WELLFORMED, String VALIDITY, String REPINFO, String NOTE, String RIGHTS, boolean x)
	 {
	    super  (NAME, RELEASE, DATE, FORMAT, COVERAGE, MIMETYPE, WELLFORMED,VALIDITY, REPINFO, NOTE, RIGHTS, true);
	     
	  }
 
 
/******************************************************************
 * PRIVATE INSTANCE FIELDS.
 ******************************************************************/
 
/* First 6 bytes of file */
protected byte _sig[];
 
/* Checksummer object */
protected Checksummer _ckSummer;
 
/* XMP property */
protected Property _xmpProp;
 
/* Input stream wrapper which handles checksums */
protected ChecksumInputStream _cstream;
 
/* Data input stream wrapped around _cstream */
protected DataInputStream _dstream;
 
/* Flag for presence of global color table */
protected boolean _globalColorTableFlag;
 
/* Size of global color table */
protected int _globalColorTableSize;
 
/* Count of graphic control extensions preceding
 * something to modify */
protected int _gceCounter;
 
/* Top-level metadata property */
protected Property _metadata;
 
/* Blocks list property */
protected List _blocksList;
 
/* Total count of graphic and plain text extension blocks */
protected int _numGraphicBlocks;
 
 
public void checkSignatures (File file,  InputStream stream, RepInfo info) 
throws IOException
{
 int sigBytes[] = { 'W', 'A', 'R', 'C'};
 int i;
 int ch;
 try {
    _dstream = null;
    _dstream = getBufferedDataStream (stream, _je != null ?
                _je.getBufferSize () : 0);
    for (i = 0; i < 4; i++) {
        ch = readUnsignedByte(_dstream, this);
        if (ch != sigBytes[i]) {
            info.setWellFormed (false);
            return;
        }
    }
    info.setModule (this);
    info.setFormat (_format[0]);
    info.setMimeType (_mimeType[0]);
    info.setSigMatch(_name);
}
catch (Exception e) {
    // Reading a very short file may take us here.
    info.setWellFormed (false);
    return;
}
}


Can anyone help me, please?
 
swmtgoet_x
Posts:339
Registered: 8/20/08
Re: problem in the package or classpath   
Nov 13, 2008 1:43 AM (reply 1 of 3)  (In reply to original post )

 
Hi,

What parameters are you actually trying to pass to the superclass constructor?

If it is the values listed as private static final member variables, why are you listing them as parameters to your own constructor?

I suspect that jhove is trying to instantiate your class and relies on you providing at least a default constructor (i.e. no parameters, preferably public), but you do not provide one.

Having to call a particular superclass constructor does not mean your own constructor has to take the same parameters.
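
To illustrate the point, here is a minimal, self-contained sketch. It assumes the host application creates modules by class name through a public no-argument constructor, which is only what this reply suspects and is not confirmed JHOVE behaviour. BaseModule, ParamOnlyModule, NoArgModule and ModuleLoaderSketch are hypothetical stand-ins and not JHOVE classes; the real ModuleBase constructor takes many more arguments.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a framework base class such as ModuleBase.
abstract class BaseModule {
    private final String name;
    private final String release;

    protected BaseModule(String name, String release) {
        this.name = name;
        this.release = release;
    }

    public String getName() { return name; }
    public String getRelease() { return release; }
}

// Mirrors the question: the only constructor takes parameters,
// so a loader has nothing it can call without arguments.
class ParamOnlyModule extends BaseModule {
    protected ParamOnlyModule(String name, String release) {
        super(name, release);
    }
}

// Mirrors the suggestion: a public no-argument constructor
// passes the class constants on to the superclass.
class NoArgModule extends BaseModule {
    private static final String NAME = "Warc-hul";
    private static final String RELEASE = "0.1"; // illustrative value

    public NoArgModule() {
        super(NAME, RELEASE);
    }
}

class ModuleLoaderSketch {
    // Assumption: modules are looked up by class name and created through
    // a public no-argument constructor. Returns null if that is impossible.
    static BaseModule instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getConstructor(); // finds public no-arg constructors only
            return (BaseModule) ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            System.out.println("cannot instantiate module: " + className + " (" + e + ")");
            return null;
        }
    }

    public static void main(String[] args) {
        // Fails: no public no-argument constructor exists.
        instantiate(ParamOnlyModule.class.getName());
        // Succeeds: the no-argument constructor supplies the constants itself.
        BaseModule module = instantiate(NoArgModule.class.getName());
        System.out.println("loaded " + module.getName());
    }
}
```

Note that in this sketch the failure has nothing to do with the classpath: the class is found, but there is no constructor the loader can call.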

Bye.
 
daniel-jeem
Posts:10
Registered: 10/13/08
Re: problem in the package or classpath   
Nov 13, 2008 3:26 AM (reply 2 of 3)  (In reply to #1 )

 
Thanks swmtgoet_x for help.
The error is no longer displayed.
Regards.
 
prometheuzz
Posts:15,167
Registered: 7/28/05
Re: problem in the package or classpath   
Nov 14, 2008 2:11 PM (reply 3 of 3)  (In reply to #2 )

 
Daniel, not sure if everything is clear now, but the way you are validating a WARC file is seriously flawed! I strongly suggest you use an existing WARC tool to do the actual parsing/validating.
I've seen more and more people on JHove's mailing list posting problems on how to create their own modules, so I decided to write a small how-to-write-your-own-module-guide.
Perhaps you'd like to give it a shot. I'll also post it on the mailing list.

================================================================================

This is a step-by-step tutorial that will enable you to compile and run a
custom-made module for Harvard's identification tool JHove [1]. This is not
a tutorial on Java programming! For a thorough explanation of this tool and
extensive documentation, please see:
http://hul.harvard.edu/jhove/documentation.html


================================================================================
= Step 1 =

Download and unzip JHove 1.1f [2]. The unzipped folder will be called
JHOVE_HOME from now on.

================================================================================
= Step 2 =

In this example I will construct a very elementary ARC module and will be using
Heritrix' [3] ARCUtils class, so download Heritrix 1.14 [4] and unzip it. After
unzipping, locate the file 'heritrix-1.14.1.jar' (it might be a different
version) and place it in the directory 'JHOVE_HOME/bin'.

================================================================================
= Step 3 =

Create a folder 'JHOVE_HOME/bin/test' and create a new file in it called
'ArcModule.java'. Paste the following contents in that file:

package test;
 
import java.io.IOException;
import java.io.InputStream;
import edu.harvard.hul.ois.jhove.ModuleBase;
import edu.harvard.hul.ois.jhove.RepInfo;
import org.archive.io.arc.ARCUtils;
 
public class ArcModule extends ModuleBase {
 
    private static final String NAME = "ARC-hul";
    private static final String RELEASE = "0.1";
    private static final int[] DATE = {2008, 11, 11};
    private static final String[] FORMAT = {"ARC"};
    private static final String COVERAGE = null;
    private static final String[] MIMETYPE = {"application/arc"};
    private static final String WELLFORMED = "...";
    private static final String VALIDITY = null;
    private static final String REPINFO = "...";
    private static final String NOTE = null;
    private static final String RIGHTS = "GNU LGPL";
    
    public ArcModule() {
        super (NAME, RELEASE, DATE, FORMAT, COVERAGE, MIMETYPE, WELLFORMED,
                VALIDITY, REPINFO, NOTE, RIGHTS, false);
        // Optionally set some Agent information: see the other Modules how
        // this can be done.
    }
 
    @Override
    public int parse(InputStream stream, RepInfo info, int parseIndex) {
        info.setModule(this);
        boolean wellFormed = false;
        try {
            if(ARCUtils.testCompressedARCStream(stream)) {
                wellFormed = true;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        info.setWellFormed(wellFormed);
        return 0;
    }
}


================================================================================
= Step 4 =

Compile this ArcModule by opening a shell (command prompt) and cd-ing to
'JHOVE_HOME/bin' and executing the following command:


*nix & Mac OS:
javac -cp .:JhoveApp.jar:heritrix-1.14.1.jar test/ArcModule.java


Windows:
javac -cp .;JhoveApp.jar;heritrix-1.14.1.jar test\ArcModule.java


(Note, if you're using JDK 1.4, replace '-cp' with '-classpath')

You shouldn't get any messages if all goes well.
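
The only difference between the two commands above is the platform's path separators. As a small illustrative sketch (ClasspathSketch and joinClasspath are hypothetical names, not part of JHove), Java exposes both separators through java.io.File, so the same command line can be assembled for whatever platform you are on:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class ClasspathSketch {
    // Joins classpath entries with the platform's separator
    // (":" on Unix-like systems, ";" on Windows).
    static String joinClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Entries taken from the compile command in this step.
        List<String> entries = Arrays.asList(".", "JhoveApp.jar", "heritrix-1.14.1.jar");
        // File.separator is "/" on Unix-like systems and "\" on Windows.
        String source = "test" + File.separator + "ArcModule.java";
        System.out.println("javac -cp " + joinClasspath(entries) + " " + source);
    }
}
```

Running it prints the compile command with the separators that match the machine it runs on.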

================================================================================
= Step 5 =

Open the file 'JHOVE_HOME/conf/jhove.conf' and add the following right beneath
the line
<bufferSize>?????</bufferSize>
(where ????? is a number):

<module>
  <class>test.ArcModule</class>
</module>


Save the file.

================================================================================
= Step 6 =

Create a folder called 'JHOVE_HOME/arcs' and copy two compressed ARC files into
it. If you don't have any compressed ARC files lying around, you can
download two small ones [5]. The file 'A.arc.gz' is a valid compressed ARC file,
while 'B.arc.gz' is the same file with the ARC header removed, which makes it
an invalid ARC file.

================================================================================
= Step 7 =

Open a shell, cd to 'JHOVE_HOME/bin' and execute the following command:

*nix & Mac OS:
java -cp .:JhoveApp.jar:heritrix-1.14.1.jar Jhove -c ../conf/jhove.conf -m ARC-hul ../arcs


Windows:
java -cp .;JhoveApp.jar;heritrix-1.14.1.jar Jhove -c ..\conf\jhove.conf -m ARC-hul ..\arcs


This causes JHove to scan everything in the 'JHOVE_HOME/arcs' folder
and run it through your newly created ArcModule. The output will look as
follows:

Jhove (Rel. 1.1, 2008-02-21)
 Date: 2008-11-14 22:29:51 CET
 RepresentationInformation: .../jhove/arcs/A.arc.gz
  ReportingModule: ARC-hul, Rel. 0.1 (2008-11-11)
  LastModified: 2008-08-24 20:23:20 CEST
  Size: 130870
  Status: Well-Formed and valid
 
 RepresentationInformation: .../jhove/arcs/B.arc.gz
  ReportingModule: ARC-hul, Rel. 0.1 (2008-11-11)
  LastModified: 2008-11-14 21:53:15 CET
  Size: 116136
  Status: Not well-formed


This is the expected result: A is valid and B is not.

================================================================================
= Final remarks =

As I said, this is not a programming tutorial, nor is it the best way to
validate ARC files: more metadata should be extracted from the file. But I
leave that for you. This was only a guide to show you how to get started on
writing and running your own modules. You can have a look at the source
of the existing modules to see the "best practices" w.r.t. writing a module.

Best of luck!

Regards,

Bart.

================================================================================
= References =

[1] http://hul.harvard.edu
[2] http://hul.harvard.edu/jhove/download.html
[3] http://crawler.archive.org
[4] http://sourceforge.net/project/showfiles.php?group_id=73833&package_id=73980
[5] http://iruimte.nl/arcs
 