Results 1 - 5 of 5 for LHA (0.03 seconds)

  1. src/test/java/org/codelibs/fess/crawler/rule/CrawlerRuleMimeTypePatternTest.java

                + "|application/rdf\\+xml" //
                + "|application/pdf" //
                + "|application/x-freemind" //
                + "|application/lha" //
                + "|application/x-lha" //
                + "|application/x-lha-compressed" //
                + "|text/xml" //
                + "|text/xml-external-parsed-entity" //
                + "|text/html)";
    
        // HTML rule pattern from webHtmlRule in rule.xml
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Wed Feb 04 14:24:39 GMT 2026
    - 8.7K bytes
  2. src/main/resources/crawler/rule.xml

    			<!-- Supported MIME type -->
    			<arg>
      "(application/xml"
    + "|application/xhtml\+xml"
    + "|application/rdf\+xml"
    + "|application/pdf"
    + "|application/x-freemind"
    + "|application/lha"
    + "|application/x-lha"
    + "|application/x-lha-compressed"
    + "|text/xml"
    + "|text/xml-external-parsed-entity"
    + "|text/html)"
    			</arg>
    		</postConstruct>
    	</component>
    
    
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Sun Mar 29 08:21:02 GMT 2026
    - 4.6K bytes
  3. CLAUDE.md

    ### Protocols
    
    HTTP/HTTPS, File, FTP/FTPS, SMB/CIFS (SMB1/SMB2+), Storage (MinIO via `storage://`), S3 (`s3://`), GCS (`gcs://`)
    
    ### Content Formats
    
    Office (Word, Excel, PowerPoint), PDF, Archives (ZIP, TAR, GZ, LHA), HTML, XML, JSON, Markdown, Media metadata, Images (EXIF/IPTC/XMP), Email (EML)
    
    ---
    
    ## Architecture
    
    ### Module Structure
    
    ```
    fess-crawler-parent/
    ├── fess-crawler/              # Core framework
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Thu Mar 12 03:39:20 GMT 2026
    - 8.1K bytes
  4. README.md

    #### PDFs and Images
    - PDF documents (text and metadata extraction)
    - Images (JPEG, PNG, GIF, TIFF, BMP)
    - Image metadata (EXIF, IPTC, XMP)
    
    #### Archives and Compressed Files
    - ZIP, TAR, GZ archives
    - LHA compression format
    - Nested archive extraction
    
    #### Web and Markup
    - HTML, XHTML with XPath support
    - XML documents
    - JSON and structured data
    
    #### Media Files
    - Audio formats (MP3, WAV, FLAC)
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Sun Aug 31 05:32:52 GMT 2025
    - 15.3K bytes
  5. fess-crawler-lasta/src/main/resources/crawler/extractor.xml

    		<postConstruct name="addExtractor">
    			<arg>[
    				"application/pdf"
    				]</arg>
    			<arg>pdfExtractor</arg>
    		</postConstruct>
    		<postConstruct name="addExtractor">
    			<arg>[
    				"application/x-lha",
    				"application/x-lharc"
    				]</arg>
    			<arg>lhaExtractor</arg>
    		</postConstruct>
    		<postConstruct name="addExtractor">
    			<arg>[
    				"message/rfc822"
    				]</arg>
    			<arg>emlExtractor</arg>
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Wed Feb 11 01:15:55 GMT 2026
    - 50.4K bytes
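
The MIME type pattern shown in results 1 and 2 can be exercised directly with `java.util.regex`. The following is a minimal sketch, not code from the repository: the class name `MimeTypePatternSketch` and the method `isSupported` are hypothetical, and the pattern string is copied from the `rule.xml` snippet in result 2 on the assumption that the snippet shows the whole expression.

```java
import java.util.regex.Pattern;

public class MimeTypePatternSketch {

    // Pattern copied from the rule.xml snippet (result 2).
    // Assumption: the snippet shows the complete alternation.
    static final Pattern SUPPORTED = Pattern.compile(
            "(application/xml"
                    + "|application/xhtml\\+xml"
                    + "|application/rdf\\+xml"
                    + "|application/pdf"
                    + "|application/x-freemind"
                    + "|application/lha"
                    + "|application/x-lha"
                    + "|application/x-lha-compressed"
                    + "|text/xml"
                    + "|text/xml-external-parsed-entity"
                    + "|text/html)");

    // Hypothetical helper: true only if the whole MIME type string
    // matches one of the alternatives (matches() anchors both ends).
    static boolean isSupported(String mimeType) {
        return SUPPORTED.matcher(mimeType).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSupported("application/x-lha"));            // true
        System.out.println(isSupported("application/x-lha-compressed")); // true
        System.out.println(isSupported("application/x-lharc"));          // false
    }
}
```

Under these assumptions, `application/x-lha` is accepted, while `application/x-lharc` (one of the two types registered for `lhaExtractor` in result 5) is not part of this particular pattern. Whether another rule in the full `rule.xml` covers it cannot be determined from the snippets alone.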