Results 1 - 10 of 29 for tika (0.02 seconds)

  1. src/main/resources/tika.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <parsers>
        <parser class="org.apache.tika.parser.DefaultParser">
          <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
        </parser>
      </parsers>
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Mon Feb 24 12:59:41 GMT 2020
    - 241 bytes
    - Click Count (0)
  2. fess-crawler/pom.xml

    			<artifactId>tika-parser-html-module</artifactId>
    			<version>${tika.version}</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.tika</groupId>
    			<artifactId>tika-parser-image-module</artifactId>
    			<version>${tika.version}</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.tika</groupId>
    			<artifactId>tika-parser-mail-module</artifactId>
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Sun Mar 29 01:35:48 GMT 2026
    - 12.5K bytes
    - Click Count (0)
  3. fess-crawler-lasta/src/main/resources/crawler/extractor.xml

    				"application/x-js-taro",
    				"application/x-tika-msoffice",
    				"application/x-tika-msoffice-embedded",
    				"application/x-tika-msoffice-embedded;format=ole10_native",
    				"application/x-tika-msoffice-embedded;format=comp_obj",
    				"application/x-tika-msworks-spreadsheet",
    				"application/x-tika-ooxml",
    				"application/x-tika-ooxml-protected",
    				"application/x-tika-staroffice",
    				"application/x-uc2-compressed",
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Wed Feb 11 01:15:55 GMT 2026
    - 50.4K bytes
    - Click Count (0)
  4. CLAUDE.md

    - **License**: Apache 2.0
    - **DI**: LastaFlute DI
    - **Repo**: https://github.com/codelibs/fess-crawler
    
    ### Tech Stack
    
    - **HTTP**: Apache HttpComponents 4.5+ and 5.x (switchable)
    - **Extraction**: Apache Tika, POI, PDFBox
    - **Testing**: JUnit 4, UTFlute, Mockito, Testcontainers
    - **Storage**: In-memory (default), OpenSearch (optional)
    - **Cloud**: AWS SDK v2 (S3), Google Cloud Storage
    
    ### Protocols
    
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Thu Mar 12 03:39:20 GMT 2026
    - 8.1K bytes
    - Click Count (0)
  5. src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java

        public Logger getLogger() {
            return logger;
        }
    
        /**
         * Gets the appropriate extractor for the given response data.
         * Selects an extractor based on the MIME type or falls back to the Tika extractor.
         *
         * @param responseData the response data containing the document to extract
         * @return the extractor instance for processing the document
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Fri Nov 28 16:29:12 GMT 2025
    - 3.8K bytes
    - Click Count (0)
  6. src/test/java/org/codelibs/fess/crawler/helper/FessMimeTypeHelperTest.java

            try (InputStream is = new ByteArrayInputStream(SQL_REM_CONTENT.getBytes(StandardCharsets.UTF_8))) {
                final String contentType = mimeTypeHelper.getContentType(is, "test.sql");
                // Without override, Tika detects based on content+filename
                assertNotNull(contentType);
            }
        }
    
        @Test
        public void test_init_nullConfig() throws IOException {
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Sat Jan 24 09:06:33 GMT 2026
    - 12.1K bytes
    - Click Count (0)
  7. ADDING_NEW_LANGUAGE.md

    3. **Fallback**: English (from `fess_label.properties` and `fess_message.properties`)
    
    ### Document Language Detection
    
    During crawling and indexing, Fess:
    
    1. Detects language from document content using Apache Tika
    2. Validates against `supported.languages` list
    3. Creates language-specific fields (e.g., `content_ja`, `title_en`, `content_sv`)
    4. Applies language-specific analyzers for better search results
    
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Thu Nov 06 11:36:30 GMT 2025
    - 10.4K bytes
    - Click Count (1)
  8. pom.xml

    			<groupId>com.ibm.icu</groupId>
    			<artifactId>icu4j</artifactId>
    			<version>${icu4j.version}</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.tika</groupId>
    			<artifactId>tika-langdetect-optimaize</artifactId>
    			<version>${tika.version}</version>
    			<exclusions>
    				<exclusion>
    					<groupId>javax.annotation</groupId>
    					<artifactId>javax.annotation-api</artifactId>
    				</exclusion>
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Thu Mar 19 07:04:54 GMT 2026
    - 49.9K bytes
    - Click Count (0)
  9. src/main/resources/fess_indices/fess/lv/stopwords.txt

    esam
    esat
    būšu
    būsi
    būs
    būsim
    būsiet
    tikt
    tiku
    tiki
    tika
    tikām
    tikāt
    tieku
    tiec
    tiek
    tiekam
    tiekat
    tikšu
    tiks
    tiksim
    tiksiet
    tapt
    tapi
    tapāt
    topat
    tapšu
    tapsi
    taps
    tapsim
    tapsiet
    kļūt
    kļuvu
    kļuvi
    kļuva
    kļuvām
    kļuvāt
    kļūstu
    kļūsti
    kļūst
    kļūstam
    kļūstat
    kļūšu
    kļūsi
    kļūs
    Created: Tue Mar 31 13:07:34 GMT 2026
    - Last Modified: Thu Jul 19 06:31:02 GMT 2018
    - 1.2K bytes
    - Click Count (0)
  10. README.md

    ## Technology Stack
    
    - **Java**: 21+ (requires Java 21 or higher)
    - **Build System**: Maven 3.x
    - **DI Container**: LastaFlute DI
    - **HTTP Client**: Apache HttpComponents
    - **Content Extraction**: Apache Tika, Apache POI, PDFBox
    - **Testing**: JUnit 4, UTFlute, Testcontainers
    - **Storage Backends**: OpenSearch, Memory-based
    
    ## Quick Start
    
    ### Prerequisites
    
    - Java 21 or higher
    - Maven 3.6 or higher
    
    Created: Sun Apr 12 03:50:13 GMT 2026
    - Last Modified: Sun Aug 31 05:32:52 GMT 2025
    - 15.3K bytes
    - Click Count (0)