Results 1 - 10 of 27 for tika (0.02 seconds)
The search processing time has exceeded the limit. The displayed results may be partial.
src/main/resources/tika.xml
<?xml version="1.0" encoding="UTF-8"?> <properties> <parsers> <parser class="org.apache.tika.parser.DefaultParser"> <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/> </parser> </parsers>
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Mon Feb 24 12:59:41 GMT 2020 - 241 bytes - Click Count (0) -
fess-crawler/pom.xml
<artifactId>tika-parser-html-module</artifactId> <version>${tika.version}</version> </dependency> <dependency> <groupId>org.apache.tika</groupId> <artifactId>tika-parser-image-module</artifactId> <version>${tika.version}</version> </dependency> <dependency> <groupId>org.apache.tika</groupId> <artifactId>tika-parser-mail-module</artifactId>
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Sun Mar 29 01:35:48 GMT 2026 - 12.5K bytes - Click Count (0) -
fess-crawler-lasta/src/main/resources/crawler/extractor.xml
"application/x-js-taro", "application/x-tika-msoffice", "application/x-tika-msoffice-embedded", "application/x-tika-msoffice-embedded;format=ole10_native", "application/x-tika-msoffice-embedded;format=comp_obj", "application/x-tika-msworks-spreadsheet", "application/x-tika-ooxml", "application/x-tika-ooxml-protected", "application/x-tika-staroffice", "application/x-uc2-compressed",
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Wed Feb 11 01:15:55 GMT 2026 - 50.4K bytes - Click Count (0) -
CLAUDE.md
- **License**: Apache 2.0 - **DI**: LastaFlute DI - **Repo**: https://github.com/codelibs/fess-crawler ### Tech Stack - **HTTP**: Apache HttpComponents 4.5+ and 5.x (switchable) - **Extraction**: Apache Tika, POI, PDFBox - **Testing**: JUnit 4, UTFlute, Mockito, Testcontainers - **Storage**: In-memory (default), OpenSearch (optional) - **Cloud**: AWS SDK v2 (S3), Google Cloud Storage ### Protocols
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Thu Mar 12 03:39:20 GMT 2026 - 8.1K bytes - Click Count (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java
public Logger getLogger() { return logger; } /** * Gets the appropriate extractor for the given response data. * Selects an extractor based on the MIME type or falls back to the Tika extractor. * * @param responseData the response data containing the document to extract * @return the extractor instance for processing the document
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Fri Nov 28 16:29:12 GMT 2025 - 3.8K bytes - Click Count (0) -
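The Javadoc in the snippet above describes choosing an extractor by MIME type and falling back to the Tika extractor when nothing matches. A minimal sketch of that lookup pattern follows. It is not the code from `FessStandardTransformer.java`: the class name, the registry map, and the extractor names `htmlExtractor` and `tikaExtractor` are hypothetical and used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are assumptions, not the Fess API.
class ExtractorSelectionSketch {
    // Hypothetical registry mapping a MIME type to the name of an extractor.
    private final Map<String, String> extractorByMimeType = new HashMap<>();
    // Extractor used when no MIME-type-specific extractor is registered.
    private final String fallbackExtractor;

    ExtractorSelectionSketch(String fallbackExtractor) {
        this.fallbackExtractor = fallbackExtractor;
    }

    void register(String mimeType, String extractorName) {
        extractorByMimeType.put(mimeType, extractorName);
    }

    // Returns the registered extractor for the MIME type, or the fallback
    // when the MIME type is unknown or missing.
    String select(String mimeType) {
        if (mimeType == null) {
            return fallbackExtractor;
        }
        return extractorByMimeType.getOrDefault(mimeType, fallbackExtractor);
    }

    public static void main(String[] args) {
        ExtractorSelectionSketch selector = new ExtractorSelectionSketch("tikaExtractor");
        selector.register("text/html", "htmlExtractor");
        System.out.println(selector.select("text/html"));                // htmlExtractor
        System.out.println(selector.select("application/x-tika-ooxml")); // tikaExtractor
    }
}
```

Keeping the fallback as a single default means new formats are handled by the general-purpose extractor until a specific one is registered.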
src/test/java/org/codelibs/fess/crawler/helper/FessMimeTypeHelperTest.java
try (InputStream is = new ByteArrayInputStream(SQL_REM_CONTENT.getBytes(StandardCharsets.UTF_8))) { final String contentType = mimeTypeHelper.getContentType(is, "test.sql"); // Without override, Tika detects based on content+filename assertNotNull(contentType); } } @Test public void test_init_nullConfig() throws IOException {
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Sat Jan 24 09:06:33 GMT 2026 - 12.1K bytes - Click Count (0) -
ADDING_NEW_LANGUAGE.md
3. **Fallback**: English (from `fess_label.properties` and `fess_message.properties`) ### Document Language Detection During crawling and indexing, Fess: 1. Detects language from document content using Apache Tika 2. Validates against `supported.languages` list 3. Creates language-specific fields (e.g., `content_ja`, `title_en`, `content_sv`) 4. Applies language-specific analyzers for better search results
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Thu Nov 06 11:36:30 GMT 2025 - 10.4K bytes - Click Count (1) -
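Steps 2 and 3 of the language detection flow in the snippet above (validating against `supported.languages`, then creating language-specific fields) can be sketched as follows. This is an assumption-laden illustration, not Fess code: the class and method names are hypothetical, the detected language is passed in as a plain string rather than obtained from Apache Tika, and analyzer selection (step 4) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: names are hypothetical, not taken from Fess.
class LanguageFieldSketch {
    // Returns the language-specific field names for a detected language,
    // or an empty list when the language is not in the supported set.
    static List<String> languageFields(String detectedLanguage, Set<String> supportedLanguages) {
        List<String> fields = new ArrayList<>();
        if (detectedLanguage != null && supportedLanguages.contains(detectedLanguage)) {
            fields.add("content_" + detectedLanguage);
            fields.add("title_" + detectedLanguage);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the supported.languages setting.
        Set<String> supported = Set.of("ja", "en", "sv");
        System.out.println(languageFields("ja", supported)); // [content_ja, title_ja]
        System.out.println(languageFields("xx", supported)); // []
    }
}
```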
pom.xml
<groupId>com.ibm.icu</groupId> <artifactId>icu4j</artifactId> <version>${icu4j.version}</version> </dependency> <dependency> <groupId>org.apache.tika</groupId> <artifactId>tika-langdetect-optimaize</artifactId> <version>${tika.version}</version> <exclusions> <exclusion> <groupId>javax.annotation</groupId> <artifactId>javax.annotation-api</artifactId> </exclusion>
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Thu Mar 19 07:04:54 GMT 2026 - 49.9K bytes - Click Count (0) -
src/main/resources/fess_indices/fess/lv/stopwords.txt
esam esat būšu būsi būs būsim būsiet tikt tiku tiki tika tikām tikāt tieku tiec tiek tiekam tiekat tikšu tiks tiksim tiksiet tapt tapi tapāt topat tapšu tapsi taps tapsim tapsiet kļūt kļuvu kļuvi kļuva kļuvām kļuvāt kļūstu kļūsti kļūst kļūstam kļūstat kļūšu kļūsi kļūs
Created: Tue Mar 31 13:07:34 GMT 2026 - Last Modified: Thu Jul 19 06:31:02 GMT 2018 - 1.2K bytes - Click Count (0) -
README.md
## Technology Stack - **Java**: 21+ (requires Java 21 or higher) - **Build System**: Maven 3.x - **DI Container**: LastaFlute DI - **HTTP Client**: Apache HttpComponents - **Content Extraction**: Apache Tika, Apache POI, PDFBox - **Testing**: JUnit 4, UTFlute, Testcontainers - **Storage Backends**: OpenSearch, Memory-based ## Quick Start ### Prerequisites - Java 21 or higher - Maven 3.6 or higher
Created: Sun Apr 12 03:50:13 GMT 2026 - Last Modified: Sun Aug 31 05:32:52 GMT 2025 - 15.3K bytes - Click Count (0)