Results 11 - 20 of 23 for Tika (0.03 sec)
fess-crawler-lasta/src/main/resources/crawler/extractor.xml
"application/x-texinfo", "application/x-tika-msoffice", "application/x-tika-msoffice-embedded", "application/x-tika-msoffice-embedded;format=ole10_native", "application/x-tika-msoffice-embedded;format=comp_obj", "application/x-tika-msworks-spreadsheet", "application/x-tika-ooxml", "application/x-tika-ooxml-protected", "application/x-tika-staroffice", "application/x-uc2-compressed",
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Nov 23 03:46:53 UTC 2025 - 50.1K bytes - Viewed (0) -
CLAUDE.md
- **Build**: Maven 3.x - **License**: Apache 2.0 - **DI**: LastaFlute DI - **Repo**: https://github.com/codelibs/fess-crawler ### Tech Stack - **HTTP**: Apache HttpComponents 4.5+ - **Extraction**: Apache Tika 3.0+, POI 5.3+, PDFBox 3.0+ - **Testing**: JUnit 4, UTFlute, Mockito 5.7.0 - **Storage**: In-memory (default), OpenSearch (optional) ### Protocols - **HTTP/HTTPS**: Full crawling, cookies, auth, robots.txt
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Fri Nov 28 17:31:34 UTC 2025 - 10.7K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java
public Logger getLogger() { return logger; } /** * Gets the appropriate extractor for the given response data. * Selects an extractor based on the MIME type or falls back to the Tika extractor. * * @param responseData the response data containing the document to extract * @return the extractor instance for processing the document
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 3.8K bytes - Viewed (0) -
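The Javadoc in the snippet above describes choosing an extractor by MIME type and falling back to the Tika extractor when nothing more specific is registered. A minimal sketch of that lookup-with-fallback pattern follows. The class `ExtractorSelectionSketch`, the nested `Extractor` interface, and the `register`/`select` methods are hypothetical names used for illustration; they are not the fess-crawler API, and the real transformer may resolve extractors differently.

```java
import java.util.HashMap;
import java.util.Map;

public class ExtractorSelectionSketch {

    /** Hypothetical stand-in for an extractor; not the fess-crawler interface. */
    public interface Extractor {
        String name();
    }

    // MIME type -> dedicated extractor (illustrative registry).
    private final Map<String, Extractor> byMimeType = new HashMap<>();

    // Extractor used when no dedicated one is registered (Tika's role in the Javadoc).
    private final Extractor fallback;

    public ExtractorSelectionSketch(Extractor fallback) {
        this.fallback = fallback;
    }

    public void register(String mimeType, Extractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** Returns the extractor registered for the MIME type, or the fallback. */
    public Extractor select(String mimeType) {
        if (mimeType == null) {
            return fallback;
        }
        return byMimeType.getOrDefault(mimeType, fallback);
    }

    public static void main(String[] args) {
        ExtractorSelectionSketch selector = new ExtractorSelectionSketch(() -> "tika");
        selector.register("application/pdf", () -> "pdf");

        System.out.println(selector.select("application/pdf").name());
        System.out.println(selector.select("application/x-tika-ooxml").name());
    }
}
```

In this sketch an unregistered type such as `application/x-tika-ooxml` resolves to the fallback, which mirrors the behaviour the Javadoc describes.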
ADDING_NEW_LANGUAGE.md
3. **Fallback**: English (from `fess_label.properties` and `fess_message.properties`) ### Document Language Detection During crawling and indexing, Fess: 1. Detects language from document content using Apache Tika 2. Validates against `supported.languages` list 3. Creates language-specific fields (e.g., `content_ja`, `title_en`, `content_sv`) 4. Applies language-specific analyzers for better search results
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Thu Nov 06 11:36:30 UTC 2025 - 10.4K bytes - Viewed (0) -
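Steps 2 and 3 of the snippet above (validating the detected language against `supported.languages`, then creating fields such as `content_ja`) can be sketched roughly as below. The detection step itself is left out: the detected language code is passed in as a plain string. The class `LanguageFieldSketch` and its method are hypothetical and only illustrate the naming scheme shown in the snippet, not the Fess implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LanguageFieldSketch {

    // Language codes accepted for language-specific fields (assumed to come
    // from a setting like supported.languages).
    private final Set<String> supportedLanguages;

    public LanguageFieldSketch(Set<String> supportedLanguages) {
        this.supportedLanguages = supportedLanguages;
    }

    /**
     * Builds language-specific fields (e.g. title_en, content_en) when the
     * detected language is supported; returns an empty map otherwise.
     */
    public Map<String, String> languageFields(String detectedLanguage, String title, String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (detectedLanguage != null && supportedLanguages.contains(detectedLanguage)) {
            fields.put("title_" + detectedLanguage, title);
            fields.put("content_" + detectedLanguage, content);
        }
        return fields;
    }

    public static void main(String[] args) {
        LanguageFieldSketch sketch = new LanguageFieldSketch(Set.of("en", "ja", "sv"));
        System.out.println(sketch.languageFields("ja", "Title", "Body").keySet());
        System.out.println(sketch.languageFields("xx", "Title", "Body").keySet());
    }
}
```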
README.md
## Technology Stack - **Java**: 21+ (requires Java 21 or higher) - **Build System**: Maven 3.x - **DI Container**: LastaFlute DI - **HTTP Client**: Apache HttpComponents - **Content Extraction**: Apache Tika, Apache POI, PDFBox - **Testing**: JUnit 4, UTFlute, Testcontainers - **Storage Backends**: OpenSearch, Memory-based ## Quick Start ### Prerequisites - Java 21 or higher - Maven 3.6 or higher
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/job/CrawlJob.java
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Fri Nov 28 16:29:12 UTC 2025 - 19.6K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractorTest.java
final String content = extractData.getContent(); CloseableUtil.closeQuietly(in); logger.info(content); assertTrue(content.contains("テスト")); } // TODO tika needs to support pdfbox 2.0 // public void test_getTika_pdf() { // final InputStream in = ResourceUtil // .getResourceAsStream("extractor/test.pdf");
Registered: Sat Dec 20 11:21:39 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.6K bytes - Viewed (0) -
src/main/resources/fess_config.properties
# Type of hot thread monitoring (e.g., cpu). crawler.hotthread.type=cpu # Metadata fields to exclude from document content. crawler.metadata.content.excludes=resourceName,X-Parsed-By,Content-Encoding.*,Content-Type.*,X-TIKA.*,X-FESS.* # Mapping for document metadata names. crawler.metadata.name.mapping=\ title=title:string\n\ Title=title:string\n\ dc:title=title:string\n\ # html
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Thu Dec 11 09:47:03 UTC 2025 - 54.8K bytes - Viewed (0) -
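The `crawler.metadata.content.excludes` value in the snippet above is a comma-separated list whose entries look like regular expressions (`X-TIKA.*`, `Content-Type.*`). The sketch below shows one way such a list could be applied, assuming each entry is a Java regex that must match the whole metadata name. That matching rule is an assumption, and `MetadataExcludeSketch` is a hypothetical class, not the Fess code that reads this property.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MetadataExcludeSketch {

    private final List<Pattern> excludes;

    /** Compiles each comma-separated entry as a regular expression. */
    public MetadataExcludeSketch(String commaSeparatedPatterns) {
        this.excludes = Arrays.stream(commaSeparatedPatterns.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    /** True when the metadata name fully matches any exclude pattern (assumed semantics). */
    public boolean isExcluded(String metadataName) {
        return excludes.stream().anyMatch(p -> p.matcher(metadataName).matches());
    }

    public static void main(String[] args) {
        MetadataExcludeSketch filter = new MetadataExcludeSketch(
                "resourceName,X-Parsed-By,Content-Encoding.*,Content-Type.*,X-TIKA.*,X-FESS.*");

        System.out.println(filter.isExcluded("X-TIKA:Parsed-By"));
        System.out.println(filter.isExcluded("dc:title"));
    }
}
```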
src/main/java/org/codelibs/fess/mylasta/direction/FessConfig.java
/** The key of the configuration. e.g. cpu */ String CRAWLER_HOTTHREAD_TYPE = "crawler.hotthread.type"; /** The key of the configuration. e.g. resourceName,X-Parsed-By,Content-Encoding.*,Content-Type.*,X-TIKA.*,X-FESS.* */ String CRAWLER_METADATA_CONTENT_EXCLUDES = "crawler.metadata.content.excludes"; /** The key of the configuration. e.g. title=title:string<br> * Title=title:string<br>
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Sat Dec 13 02:21:17 UTC 2025 - 525.7K bytes - Viewed (2) -
src/main/resources/fess_indices/_aws/fess.json
"niinä", "niiksi", "kuka", "kenen", "kenet", "ketä", "kenessä", "kenestä", "keneen", "kenellä", "keneltä", "kenelle", "kenenä", "keneksi", "ketkä", "keiden", "ketkä", "keitä", "keissä", "keistä", "keihin", "keillä", "keiltä", "keille", "keinä", "keiksi", "mikä", "minkä", "minkä", "mitä", "missä", "mistä", "mihin", "millä", "miltä", "mille", "minä", "miksi", "mitkä", "joka", "jonka", "jota", "jossa", "josta", "johon", "jolla", "jolta", "jolle", "jona", "joksi", "jotka", "joiden", "joita", "joissa", "joista",...
Registered: Sat Dec 20 09:19:18 UTC 2025 - Last Modified: Sat Jun 14 00:36:40 UTC 2025 - 117.3K bytes - Viewed (0)