- Sort Score
- Result 10 results
- Languages All
Results 11 - 20 of 35 for tika (0.02 sec)
-
fess-crawler-lasta/src/main/resources/crawler/extractor.xml
"application/x-texinfo", "application/x-tika-msoffice", "application/x-tika-msoffice-embedded", "application/x-tika-msoffice-embedded;format=ole10_native", "application/x-tika-msoffice-embedded;format=comp_obj", "application/x-tika-msworks-spreadsheet", "application/x-tika-ooxml", "application/x-tika-ooxml-protected", "application/x-tika-staroffice", "application/x-uc2-compressed",
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sat Aug 01 21:40:30 UTC 2020 - 49K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/crawler/transformer/FessStandardTransformer.java
public Logger getLogger() { return logger; } /** * Gets the appropriate extractor for the given response data. * Selects an extractor based on the MIME type or falls back to the Tika extractor. * * @param responseData the response data containing the document to extract * @return the extractor instance for processing the document
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Thu Jul 17 08:28:31 UTC 2025 - 3.8K bytes - Viewed (0) -
README.md
## Technology Stack - **Java**: 21+ (requires Java 21 or higher) - **Build System**: Maven 3.x - **DI Container**: LastaFlute DI - **HTTP Client**: Apache HttpComponents - **Content Extraction**: Apache Tika, Apache POI, PDFBox - **Testing**: JUnit 4, UTFlute, Testcontainers - **Storage Backends**: OpenSearch, Memory-based ## Quick Start ### Prerequisites - Java 21 or higher - Maven 3.6 or higher
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/job/CrawlJob.java
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Thu Jul 17 08:28:31 UTC 2025 - 19.6K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractorTest.java
final String content = extractData.getContent(); CloseableUtil.closeQuietly(in); logger.info(content); assertTrue(content.contains("テスト")); } // TODO tika needs to support pdfbox 2.0 // public void test_getTika_pdf() { // final InputStream in = ResourceUtil // .getResourceAsStream("extractor/test.pdf");
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.6K bytes - Viewed (0) -
src/main/resources/fess_indices/_aws/fess.json
"taču", "nu", "pat", "tiklab", "iekšpus", "nedz", "tik", "nevis", "turpretim", "jeb", "iekam", "iekām", "iekāms", "kolīdz", "līdzko", "tiklīdz", "jebšu", "tālab", "tāpēc", "nekā", "itin", "jā", "jau", "jel", "nē", "nezin", "tad", "tikai", "vis", "tak", "iekams", "vien", "# modal verbs", "būt ", "biju ", "biji", "bija", "bijām", "bijāt", "esmu", "esi", "esam", "esat ", "būšu ", "būsi", "būs", "būsim", "būsiet", "tikt", "tiku", "tiki", "tika", "tikām", "tikāt", "tieku", "tiec", "tiek", "tiekam",...
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Sat Jun 14 00:36:40 UTC 2025 - 117.3K bytes - Viewed (0) -
src/main/resources/fess_indices/_cloud/fess.json
"taču", "nu", "pat", "tiklab", "iekšpus", "nedz", "tik", "nevis", "turpretim", "jeb", "iekam", "iekām", "iekāms", "kolīdz", "līdzko", "tiklīdz", "jebšu", "tālab", "tāpēc", "nekā", "itin", "jā", "jau", "jel", "nē", "nezin", "tad", "tikai", "vis", "tak", "iekams", "vien", "# modal verbs", "būt ", "biju ", "biji", "bija", "bijām", "bijāt", "esmu", "esi", "esam", "esat ", "būšu ", "būsi", "būs", "būsim", "būsiet", "tikt", "tiku", "tiki", "tika", "tikām", "tikāt", "tieku", "tiec", "tiek", "tiekam",...
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Sat Feb 27 09:26:16 UTC 2021 - 117.3K bytes - Viewed (0) -
src/main/resources/fess_config.properties
# Type of hot thread monitoring (e.g., cpu). crawler.hotthread.type=cpu # Metadata fields to exclude from document content. crawler.metadata.content.excludes=resourceName,X-Parsed-By,Content-Encoding.*,Content-Type.*,X-TIKA.*,X-FESS.* # Mapping for document metadata names. crawler.metadata.name.mapping=\ title=title:string\n\ Title=title:string\n\ dc:title=title:string\n\ # html
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Sat Jul 05 14:45:37 UTC 2025 - 54.7K bytes - Viewed (0) -
src/main/java/org/codelibs/fess/mylasta/direction/FessConfig.java
/** The key of the configuration. e.g. cpu */ String CRAWLER_HOTTHREAD_TYPE = "crawler.hotthread.type"; /** The key of the configuration. e.g. resourceName,X-Parsed-By,Content-Encoding.*,Content-Type.*,X-TIKA.*,X-FESS.* */ String CRAWLER_METADATA_CONTENT_EXCLUDES = "crawler.metadata.content.excludes"; /** The key of the configuration. e.g. title=title:string<br> * Title=title:string<br>
Registered: Thu Sep 04 12:52:25 UTC 2025 - Last Modified: Thu Jul 17 08:28:31 UTC 2025 - 525.6K bytes - Viewed (2) -
docs/id/docs/index.md
Registered: Sun Sep 07 07:19:17 UTC 2025 - Last Modified: Sun Aug 31 10:49:48 UTC 2025 - 20.5K bytes - Viewed (0)