Results 1 - 10 of 19 for PDF (0.01 sec)
fess-crawler/src/test/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractorTest.java
url = "http://test.com/hoge1.pdf"; resourceName = null; assertNull(tikaExtractor.getPassword(createParams(url, resourceName))); url = "http://test.com/hoge1.pdf"; resourceName = "hoge2.pdf"; assertNull(tikaExtractor.getPassword(createParams(url, resourceName))); url = null; resourceName = "hoge2.pdf";
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.6K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java
/** * PdfExtractor extracts text content from PDF files using Apache PDFBox. * It supports password-protected PDFs and can extract embedded documents and annotations. * * <p>The extractor runs text extraction in a separate thread with a configurable timeout * to prevent hanging on problematic PDF files. It also extracts metadata from the PDF * document and includes it in the extraction result. * * <p>Features:
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 12.7K bytes - Viewed (0) -
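The PdfExtractor description above mentions running text extraction in a separate thread with a configurable timeout. The following is a minimal sketch of that general pattern using only `java.util.concurrent`; it is not the PdfExtractor source. The class name `TimeoutExtractionSketch`, the method `extractWithTimeout`, and the `Callable` that stands in for the actual PDFBox extraction step are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutExtractionSketch {

    // Hypothetical helper: runs the given extraction task on a worker thread
    // and gives up once timeoutMillis has elapsed.
    public static String extractWithTimeout(Callable<String> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(task);
        try {
            // Blocks until the task finishes or the timeout expires.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker so a stuck extraction does not linger.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // The task itself failed; surface the underlying cause.
            throw new Exception("extraction failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast task stands in for a well-behaved PDF.
        System.out.println(extractWithTimeout(() -> "extracted text", 1000));

        // A slow task stands in for a problematic PDF that would otherwise hang.
        try {
            extractWithTimeout(() -> {
                Thread.sleep(5000);
                return "too late";
            }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Cancelling the future with `cancel(true)` only interrupts the worker thread; a task that ignores interruption would keep running, so a real extractor may need additional cleanup.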
docs/debugging/README.md
Registered: Sun Sep 07 19:28:11 UTC 2025 - Last Modified: Tue Aug 12 18:20:36 UTC 2025 - 8.6K bytes - Viewed (0) -
cmd/erasure-sets_test.go
testCases := []struct { objectName string sipHash int }{ // cases which should pass the test. // passing in valid object name. {"object", 37}, {"The Shining Script <v1>.pdf", 38}, {"Cost Benefit Analysis (2009-2010).pptx", 59}, {"117Gn8rfHL2ACARPAhaFd0AGzic9pUbIA/5OCn5A", 35}, {"SHØRT", 49}, {"There are far too many object names, and far too few bucket names!", 8}, {"a/b/c/", 159},
Registered: Sun Sep 07 19:28:11 UTC 2025 - Last Modified: Fri Aug 29 02:39:48 UTC 2025 - 6.8K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/JodExtractor.java
// Presentation Formats extensionMap.put("odp", "pdf"); extensionMap.put("otp", "pdf"); extensionMap.put("sxi", "pdf"); extensionMap.put("ppt", "pdf"); extensionMap.put("pptx", "pdf"); // Drawing Formats extensionMap.put("odg", "svg"); extensionMap.put("otg", "svg"); extractorMap.put("pdf", new PdfExtractor());
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 10.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/ExtractorBuilder.java
* </p> * * <p> * Example usage: * </p> * * <pre> * {@code * try (InputStream in = new FileInputStream("example.pdf")) { * ExtractData extractData = new ExtractorBuilder(crawlerContainer, in, new HashMap<>()) * .mimeType("application/pdf") * .filename("example.pdf") * .maxContentLength(1024 * 1024) * .extract(); * * String content = extractData.getContent();
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 10.1K bytes - Viewed (0) -
fess-crawler/src/test/java/org/codelibs/fess/crawler/filter/UrlFilterTest.java
String sessionId = "test-session-019"; urlFilter.init(sessionId); urlFilter.addInclude(".*\\.PDF$"); // Test case sensitivity assertFalse(urlFilter.match("https://example.com/document.pdf")); assertTrue(urlFilter.match("https://example.com/document.PDF")); } /** * Test very long URL handling */ public void test_veryLongUrl() {
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Wed Sep 03 14:42:53 UTC 2025 - 19K bytes - Viewed (0) -
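The UrlFilterTest excerpt above expects an include pattern of `.*\\.PDF$` to reject `document.pdf` and accept `document.PDF`. That is the default behavior of `java.util.regex.Pattern`, which is case-sensitive unless told otherwise. The sketch below illustrates this with plain `Pattern` calls only; it assumes nothing about how UrlFilter is implemented internally, and the class `CaseSensitivitySketch` and its `matches` helper are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseSensitivitySketch {

    // Hypothetical helper: full-string regex match, optionally ignoring case.
    public static boolean matches(String regex, String url, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, flags).matcher(url).matches();
    }

    public static void main(String[] args) {
        String regex = ".*\\.PDF$";

        // Default (case-sensitive) matching: only the upper-case extension matches.
        System.out.println(matches(regex, "https://example.com/document.pdf", false)); // false
        System.out.println(matches(regex, "https://example.com/document.PDF", false)); // true

        // With CASE_INSENSITIVE, both spellings match.
        System.out.println(matches(regex, "https://example.com/document.pdf", true)); // true
    }
}
```

If both `.pdf` and `.PDF` URLs should be crawled without a flag, an inline modifier such as `(?i).*\\.pdf$` is a standard `java.util.regex` alternative.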
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java
* <li>Maximum term sizes for alphanumeric and symbolic terms</li> * <li>Custom Tika configuration</li> * <li>Tesseract OCR configuration for image-based documents</li> * <li>PDF Parser configuration for PDF documents</li> * </ul> * * <p> * The {@link TikaDetectParser} inner class extends {@link CompositeParser} to provide auto-detection of the MIME type
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.7K bytes - Viewed (0) -
docs/compression/README.md
``` Default config includes most common highly compressible content extensions and mime-types. ```bash ~ mc admin config set myminio compression extensions=".pdf" mime_types="application/pdf" ``` To show help on setting compression config values. ```bash ~ mc admin config set myminio compression ``` To enable compression for all content, no matter the extension and content type
Registered: Sun Sep 07 19:28:11 UTC 2025 - Last Modified: Tue Aug 12 18:20:36 UTC 2025 - 5.2K bytes - Viewed (0) -
README.md
crawler.crawlerContext.setDefaultIntervalTime(1000); // 1 second ``` ### URL Filtering ```java // Include patterns crawler.urlFilter.addInclude("https://example.com/.*"); crawler.urlFilter.addInclude(".*\\.pdf$"); // Exclude patterns crawler.urlFilter.addExclude(".*\\.js$"); crawler.urlFilter.addExclude(".*login.*"); ``` ## Supported Protocols and Formats ### Protocols
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes - Viewed (0)