Results 1 - 10 of 104 for extraction (0.06 sec)
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/ExtractorBuilder.java
 * It encapsulates the process of extracting data from an input stream using a specified or detected extractor.
 * The builder allows setting parameters such as MIME type, filename, extractor name, maximum content length,
 * and cache file size to optimize the extraction process.
 *
 * <p>
 * The main purpose of this class is to simplify the extraction process by providing a fluent interface
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 10.1K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java
 * document and includes it in the extraction result.
 *
 * <p>Features:
 * <ul>
 * <li>Text extraction from PDF pages</li>
 * <li>Embedded document extraction</li>
 * <li>Annotation extraction (file attachments)</li>
 * <li>Metadata extraction</li>
 * <li>Password-protected PDF support</li>
 * <li>Configurable timeout for extraction process</li>
 * </ul>
 *
 * @author shinsuke
 */
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 12.7K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/Extractor.java
 */
package org.codelibs.fess.crawler.extractor;
import java.io.InputStream;
import java.util.Map;
import org.codelibs.fess.crawler.entity.ExtractData;
/**
 * The Extractor interface defines methods for extracting text data from an input stream.
 * Implementations of this interface should provide the logic for extracting text and
 * optionally override the default weight value.
 */
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sat Mar 15 06:52:00 UTC 2025 - 1.6K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/LhaExtractor.java
import org.codelibs.fess.crawler.extractor.Extractor;
import org.codelibs.fess.crawler.extractor.ExtractorFactory;
import org.codelibs.fess.crawler.helper.MimeTypeHelper;
import org.codelibs.fess.crawler.util.IgnoreCloseInputStream;
import jp.gr.java_conf.dangan.util.lha.LhaFile;
import jp.gr.java_conf.dangan.util.lha.LhaHeader;
/**
 * Extractor implementation for LHA (LZH) archive files.
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 5.8K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/AbstractXmlExtractor.java
/**
 * Default character encoding for content extraction.
 */
protected String encoding = Constants.UTF_8;
/**
 * The preload size for charset detection.
 */
protected int preloadSizeForCharset = 2048;
/**
 * Indicates whether comment tags should be ignored during extraction.
 */
protected boolean ignoreCommentTag = false;
/**
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 8.5K bytes
README.md
## Overview
**Fess Crawler** is a powerful, flexible Java-based web crawling framework designed for enterprise-scale content extraction and processing. Built with a modular architecture, it supports multiple protocols (HTTP/HTTPS, File System, FTP, SMB, Cloud Storage) and provides extensive content extraction capabilities from various document formats.
### Key Features
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Aug 31 05:32:52 UTC 2025 - 15.3K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java
 * <li>Handling resource names and content types</li>
 * <li>Retrying extraction without resource name or content type if the initial attempt fails</li>
 * <li>Extracting text from metadata if the main content extraction fails</li>
 * <li>Reading content as plain text if all other methods fail</li>
 * <li>Applying post-extraction filters</li>
 * <li>Handling Tika exceptions, including zip bomb exceptions</li>
 * </ul>
 *
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.7K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java
 * governing permissions and limitations under the License.
 */
package org.codelibs.fess.crawler.exception;
/**
 * UnsupportedExtractException is thrown when the content extraction is not supported.
 * It extends ExtractException and indicates that the requested extraction operation cannot be performed.
 *
 */
public class UnsupportedExtractException extends ExtractException {
    private static final long serialVersionUID = 1L;
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 1.2K bytes
src/test/java/jcifs/smb1/smb1/SmbFileTest.java
// Test file name extraction
assertEquals("file.txt", new SmbFile("smb1://server/share/file.txt").getName());
// Test directory name extraction (should include trailing slash)
assertEquals("dir/", new SmbFile("smb1://server/share/dir/").getName());
// Test share name extraction
assertEquals("share/", new SmbFile("smb1://server/share/").getName());
Registered: Sun Sep 07 00:10:21 UTC 2025 - Last Modified: Thu Aug 14 05:31:44 UTC 2025 - 8.5K bytes
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java
 *
 * <p>The extractor supports two types of password management:
 * <ul>
 * <li>Static passwords configured via {@link #addPassword(String, String)}</li>
 * <li>Dynamic passwords provided through extraction parameters</li>
 * </ul>
 *
 * <p>Passwords are matched against URLs or resource names using regular expression patterns.
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 5.1K bytes