Search Options

Results per page
Sort
Preferred Languages
Advance

Results 1 - 10 of 43 for extraction (0.04 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java

     * document and includes it in the extraction result.
     *
     * <p>Features:
     * <ul>
     *   <li>Text extraction from PDF pages</li>
     *   <li>Embedded document extraction</li>
     *   <li>Annotation extraction (file attachments)</li>
     *   <li>Metadata extraction</li>
     *   <li>Password-protected PDF support</li>
     *   <li>Configurable timeout for extraction process</li>
     * </ul>
     *
     * @author shinsuke
     */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 12.7K bytes
    - Viewed (0)
  2. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/ExtractorBuilder.java

     * It encapsulates the process of extracting data from an input stream using a specified or detected extractor.
     * The builder allows setting parameters such as MIME type, filename, extractor name, maximum content length,
     * and cache file size to optimize the extraction process.
     *
     * <p>
     * The main purpose of this class is to simplify the extraction process by providing a fluent interface
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10.1K bytes
    - Viewed (0)
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/AbstractXmlExtractor.java

        /**
         * Default character encoding for content extraction.
         */
        protected String encoding = Constants.UTF_8;
    
        /**
         * The preload size for charset detection.
         */
        protected int preloadSizeForCharset = 2048;
    
        /**
         * Indicates whether comment tags should be ignored during extraction.
         */
        protected boolean ignoreCommentTag = false;
    
        /**
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 8.5K bytes
    - Viewed (0)
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/LhaExtractor.java

    import org.codelibs.fess.crawler.extractor.Extractor;
    import org.codelibs.fess.crawler.extractor.ExtractorFactory;
    import org.codelibs.fess.crawler.helper.MimeTypeHelper;
    import org.codelibs.fess.crawler.util.IgnoreCloseInputStream;
    
    import jp.gr.java_conf.dangan.util.lha.LhaFile;
    import jp.gr.java_conf.dangan.util.lha.LhaHeader;
    
    /**
     * Extractor implementation for LHA (LZH) archive files.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 5.8K bytes
    - Viewed (0)
  5. README.md

    ## Overview
    
    **Fess Crawler** is a powerful, flexible Java-based web crawling framework designed for enterprise-scale content extraction and processing. Built with a modular architecture, it supports multiple protocols (HTTP/HTTPS, File System, FTP, SMB, Cloud Storage) and provides extensive content extraction capabilities from various document formats.
    
    ### Key Features
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java

     *   <li>Handling resource names and content types</li>
     *   <li>Retrying extraction without resource name or content type if the initial attempt fails</li>
     *   <li>Extracting text from metadata if the main content extraction fails</li>
     *   <li>Reading content as plain text if all other methods fail</li>
     *   <li>Applying post-extraction filters</li>
     *   <li>Handling Tika exceptions, including zip bomb exceptions</li>
     * </ul>
     *
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 30.7K bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PasswordBasedExtractor.java

     *
     * <p>The extractor supports two types of password management:
     * <ul>
     *   <li>Static passwords configured via {@link #addPassword(String, String)}</li>
     *   <li>Dynamic passwords provided through extraction parameters</li>
     * </ul>
     *
     * <p>Passwords are matched against URLs or resource names using regular expression patterns.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 5.1K bytes
    - Viewed (0)
  8. fess-crawler/src/main/java/org/codelibs/fess/crawler/exception/UnsupportedExtractException.java

     * governing permissions and limitations under the License.
     */
    package org.codelibs.fess.crawler.exception;
    
    /**
     * UnsupportedExtractException is thrown when the content extraction is not supported.
     * It extends ExtractException and indicates that the requested extraction operation cannot be performed.
     *
     */
    public class UnsupportedExtractException extends ExtractException {
    
        private static final long serialVersionUID = 1L;
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 1.2K bytes
    - Viewed (0)
  9. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TextExtractor.java

            } catch (final Exception e) {
                throw new ExtractException(e);
            }
        }
    
        /**
         * Returns the encoding used for text extraction.
         * @return the encoding
         */
        public String getEncoding() {
            return encoding;
        }
    
        /**
         * Sets the encoding.
         * @param encoding The encoding to set.
         */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 2K bytes
    - Viewed (0)
  10. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/AbstractExtractor.java

     */
    package org.codelibs.fess.crawler.extractor.impl;
    
    import java.io.File;
    import java.io.IOException;
    import java.util.List;
    
    import org.codelibs.fess.crawler.container.CrawlerContainer;
    import org.codelibs.fess.crawler.exception.CrawlerSystemException;
    import org.codelibs.fess.crawler.extractor.Extractor;
    import org.codelibs.fess.crawler.extractor.ExtractorFactory;
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 4.2K bytes
    - Viewed (0)
Back to top