Search Options

Results per page
Sort
Preferred Languages
Advance

Results 51 - 59 of 59 for include (0.04 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/SitemapsHelper.java

     * and to parse an input stream into a {@link SitemapSet} object.
     * It uses SAX parser for XML sitemaps and XML sitemap indexes,
     * and handles potential exceptions during parsing.
     * The class also includes inner classes for handling XML sitemap and sitemap index parsing.
     */
    public class SitemapsHelper {
        private static final Logger logger = LogManager.getLogger(SitemapsHelper.class);
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 14.7K bytes
    - Viewed (0)
  2. fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XmlTransformer.java

     * </p>
     * <ul>
     *   <li>Namespace awareness</li>
     *   <li>Coalescing</li>
     *   <li>Entity expansion</li>
     *   <li>Ignoring comments and whitespace</li>
     *   <li>Validation</li>
     *   <li>XInclude awareness</li>
     * </ul>
     *
     * <p>
     * It also allows defining field rules using XPath expressions to extract specific data from the XML document and map it to fields in the ResultData.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 23.9K bytes
    - Viewed (0)
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerContext.java

     * This class provides methods to access and modify these attributes, allowing for control and monitoring
     * of the crawler's behavior.
     *
     * <p>
     * The context includes information such as the session ID, active thread count, access count, crawler status,
     * URL filter, rule manager, interval controller, robots.txt URL set, sitemaps, number of threads,
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 8.9K bytes
    - Viewed (0)
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java

     * It uses XPath expressions to extract text content from HTML documents.
     * <p>
     * This class provides methods to configure the XPath expressions, parser features, and properties.
     * It also includes caching mechanism for XPathAPI instances to improve performance.
     * </p>
     * <p>
     * The extracted text is obtained from the nodes selected by the {@code targetNodePath} XPath expression.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10.3K bytes
    - Viewed (0)
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/ExtractorFactory.java

    import jakarta.annotation.Resource;
    
    /**
     * Factory class for managing and retrieving {@link Extractor} instances.
     * This class provides methods to add, retrieve, and manage extractors based on a key.
     * It also includes a builder for creating extractors.
     *
     * <p>
     * The factory maintains a map of keys to an array of {@link Extractor} objects.
     * When multiple extractors are associated with a single key, they are sorted by weight
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 7.3K bytes
    - Viewed (0)
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java

        protected int initialBufferSize = 10000;
    
        /**
         * If true, duplicated terms are replaced.
         */
        protected boolean replaceDuplication = false;
    
        /**
         * Space characters. Default includes common space characters.
         */
        protected int[] spaceChars = { '\u0020', '\u00a0', '\u3000', '\ufffd' };
    
        /**
         * Memory size.
         */
        protected int memorySize = 1024 * 1024; //1mb
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 30.7K bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java

     *
     * <p>The extractor runs text extraction in a separate thread with a configurable timeout
     * to prevent hanging on problematic PDF files. It also extracts metadata from the PDF
     * document and includes it in the extraction result.
     *
     * <p>Features:
     * <ul>
     *   <li>Text extraction from PDF pages</li>
     *   <li>Embedded document extraction</li>
     *   <li>Annotation extraction (file attachments)</li>
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 12.7K bytes
    - Viewed (0)
  8. fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/AbstractCrawlerService.java

                final T bean = BeanUtil.copyMapToNewBean(source, clazz, option -> {
                    option.converter(new EsTimestampConverter(), timestampFields).excludeWhitespace();
                    option.exclude(OpenSearchAccessResult.ACCESS_RESULT_DATA);
                });
                @SuppressWarnings("unchecked")
                final Map<String, Object> data = (Map<String, Object>) source.get(OpenSearchAccessResult.ACCESS_RESULT_DATA);
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 34.2K bytes
    - Viewed (0)
  9. fess-crawler/src/main/java/org/codelibs/fess/crawler/client/http/HcHttpClient.java

                                            crawlerContext.getUrlFilter().addInclude(urlValue);
                                            if (logger.isInfoEnabled()) {
                                                logger.info("Included URL: {}", urlValue);
                                            }
                                        }
                                    }
                                }
                            }
                        }
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Thu Aug 07 02:55:08 UTC 2025
    - 52.2K bytes
    - Viewed (0)
Back to top