Results 51 - 59 of 59 for include (0.04 sec)
fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/SitemapsHelper.java
* and to parse an input stream into a {@link SitemapSet} object. * It uses SAX parser for XML sitemaps and XML sitemap indexes, * and handles potential exceptions during parsing. * The class also includes inner classes for handling XML sitemap and sitemap index parsing. */ public class SitemapsHelper { private static final Logger logger = LogManager.getLogger(SitemapsHelper.class);
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 14.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XmlTransformer.java
* </p> * <ul> * <li>Namespace awareness</li> * <li>Coalescing</li> * <li>Entity expansion</li> * <li>Ignoring comments and whitespace</li> * <li>Validation</li> * <li>XInclude awareness</li> * </ul> * * <p> * It also allows defining field rules using XPath expressions to extract specific data from the XML document and map it to fields in the ResultData.
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 23.9K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerContext.java
* This class provides methods to access and modify these attributes, allowing for control and monitoring * of the crawler's behavior. * * <p> * The context includes information such as the session ID, active thread count, access count, crawler status, * URL filter, rule manager, interval controller, robots.txt URL set, sitemaps, number of threads,
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 8.9K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java
* It uses XPath expressions to extract text content from HTML documents. * <p> * This class provides methods to configure the XPath expressions, parser features, and properties. * It also includes caching mechanism for XPathAPI instances to improve performance. * </p> * <p> * The extracted text is obtained from the nodes selected by the {@code targetNodePath} XPath expression.
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 10.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/ExtractorFactory.java
import jakarta.annotation.Resource; /** * Factory class for managing and retrieving {@link Extractor} instances. * This class provides methods to add, retrieve, and manage extractors based on a key. * It also includes a builder for creating extractors. * * <p> * The factory maintains a map of keys to an array of {@link Extractor} objects. * When multiple extractors are associated with a single key, they are sorted by weight
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 7.3K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java
protected int initialBufferSize = 10000; /** * If true, duplicated terms are replaced. */ protected boolean replaceDuplication = false; /** * Space characters. Default includes common space characters. */ protected int[] spaceChars = { '\u0020', '\u00a0', '\u3000', '\ufffd' }; /** * Memory size. */ protected int memorySize = 1024 * 1024; //1mb
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 30.7K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java
* * <p>The extractor runs text extraction in a separate thread with a configurable timeout * to prevent hanging on problematic PDF files. It also extracts metadata from the PDF * document and includes it in the extraction result. * * <p>Features: * <ul> * <li>Text extraction from PDF pages</li> * <li>Embedded document extraction</li> * <li>Annotation extraction (file attachments)</li>
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Sun Jul 06 02:13:03 UTC 2025 - 12.7K bytes - Viewed (0) -
fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/AbstractCrawlerService.java
final T bean = BeanUtil.copyMapToNewBean(source, clazz, option -> { option.converter(new EsTimestampConverter(), timestampFields).excludeWhitespace(); option.exclude(OpenSearchAccessResult.ACCESS_RESULT_DATA); }); @SuppressWarnings("unchecked") final Map<String, Object> data = (Map<String, Object>) source.get(OpenSearchAccessResult.ACCESS_RESULT_DATA);
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 34.2K bytes - Viewed (0) -
fess-crawler/src/main/java/org/codelibs/fess/crawler/client/http/HcHttpClient.java
crawlerContext.getUrlFilter().addInclude(urlValue); if (logger.isInfoEnabled()) { logger.info("Included URL: {}", urlValue); } } } } } }
Registered: Sun Sep 21 03:50:09 UTC 2025 - Last Modified: Thu Aug 07 02:55:08 UTC 2025 - 52.2K bytes - Viewed (0)