Results 1 - 6 of 6 for Expression (0.04 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java

     * It uses XPath expressions to extract text content from HTML documents.
     * <p>
     * This class provides methods to configure the XPath expressions, parser features, and properties.
     * It also includes a caching mechanism for XPathAPI instances to improve performance.
     * </p>
     * <p>
     * The extracted text is obtained from the nodes selected by the {@code targetNodePath} XPath expression.
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10.3K bytes
  2. fess-crawler/src/main/java/org/codelibs/fess/crawler/Crawler.java

        @Override
        public void close() {
            clientFactory.close();
        }
    
        /**
         * Adds an include filter for URLs.
         * Only URLs matching this regular expression will be crawled.
         * @param regexp The regular expression for the include filter.
         */
        public void addIncludeFilter(final String regexp) {
            if (StringUtil.isNotBlank(regexp)) {
                urlFilter.addInclude(regexp);
            }
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 14K bytes
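The include-filter behaviour described in the Crawler.java snippet can be illustrated with a minimal, self-contained sketch. The class `IncludeFilterSketch` and its `accepts` method are hypothetical stand-ins and are not the fess-crawler `urlFilter`. Two behaviours here are assumptions, since the snippet does not show them: a URL must match a pattern in full, and every URL is accepted while no include pattern is registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a URL include filter; not the fess-crawler implementation. */
public class IncludeFilterSketch {
    private final List<Pattern> includes = new ArrayList<>();

    /** Registers an include pattern, ignoring null or blank input as the snippet does. */
    public void addIncludeFilter(final String regexp) {
        if (regexp != null && !regexp.trim().isEmpty()) {
            includes.add(Pattern.compile(regexp));
        }
    }

    /**
     * Assumption: with no include patterns every URL passes; otherwise
     * at least one pattern must match the whole URL.
     */
    public boolean accepts(final String url) {
        if (includes.isEmpty()) {
            return true;
        }
        for (final Pattern pattern : includes) {
            if (pattern.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the pattern `https://example\.com/docs/.*` registered, this sketch accepts `https://example.com/docs/a.html` and rejects `https://example.com/blog/`.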
  3. fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XpathTransformer.java

        private static final Pattern SPACE_PATTERN = Pattern.compile("\\s+", Pattern.MULTILINE);
    
        /**
         * A map of field rules, where the key is the field name and the value is the XPath expression.
         */
        protected Map<String, String> fieldRuleMap = new LinkedHashMap<>();
    
        /** Flag to enable or disable trimming of whitespace characters. */
        protected boolean trimSpaceEnabled = true;
    
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 13.1K bytes
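As a rough illustration of how a field rule map and the whitespace pattern from the XpathTransformer.java snippet might work together, the sketch below evaluates one XPath expression per field with the standard `javax.xml.xpath` API. The class `FieldRuleSketch` and the method `extract` are hypothetical. The sketch also assumes well-formed XML input, whereas the real transformer handles HTML, and it assumes that trimming means collapsing whitespace runs to a single space.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical sketch: field name -> XPath expression, evaluated against one document. */
public class FieldRuleSketch {
    private static final Pattern SPACE_PATTERN = Pattern.compile("\\s+", Pattern.MULTILINE);

    public static Map<String, String> extract(final String xml, final Map<String, String> fieldRuleMap,
            final boolean trimSpaceEnabled) {
        try {
            final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            final XPath xpath = XPathFactory.newInstance().newXPath();
            // LinkedHashMap keeps the fields in rule order.
            final Map<String, String> result = new LinkedHashMap<>();
            for (final Map.Entry<String, String> rule : fieldRuleMap.entrySet()) {
                // evaluate(String, Object) returns the string value of the first matching node.
                String value = xpath.evaluate(rule.getValue(), doc);
                if (trimSpaceEnabled) {
                    // Assumption: collapse whitespace runs to one space, then trim the ends.
                    value = SPACE_PATTERN.matcher(value).replaceAll(" ").trim();
                }
                result.put(rule.getKey(), value);
            }
            return result;
        } catch (final Exception e) {
            throw new IllegalStateException("Failed to evaluate field rules", e);
        }
    }
}
```

A `LinkedHashMap` is used for the result so that fields come back in the order the rules were declared, mirroring the `LinkedHashMap` used for `fieldRuleMap` in the snippet.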
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/XmlTransformer.java

        }
    
        /**
         * Retrieves a list of XPath nodes from the document.
         *
         * @param doc The XML document.
         * @param xpath The XPath expression.
         * @return A list of XPath nodes.
         * @throws XPathExpressionException if an XPath expression error occurs.
         */
        protected XPathNodes getNodeList(final Document doc, final String xpath) throws XPathExpressionException {
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 23.9K bytes
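The body of `getNodeList` is cut off in the XmlTransformer.java snippet, so the following is only a guess at how such a lookup could be written with the standard `XPath.evaluateExpression` API (Java 9 or later), which returns an iterable `XPathNodes`. The class `NodeListSketch` and the helper `nodeTexts` are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathNodes;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical sketch of selecting several nodes with one XPath expression. */
public class NodeListSketch {
    public static List<String> nodeTexts(final String xml, final String expression) {
        try {
            final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // evaluateExpression with XPathNodes.class yields every matching node.
            final XPathNodes nodes = XPathFactory.newInstance().newXPath()
                    .evaluateExpression(expression, doc, XPathNodes.class);
            final List<String> texts = new ArrayList<>();
            for (final Node node : nodes) {
                texts.add(node.getTextContent());
            }
            return texts;
        } catch (final Exception e) {
            throw new IllegalStateException("Failed to evaluate " + expression, e);
        }
    }
}
```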
  5. fess-crawler/src/main/java/org/codelibs/fess/crawler/transformer/impl/HtmlTransformer.java

     *   <li><b>preloadSizeForCharset:</b> The number of bytes to read from the input
     *       stream to determine the character set encoding.</li>
     *   <li><b>invalidUrlPattern:</b> A regular expression pattern used to identify
     *       invalid URLs.</li>
     * </ul>
     *
     * <p>
     * <b>Usage:</b>
     * </p>
     * <p>
     * The {@code transform} method is the main entry point for transforming an HTML
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 28.5K bytes
  6. fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/RobotsTxt.java

                }
            }
            return null;
        }
    
        /**
         * Adds a directive to the robots.txt rules.
         * The user-agent pattern in the directive is converted to a regular expression pattern,
         * where '*' is replaced with '.*' for pattern matching, and stored case-insensitively.
         *
         * @param directive The directive to add to the robots.txt rules
         */
    Registered: Sun Sep 21 03:50:09 UTC 2025
    - Last Modified: Sun Jul 06 02:13:03 UTC 2025
    - 10K bytes
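The user-agent conversion described in the RobotsTxt.java comment can be sketched as follows. The class `UserAgentPatternSketch` and its methods are hypothetical, and the method body of the real `addDirective` is not shown in the snippet. The sketch applies the `'*'` to `'.*'` replacement literally and assumes that case-insensitive storage can be modelled with `Pattern.CASE_INSENSITIVE`. Other regex metacharacters in the user-agent value are not escaped here.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of turning a robots.txt user-agent value into a regex. */
public class UserAgentPatternSketch {
    /** Replaces '*' with '.*' and compiles the result case-insensitively. */
    public static Pattern toPattern(final String userAgent) {
        final String regex = userAgent.trim().replace("*", ".*");
        // Assumption: case-insensitivity is modelled with a regex flag.
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    /** True when the crawler's user-agent string matches the directive's pattern in full. */
    public static boolean matches(final String directiveUserAgent, final String actualUserAgent) {
        return toPattern(directiveUserAgent).matcher(actualUserAgent).matches();
    }
}
```

Under these assumptions, a directive for `*` matches any user agent, `googlebot*` matches `GoogleBot-Image`, and `bingbot` does not match `Googlebot`.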