tracking - Code Search

fess-crawler/src/main/java/org/codelibs/fess/crawler/filter/impl/UrlFilterImpl.java

 * This class provides functionality to filter URLs based on include and exclude patterns.
 * It uses a {@link UrlFilterService} to manage the URL filtering rules.
 * The class supports caching of include and exclude patterns for scenarios where a session ID is not available.
 * It also provides methods to initialize the filter with a session ID, clear the filter,

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 9.2K bytes

github.com/codelibs/fess-crawler
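
The include/exclude filtering described in the UrlFilterImpl snippet above can be illustrated with a minimal sketch. This is not the actual UrlFilterImpl: the class `SimpleUrlFilter` and its methods are hypothetical, and two behaviors are assumptions made for illustration (an exclude match overrides an include match, and an empty include list accepts every URL). The session ID and `UrlFilterService` handling mentioned in the Javadoc are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of include/exclude URL filtering; not the real UrlFilterImpl.
class SimpleUrlFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    void addInclude(String regex) {
        includes.add(Pattern.compile(regex));
    }

    void addExclude(String regex) {
        excludes.add(Pattern.compile(regex));
    }

    // Returns true when the URL should be crawled.
    boolean match(String url) {
        // Assumption: an exclude pattern always wins over an include pattern.
        for (Pattern p : excludes) {
            if (p.matcher(url).matches()) {
                return false;
            }
        }
        // Assumption: with no include patterns, every non-excluded URL is accepted.
        if (includes.isEmpty()) {
            return true;
        }
        for (Pattern p : includes) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    // Drops all cached patterns.
    void clear() {
        includes.clear();
        excludes.clear();
    }
}
```

Compiling each pattern once when it is added, rather than on every `match` call, is the reason for keeping the two `Pattern` lists in memory.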

fess-crawler/src/main/java/org/codelibs/fess/crawler/client/FaultTolerantClient.java

 * exceptions during retries.</p>
 *
 * <p>Key features:</p>
 * <ul>
 *   <li>Configurable maximum retry attempts</li>
 *   <li>Adjustable interval between retries</li>
 *   <li>Exception tracking and aggregation</li>
 *   <li>Request lifecycle monitoring through listener</li>
 * </ul>
 *
 * <p>By default, it will:</p>
 * <ul>
 *   <li>Retry up to 5 times</li>
 *   <li>Wait 500ms between retries</li>

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 7.8K bytes

github.com/codelibs/fess-crawler
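
The retry behavior listed in the FaultTolerantClient Javadoc above (a maximum number of attempts, a fixed wait between them, and aggregation of the exceptions seen along the way) might look roughly like the sketch below. It is not the FaultTolerantClient implementation: `RetrySketch`, `callWithRetry`, and the two constants are hypothetical names, and reading "retry up to 5 times" as five attempts in total is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical retry helper; not the real FaultTolerantClient.
class RetrySketch {
    // Defaults taken from the Javadoc snippet (5 tries, 500 ms apart).
    static final int DEFAULT_MAX_ATTEMPTS = 5;
    static final long DEFAULT_INTERVAL_MS = 500L;

    static <T> T callWithRetry(Supplier<T> task, int maxAttempts, long intervalMs) {
        List<RuntimeException> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get(); // success: stop retrying
            } catch (RuntimeException e) {
                failures.add(e); // remember every failure for the final report
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMs); // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    break;
                }
            }
        }
        // Aggregate all collected failures into a single exception.
        IllegalStateException aggregated =
                new IllegalStateException("Failed after " + failures.size() + " attempt(s)");
        failures.forEach(aggregated::addSuppressed);
        throw aggregated;
    }
}
```

A caller would wrap the request in a lambda, for example `RetrySketch.callWithRetry(() -> fetch(url), RetrySketch.DEFAULT_MAX_ATTEMPTS, RetrySketch.DEFAULT_INTERVAL_MS)`, where `fetch` stands in for whatever operation may fail.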

fess-crawler/src/main/java/org/codelibs/fess/crawler/CrawlerContext.java

    /**
     * Lock object for synchronizing access to active thread count.
     */
    protected Object activeThreadCountLock = new Object();

    /**
     * Atomic counter for tracking the number of accesses made.
     */
    protected AtomicLong accessCount = new AtomicLong(0);

    /**
     * Current status of the crawler.
     */

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 8.9K bytes

github.com/codelibs/fess-crawler
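
The CrawlerContext snippet above shows two different synchronization tools: a plain lock object guarding the active thread count and an `AtomicLong` for the access count. The sketch below shows how such fields are typically used. Only the field names `activeThreadCountLock` and `accessCount` come from the snippet; the class `CounterSketch`, the `activeThreadCount` field, and all methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the two counters; not the real CrawlerContext.
class CounterSketch {
    private final Object activeThreadCountLock = new Object();
    private int activeThreadCount = 0; // assumed field guarded by the lock
    private final AtomicLong accessCount = new AtomicLong(0);

    void threadStarted() {
        synchronized (activeThreadCountLock) {
            activeThreadCount++;
        }
    }

    void threadFinished() {
        synchronized (activeThreadCountLock) {
            activeThreadCount--;
        }
    }

    int getActiveThreadCount() {
        synchronized (activeThreadCountLock) {
            return activeThreadCount;
        }
    }

    // Lock-free increment; returns the updated total.
    long recordAccess() {
        return accessCount.incrementAndGet();
    }

    long getAccessCount() {
        return accessCount.get();
    }
}
```

An explicit lock is useful when the count must be read and acted on as one step (for example, checking it before starting another thread), while `AtomicLong` is enough for a counter that is only ever incremented and read.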

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java

 * It uses XPath expressions to extract text content from HTML documents.
 * <p>
 * This class provides methods to configure the XPath expressions, parser features, and properties.
 * It also includes caching mechanism for XPathAPI instances to improve performance.
 * </p>
 * <p>
 * The extracted text is obtained from the nodes selected by the {@code targetNodePath} XPath expression.

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 10.3K bytes

github.com/codelibs/fess-crawler
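
The idea in the HtmlXpathExtractor snippet above, taking text from the nodes selected by a `targetNodePath` XPath expression, can be sketched with the JDK's built-in `javax.xml.xpath` API. This is an illustration under stated assumptions rather than the extractor's code: the class `XpathTextSketch` and method `extractText` are hypothetical, the input must be well-formed XHTML because the JDK parser does not tolerate ordinary HTML, and the caching and parser configuration mentioned in the Javadoc are omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical XPath text extraction; not the real HtmlXpathExtractor.
class XpathTextSketch {
    static String extractText(String xhtml, String targetNodePath) {
        try {
            // Parse the (well-formed) markup into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // Select the nodes named by the XPath expression.
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(targetNodePath, doc, XPathConstants.NODESET);
            // Join the non-empty text of each selected node with single spaces.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(text);
                }
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract text", e);
        }
    }
}
```

For instance, calling `extractText` with a document whose body holds two paragraphs and the expression `//body//p` returns the two paragraph texts joined by a space.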

fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/SitemapUrl.java

     * archived URLs.
     *
     * Please note that the value of this tag is considered a hint and not a
     * command. Even though search engine crawlers may consider this information
     * when making decisions, they may crawl pages marked "hourly" less
     * frequently than that, and they may crawl pages marked "yearly" more
     * frequently than that. Crawlers may periodically crawl pages marked

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 6.5K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/test/java/org/codelibs/fess/crawler/transformer/impl/AbstractTransformerTest.java

            assertEquals(expected[i], actual[i]);
        }

        // Test with null
        assertNull(testTransformer.getData(null));
    }

    /**
     * Test name tracking transformer
     */
    public void test_nameTrackingTransformer() {
        NameTrackingTransformer tracker = new NameTrackingTransformer();
        ResponseData responseData = new ResponseData();

- Last Modified: Sat Sep 06 04:15:37 UTC 2025

- 20.8K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/client/http/HcHttpClient.java

import jakarta.annotation.Resource;

/**
 * HcHttpClient is an HTTP client implementation that extends AbstractCrawlerClient.
 * It provides various configurations and settings for making HTTP requests, including
 * connection timeouts, proxy settings, user agent, request headers, cookie management,
 * and SSL configurations. The client also supports robots.txt parsing and form-based
 * authentication schemes.

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 52.2K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/resources/org/codelibs/fess/crawler/mime/tika-mimetypes.xml

    <sub-class-of type="message/rfc822"/>
  </mime-type>

  <mime-type type="message/s-http"/>
  <mime-type type="message/sip"/>
  <mime-type type="message/sipfrag"/>
  <mime-type type="message/tracking-status"/>
  <mime-type type="message/vnd.si.simp"/>
  <mime-type type="model/e57">
    <_comment>3d imaging data exchange</_comment>
    <magic priority="60">

- Last Modified: Thu Mar 13 08:18:01 UTC 2025

- 320.1K bytes
