Crawling - Code Search

src/main/java/org/codelibs/fess/exec/Crawler.java

 * <ul>
 * <li>Web crawling - crawls web sites and web content</li>
 * <li>File system crawling - crawls file systems and documents</li>
 * <li>Data store crawling - crawls databases and other data sources</li>
 * <li>Combined crawling - runs multiple crawling types simultaneously</li>
 * </ul>
 *
 * <p>Command line usage:
 * <pre>
 * java org.codelibs.fess.exec.Crawler [options...]

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 26 02:24:08 GMT 2026

- 32.4K bytes

github.com/codelibs/fess

src/main/resources/fess_label_en.properties

labels.createdTime=Created Time
labels.depth=Depth
labels.excludedPaths=Excluded Paths for Crawling
labels.excludedUrls=Excluded URLs for Crawling
labels.excludedDocPaths=Excluded Paths for Searching
labels.excludedDocUrls=Excluded URLs for Searching
labels.hostname=Hostname
labels.id=ID
labels.includedPaths=Included Paths for Crawling
labels.includedUrls=Included URLs for Crawling
labels.includedDocPaths=Included Paths for Searching

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Sat Mar 28 11:54:13 GMT 2026

- 48.9K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/app/web/admin/general/EditForm.java

    @Size(max = 10)
    public String thumbnail;

    /**
     * Types of crawling failures to ignore during crawling operations.
     * Specified failure types will not be logged or counted as errors.
     */
    @Size(max = 1000)
    public String ignoreFailureType;

    /**
     * Threshold for failure count before stopping crawling of a URL.
     * Set to -1 to disable the threshold check.
     */
    @Required

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 26 02:24:08 GMT 2026

- 15.8K bytes

github.com/codelibs/fess
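
The Javadoc in the EditForm.java snippet above describes a failure-count threshold that stops crawling of a URL, with -1 disabling the check. A minimal sketch of that rule follows. The class and method names are hypothetical, and two details are assumptions that the snippet does not confirm: that any negative threshold counts as disabled, and that crawling stops only once the count is strictly greater than the threshold.

```java
// Hypothetical illustration; not taken from the Fess source.
public class FailureThresholdSketch {

    /**
     * Decides whether a URL should no longer be crawled.
     *
     * @param failureCount number of failures recorded for the URL
     * @param threshold    configured threshold; -1 disables the check
     */
    static boolean shouldStopCrawling(final int failureCount, final int threshold) {
        // Assumption: treat every negative value like -1 (check disabled).
        if (threshold < 0) {
            return false;
        }
        // Assumption: strictly greater than; the real comparison may be >=.
        return failureCount > threshold;
    }

    public static void main(final String[] args) {
        System.out.println(shouldStopCrawling(100, -1)); // check disabled
        System.out.println(shouldStopCrawling(3, 5));    // below the threshold
        System.out.println(shouldStopCrawling(6, 5));    // above the threshold
    }
}
```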

src/test/java/org/codelibs/fess/it/admin/CrawlerLogTests.java

        assertEquals(0, logListAfter.size(), "All crawling info logs should be deleted after calling delete all endpoint");

        // Log the result
        if (sizeBeforeDeletion > 0) {
            logger.info("Successfully deleted {} crawling info log(s) using bulk delete", sizeBeforeDeletion);
        } else {
            logger.info("No crawling info logs to delete (may have been deleted by previous test)");
        }

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Mon Mar 30 14:01:34 GMT 2026

- 13.4K bytes

github.com/codelibs/fess

src/main/resources/fess_label.properties

labels.createdTime=Created Time
labels.depth=Depth
labels.excludedPaths=Excluded Paths for Crawling
labels.excludedUrls=Excluded URLs for Crawling
labels.excludedDocPaths=Excluded Paths for Searching
labels.excludedDocUrls=Excluded URLs for Searching
labels.hostname=Hostname
labels.id=ID
labels.includedPaths=Included Paths for Crawling
labels.includedUrls=Included URLs for Crawling
labels.includedDocPaths=Included Paths for Searching

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Sat Mar 28 11:54:13 GMT 2026

- 48.9K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/helper/DocumentHelper.java

        }
    }

    /**
     * Processes a crawling request for a specific URL.
     * Executes the full crawling pipeline including client execution, rule processing,
     * transformation, and data extraction.
     *
     * @param crawlingConfig the crawling configuration to use
     * @param crawlingInfoId the crawling session ID
     * @param url the URL to process

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Mon Mar 30 14:27:04 GMT 2026

- 17.4K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/Constants.java

    /** Property key for incremental crawling configuration. */
    public static final String INCREMENTAL_CRAWLING_PROPERTY = "crawling.incremental";

    /** Property key for crawling thread count configuration. */
    public static final String CRAWLING_THREAD_COUNT_PROPERTY = "crawling.thread.count";

    /** Property key for crawling user agent configuration. */

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Sat Mar 28 11:55:54 GMT 2026

- 35.8K bytes

github.com/codelibs/fess
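
The two property keys shown in the Constants.java snippet above could be read from a plain `java.util.Properties` object roughly as sketched here. The key strings are copied from the snippet. Everything else is an assumption for illustration: the class name, the helper methods, the fallback values, and the idea that the values live in a `Properties` instance at all, since Fess may resolve them differently.

```java
import java.util.Properties;

// Hypothetical illustration; not taken from the Fess source.
public class CrawlingPropertySketch {

    // Key strings as they appear in the Constants.java snippet.
    static final String INCREMENTAL_CRAWLING_PROPERTY = "crawling.incremental";
    static final String CRAWLING_THREAD_COUNT_PROPERTY = "crawling.thread.count";

    /** Returns true only if the property is present and equals "true" (case-insensitive). */
    static boolean isIncremental(final Properties props) {
        return Boolean.parseBoolean(props.getProperty(INCREMENTAL_CRAWLING_PROPERTY, "false"));
    }

    /** Returns the configured thread count, or the fallback if missing, non-numeric, or not positive. */
    static int threadCount(final Properties props, final int fallback) {
        final String raw = props.getProperty(CRAWLING_THREAD_COUNT_PROPERTY);
        if (raw == null) {
            return fallback;
        }
        try {
            final int n = Integer.parseInt(raw.trim());
            return n > 0 ? n : fallback;
        } catch (final NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.setProperty(CRAWLING_THREAD_COUNT_PROPERTY, "8");
        System.out.println(isIncremental(props));  // key not set, so false
        System.out.println(threadCount(props, 5)); // parsed from "8"
    }
}
```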

src/main/java/org/codelibs/fess/crawler/transformer/FessXpathTransformer.java

            putResultDataBody(dataMap, fessConfig.getIndexFieldContent(), content);
        }
    }

    /**
     * Retrieves the crawling configuration for the given response data.
     *
     * @param responseData the response data from crawling
     * @return the crawling configuration
     */
    protected CrawlingConfig getCrawlingConfig(final ResponseData responseData) {

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 12 01:46:45 GMT 2026

- 55.3K bytes

github.com/codelibs/fess

CLAUDE.md

**Key Capabilities:**
- Full-text search with OpenSearch backend
- Web, file system, and data store crawling
- Multi-format document support (Office, PDF, etc.)
- Admin GUI for configuration
- REST API for programmatic access
- SSO integration (OIDC, SAML, SPNEGO, Entra ID)
- i18n support (20+ languages)

## Tech Stack

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Thu Mar 19 09:48:10 GMT 2026

- 7.8K bytes

github.com/codelibs/fess

src/main/java/org/codelibs/fess/mylasta/action/FessLabels.java

    /** The key of the message: Start Crawling */
    public static final String LABELS_wizard_start_crawling_title = "{labels.wizard_start_crawling_title}";

    /** The key of the message: Crawler */
    public static final String LABELS_wizard_start_crawler_title = "{labels.wizard_start_crawler_title}";

    /** The key of the message: You can start crawling now by clicking "Start Crawling" button. */

Created: Tue Mar 31 13:07:34 GMT 2026

- Last Modified: Sat Mar 28 11:54:13 GMT 2026

- 172.6K bytes
