Search Options

Results per page
Sort
Preferred Languages
Advance

Results 11 - 20 of 90 for crawl (0.02 sec)

  1. MIGRATION.md

    - `GET /backup/export` - Export configurations
    - `PUT /documents/bulk` - Bulk document import
    - `GET /webconfig` - List web crawl configs
    - `POST /webconfig` - Create web crawl config
    - `PUT /webconfig/{id}` - Update web crawl config
    - `DELETE /webconfig/{id}` - Delete web crawl config
    - (Similar CRUD for `fileconfig`, `dataconfig`, `labeltype`, etc.)
    
    **Example - List All Web Configs**:
    ```bash
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Thu Nov 06 12:40:11 UTC 2025
    - 23.2K bytes
    - Viewed (0)
  2. build-logic-commons/code-quality-rules/src/main/resources/checkstyle/checkstyle-api.xml

      ~ See the License for the specific language governing permissions and
      ~ limitations under the License.
      -->
    <!DOCTYPE module PUBLIC
            "-//Puppy Crawl//DTD Check Configuration 1.2//EN"
            "http://www.puppycrawl.com/dtds/configuration_1_2.dtd">
    <module name="Checker">
        <module name="SuppressionFilter">
            <property name="file" value="${config_loc}/suppressions.xml"/>
    Registered: Wed Dec 31 11:36:14 UTC 2025
    - Last Modified: Thu Nov 17 23:20:14 UTC 2022
    - 1.6K bytes
    - Viewed (0)
  3. src/main/java/org/codelibs/fess/app/web/admin/fileconfig/CreateForm.java

        @Size(max = 200)
        public String name;
    
        /** The description of the file configuration (maximum 1000 characters). */
        @Size(max = 1000)
        public String description;
    
        /** The file paths to crawl (required, must be valid file URIs). */
        @Required
        @UriType(protocolType = ProtocolType.FILE)
        @CustomSize(maxKey = "form.admin.max.input.size")
        public String paths;
    
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Thu Jul 17 08:28:31 UTC 2025
    - 5.6K bytes
    - Viewed (0)
  4. src/main/java/org/codelibs/fess/helper/DataIndexHelper.java

         * specified in the configIdList parameter.
         *
         * @param sessionId unique identifier for this crawling session
         * @param configIdList list of data configuration IDs to crawl
         */
        public void crawl(final String sessionId, final List<String> configIdList) {
            final List<DataConfig> configList = ComponentUtil.getCrawlingConfigHelper().getDataConfigListByIds(configIdList);
    
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 19K bytes
    - Viewed (0)
  5. CLAUDE.md

    extractorFactory.addExtractor("text/html", htmlExtractor, 2);  // Weight 2
    extractorFactory.addExtractor("text/html", tikaExtractor, 1);  // Fallback
    ```
    
    ### Helpers
    
    **RobotsTxtHelper**: RFC 9309 parsing, user-agent matching, crawl-delay, sitemaps
    **SitemapsHelper**: Sitemap XML parsing, index handling
    **MimeTypeHelper**: MIME detection via Tika
    **EncodingHelper**: Charset detection with BOM
    **UrlConvertHelper**: URL normalization
    
    ---
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Fri Nov 28 17:31:34 UTC 2025
    - 10.7K bytes
    - Viewed (0)
  6. README.md

    </components>
    ```
    
    ### Crawler Context Configuration
    
    ```java
    // Set maximum number of URLs to crawl
    crawler.crawlerContext.setMaxAccessCount(1000);
    
    // Set number of crawler threads
    crawler.crawlerContext.setNumOfThread(10);
    
    // Set maximum crawl depth
    crawler.crawlerContext.setMaxDepth(3);
    
    // Set request interval (politeness)
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Aug 31 05:32:52 UTC 2025
    - 15.3K bytes
    - Viewed (0)
  7. fess-crawler/src/main/java/org/codelibs/fess/crawler/entity/SitemapUrl.java

         * command. Even though search engine crawlers may consider this information
         * when making decisions, they may crawl pages marked "hourly" less
         * frequently than that, and they may crawl pages marked "yearly" more
         * frequently than that. Crawlers may periodically crawl pages marked
         * "never" so that they can handle unexpected changes to those pages.
         */
        private String changefreq;
    
        /**
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Thu Nov 13 13:34:36 UTC 2025
    - 9.1K bytes
    - Viewed (0)
  8. src/main/java/org/codelibs/fess/app/web/admin/dataconfig/EditForm.java

     * This form extends CreateForm to include fields necessary for updating existing data config entries,
     * including tracking information for optimistic locking and audit trails.
     * Data configs define how to crawl and extract data from databases, CSV files, and other data sources.
     *
     */
    public class EditForm extends CreateForm {
    
        /**
         * Creates a new EditForm instance.
         */
        public EditForm() {
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Thu Jul 17 08:28:31 UTC 2025
    - 2.3K bytes
    - Viewed (0)
  9. src/main/java/org/codelibs/fess/app/web/api/admin/webconfig/SearchBody.java

        /**
         * Default constructor.
         */
        public SearchBody() {
            super();
        }
    
        /** Name of the web crawling configuration */
        public String name;
    
        /** URLs to crawl */
        public String urls;
    
        /** Description of the web crawling configuration */
        public String description;
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Thu Jul 17 08:28:31 UTC 2025
    - 1.2K bytes
    - Viewed (0)
  10. src/main/java/org/codelibs/fess/ds/callback/FileListIndexUpdateCallbackImpl.java

            /**
             * Constructs a new crawl request.
             *
             * @param url the URL to crawl
             * @param depth the depth of this URL in the crawling hierarchy
             */
            CrawlRequest(final String url, final int depth) {
                this.url = url;
                this.depth = depth;
            }
    
            /**
             * Gets the URL of this crawl request.
             *
    Registered: Sat Dec 20 09:19:18 UTC 2025
    - Last Modified: Fri Nov 28 16:29:12 UTC 2025
    - 29.7K bytes
    - Viewed (3)
Back to top