Results 1 - 6 of 6 for RobotsTxtHelper (0.06 sec)

  1. fess-crawler/src/main/java/org/codelibs/fess/crawler/helper/RobotsTxtHelper.java

        /** Whether robots.txt processing is enabled. */
        protected boolean enabled = true;
    
        /**
         * Creates a new RobotsTxtHelper instance.
         */
        public RobotsTxtHelper() {
            // Default constructor
        }
    
        /**
         * Parses a robots.txt file from the given input stream using UTF-8 encoding.
         * @param stream the input stream to parse
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Fri Nov 14 12:52:01 UTC 2025
    - 11.4K bytes
  2. fess-crawler/src/test/java/org/codelibs/fess/crawler/helper/RobotsTxtHelperTest.java

    public class RobotsTxtHelperTest extends PlainTestCase {
        public RobotsTxtHelper robotsTxtHelper;
    
        @Override
        protected void setUp() throws Exception {
            super.setUp();
            StandardCrawlerContainer container = new StandardCrawlerContainer().singleton("robotsTxtHelper", RobotsTxtHelper.class);
            robotsTxtHelper = container.getComponent("robotsTxtHelper");
        }
    
        public void testParse() {
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Mon Nov 24 03:59:47 UTC 2025
    - 20.6K bytes
  3. fess-crawler/src/test/java/org/codelibs/fess/crawler/client/http/HcHttpClientTest.java

                    .singleton("urlFilterService", UrlFilterServiceImpl.class)//
                    .singleton("urlFilter", UrlFilterImpl.class)//
                    .singleton("robotsTxtHelper", RobotsTxtHelper.class)//
                    .singleton("httpClient", HcHttpClient.class);
            httpClient = container.getComponent("httpClient");
            urlFilter = container.getComponent("urlFilter");
        }
    
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sat Sep 06 04:15:37 UTC 2025
    - 11.7K bytes
  4. fess-crawler/src/main/java/org/codelibs/fess/crawler/client/http/HcHttpClient.java

        /** Logger instance for this class */
        private static final Logger logger = LogManager.getLogger(HcHttpClient.class);
    
        /** Helper for processing robots.txt files */
        @Resource
        protected RobotsTxtHelper robotsTxtHelper;
    
        /** Helper for managing content length limits */
        @Resource
        protected ContentLengthHelper contentLengthHelper;
    
        /** Helper for determining MIME types */
        @Resource
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Sun Nov 23 12:19:14 UTC 2025
    - 53.7K bytes
  5. fess-crawler/src/test/java/org/codelibs/fess/crawler/CrawlerTest.java

                        transformer.setChildUrlRuleMap(childUrlRuleMap);
                    })
                    .singleton("dataHelper", MemoryDataHelper.class)
                    .singleton("robotsTxtHelper", RobotsTxtHelper.class)
                    .<CrawlerClientFactory> singleton("clientFactory", CrawlerClientFactory.class, factory -> {
                        factory.addClient("http:.*", container.getComponent("httpClient"));
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Tue Nov 11 13:40:14 UTC 2025
    - 25.8K bytes
  6. CLAUDE.md

    **Registration**:
    ```java
    extractorFactory.addExtractor("text/html", htmlExtractor, 2);  // Weight 2
    extractorFactory.addExtractor("text/html", tikaExtractor, 1);  // Fallback
    ```
    
    ### Helpers
    
    **RobotsTxtHelper**: RFC 9309 parsing, user-agent matching, crawl-delay, sitemaps
    **SitemapsHelper**: Sitemap XML parsing, index handling
    **MimeTypeHelper**: MIME detection via Tika
    **EncodingHelper**: Charset detection with BOM
    Registered: Sat Dec 20 11:21:39 UTC 2025
    - Last Modified: Fri Nov 28 17:31:34 UTC 2025
    - 10.7K bytes
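
The CLAUDE.md excerpt in result 6 summarizes RobotsTxtHelper as covering RFC 9309 parsing, user-agent matching, crawl-delay, and sitemaps. As a rough illustration of what those four responsibilities involve, here is a minimal, self-contained sketch. It is not the fess-crawler implementation: the class `RobotsTxtSketch` and its methods `parse`, `allows`, and `crawlDelay` are hypothetical names chosen for this example, and the matching is simplified to plain path prefixes (no `*` or `$` wildcards) and exact, case-insensitive user-agent tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified robots.txt model; NOT the fess-crawler RobotsTxtHelper. */
class RobotsTxtSketch {
    /** One Allow/Disallow rule (plain path prefix, no wildcard support). */
    static final class Rule {
        final boolean allow;
        final String path;
        Rule(boolean allow, String path) { this.allow = allow; this.path = path; }
    }

    /** Rules shared by one or more consecutive User-agent lines. */
    static final class Group {
        final List<String> agents = new ArrayList<>();
        final List<Rule> rules = new ArrayList<>();
        int crawlDelay = 0; // seconds; 0 means "not specified"
    }

    final List<Group> groups = new ArrayList<>();
    final List<String> sitemaps = new ArrayList<>();

    static RobotsTxtSketch parse(String content) {
        RobotsTxtSketch robots = new RobotsTxtSketch();
        Group current = null;
        boolean lastWasAgent = false;
        for (String raw : content.split("\\R")) {
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // blank or malformed line
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                // Consecutive User-agent lines share one group.
                if (!lastWasAgent || current == null) {
                    current = new Group();
                    robots.groups.add(current);
                }
                current.agents.add(value.toLowerCase(Locale.ROOT));
                lastWasAgent = true;
                continue;
            }
            if (key.equals("sitemap")) { // sitemaps are global, not tied to a group
                robots.sitemaps.add(value);
                continue;
            }
            lastWasAgent = false;
            if (current == null || value.isEmpty()) continue;
            if (key.equals("allow")) {
                current.rules.add(new Rule(true, value));
            } else if (key.equals("disallow")) {
                current.rules.add(new Rule(false, value));
            } else if (key.equals("crawl-delay")) {
                try {
                    current.crawlDelay = Integer.parseInt(value);
                } catch (NumberFormatException e) {
                    // ignore non-integer delays in this sketch
                }
            }
        }
        return robots;
    }

    /** Groups naming this agent exactly; otherwise the "*" groups. */
    private List<Group> groupsFor(String userAgent) {
        String token = userAgent.toLowerCase(Locale.ROOT);
        List<Group> matched = new ArrayList<>();
        for (Group g : groups) if (g.agents.contains(token)) matched.add(g);
        if (matched.isEmpty()) {
            for (Group g : groups) if (g.agents.contains("*")) matched.add(g);
        }
        return matched;
    }

    /** Longest matching prefix wins; on equal length, Allow wins. */
    boolean allows(String userAgent, String path) {
        Rule best = null;
        for (Group g : groupsFor(userAgent)) {
            for (Rule r : g.rules) {
                if (!path.startsWith(r.path)) continue;
                if (best == null || r.path.length() > best.path.length()
                        || (r.path.length() == best.path.length() && r.allow)) {
                    best = r;
                }
            }
        }
        return best == null || best.allow;
    }

    int crawlDelay(String userAgent) {
        for (Group g : groupsFor(userAgent)) if (g.crawlDelay > 0) return g.crawlDelay;
        return 0;
    }

    public static void main(String[] args) {
        RobotsTxtSketch robots = parse(String.join("\n",
                "User-agent: *", "Disallow: /private/", "Allow: /private/public.html",
                "Crawl-delay: 5", "", "User-agent: examplebot", "Disallow: /",
                "Sitemap: https://example.com/sitemap.xml"));
        System.out.println(robots.allows("OtherBot", "/private/secret.html")); // false
        System.out.println(robots.allows("OtherBot", "/private/public.html")); // true
        System.out.println(robots.crawlDelay("OtherBot"));                     // 5
        System.out.println(robots.sitemaps);
    }
}
```

The longest-match rule with Allow winning ties follows the precedence described in RFC 9309. Crawl-delay is not part of that RFC, so treating it as an integer number of seconds is an assumption of this sketch.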