Documents - Code Search

src/main/java/org/codelibs/fess/suggest/index/SuggestIndexer.java

    }

    /**
     * Indexes documents from an array of maps.
     * @param documents The documents to index.
     * @return The SuggestIndexResponse.
     */
    public SuggestIndexResponse indexFromDocument(final Map<String, Object>[] documents) {
        final long start = System.currentTimeMillis();
        try {
            final Stream<Map<String, Object>> stream = Stream.of(documents);
            if (parallel) {

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Thu Aug 07 02:41:28 UTC 2025

- 34.8K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/index/contents/document/ESSourceReader.java

 * reader.setScrollSize(1000); // Set the scroll size
 * reader.setLimitOfDocumentSize(1024 * 1024); // Limit document size to 1MB
 * reader.setQuery(QueryBuilders.termQuery("field", "value")); // Set a query
 *
 * Map<String, Object> document;
 * while ((document = reader.read()) != null) {
 *     // Process the document
 *     System.out.println(document);
 * }
 *
 * reader.close(); // Close the reader to release resources
 * }
 * </pre>
 */

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Thu Aug 07 02:41:28 UTC 2025

- 11K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/PdfExtractor.java

    }

    /**
     * Extracts text from embedded documents in the PDF.
     * @param document the PDF document
     * @param writer the writer to append extracted text to
     */
    protected void extractEmbeddedDocuments(final PDDocument document, final StringWriter writer) {
        final PDDocumentNameDictionary namesDictionary = new PDDocumentNameDictionary(document.getDocumentCatalog());

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 12.7K bytes

github.com/codelibs/fess-suggest

README.md

## Advanced Usage

### Index from Existing Documents

```java
import org.codelibs.fess.suggest.index.contents.document.ESSourceReader;

// Index suggestions from existing Elasticsearch documents
DocumentReader reader = new ESSourceReader(
    client,
    suggester.settings(),
    "content-index",        // source index
    "document"              // document type
);

suggester.indexer()
```

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Sun Aug 31 03:31:14 UTC 2025

- 12.1K bytes

github.com/codelibs/fess-crawler

fess-crawler-opensearch/src/main/java/org/codelibs/fess/crawler/service/impl/AbstractCrawlerService.java

        }
    }

    /**
     * Checks if a document exists in the OpenSearch index for the given session ID and URL.
     *
     * @param sessionId The session ID of the document.
     * @param url The URL of the document.
     * @return true if the document exists, false otherwise.
     * @throws OpenSearchAccessException if the existence check fails.
     */

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 34.2K bytes

github.com/codelibs/fess-crawler

README.md

- **SMB/CIFS**: Windows network shares
- **Storage**: Cloud storage systems (MinIO, S3-compatible)

### Content Formats

#### Office Documents
- Microsoft Office (Word, Excel, PowerPoint)
- OpenOffice/LibreOffice documents
- RTF, WordPerfect

#### PDFs and Images
- PDF documents (text and metadata extraction)
- Images (JPEG, PNG, GIF, TIFF, BMP)
- Image metadata (EXIF, IPTC, XMP)

#### Archives and Compressed Files

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Aug 31 05:32:52 UTC 2025

- 15.3K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/HtmlXpathExtractor.java

 * as well as the alt and title attributes.
 * </p>
 * <p>
 * The class uses {@link DOMParser} to parse HTML documents and {@link XPathAPI} to execute XPath queries.
 * It also provides methods to add custom features and properties to the {@link DOMParser}.
 * </p>
 * <p>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Sun Jul 06 02:13:03 UTC 2025

- 10.3K bytes

github.com/tiangolo/fastapi

docs/en/docs/contributing.md

* Do not change the paths in links to images, code files, Markdown documents.

* However, when a Markdown document is translated, the `#hash-parts` in links to its headings may change. Update these links if possible.
    * Search for such links in the translated document using the regex `#[^# ]`.
    * Search in all documents already translated into your language for `your-translated-document.md`. For example, VS Code has an option "Edit" -> "Find in Files".
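
The regex search described in the list above can also be scripted. The following is a minimal Java sketch, not something taken from the FastAPI docs: the class name `HashLinkFinder` and the method `findCandidateLines` are hypothetical names chosen for illustration. It reports the lines that contain a `#` followed by a character that is neither `#` nor a space, so Markdown headings such as `# Title` are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HashLinkFinder {
    // The regex quoted above: '#' followed by one character that is neither '#' nor a space.
    private static final Pattern HASH_PART = Pattern.compile("#[^# ]");

    /** Returns the 1-based numbers of the lines that contain a hash-part candidate. */
    public static List<Integer> findCandidateLines(final String markdown) {
        final List<Integer> hits = new ArrayList<>();
        final String[] lines = markdown.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (HASH_PART.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(final String[] args) {
        final String sample = String.join("\n",
                "# Heading",                                     // heading: '#' is followed by a space
                "See [the tutorial](tutorial.md#first-steps).",  // link with a hash part
                "Plain text without links.");
        System.out.println(findCandidateLines(sample)); // prints [2]
    }
}
```

The pattern is deliberately loose: it also flags any other `#` that is directly followed by text, such as a hex color code, so each reported line still needs a manual check.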

Registered: Sun Sep 07 07:19:17 UTC 2025

- Last Modified: Sat Jul 26 11:35:42 UTC 2025

- 14.9K bytes

github.com/codelibs/fess-crawler

fess-crawler/src/main/java/org/codelibs/fess/crawler/extractor/impl/TikaExtractor.java

 *   <li>Tesseract OCR configuration for image-based documents</li>
 *   <li>PDF Parser configuration for PDF documents</li>
 * </ul>
 *
 * <p>
 * The {@link TikaDetectParser} inner class extends {@link CompositeParser} to provide auto-detection of the MIME type
 * of the document. It also handles zip bomb prevention and embedded document extraction.
 * </p>
 *
 * <p>

Registered: Sun Sep 21 03:50:09 UTC 2025

- Last Modified: Thu Aug 07 02:55:08 UTC 2025

- 30.7K bytes

github.com/codelibs/fess-suggest

src/main/java/org/codelibs/fess/suggest/util/SuggestUtil.java

    }

    /**
     * Deletes documents from the specified index based on the given query.
     *
     * @param client the OpenSearch client to use for executing the query and delete operations
     * @param settings the settings for the suggest feature, including timeouts and scroll settings
     * @param index the name of the index from which documents should be deleted

Registered: Fri Sep 19 09:08:11 UTC 2025

- Last Modified: Mon Sep 01 13:33:03 UTC 2025

- 17.4K bytes
