Commit 8688c9ab authored by Pablo Panero

Es central service

parent 270a2406
......@@ -13,7 +13,7 @@ Lets assume the following JSON schema and Elasticsearch mapping for our demo doc
```json
{
"title": "Custom record schema v0.0.1",
"id": "http://localhost:5000/schemas/doc-v0.0.1.json",
"id": "http://localhost:5000/schemas/cernsearch-test-doc_v0.0.1.json",
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
......@@ -97,8 +97,8 @@ curl -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' \
### Query documents
In order to query documents we need to perform a *GET* operation. We can specify the amount of
documents to be returned (in total and per page), among other options. For a full list check refer to
{TODO INSERT LINK}
documents to be returned (in total and per page), among other options. For the full list of query options, see the
[Elasticsearch Query DSL reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).
An example query for the terms _awesome_ and _document_ looks like this:
......@@ -107,6 +107,80 @@ curl -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' \
'http://<host:port>/records/?q=awesome+document'
```
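The same query URL can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_query_url` and the base URL used in the example are assumptions for illustration and are not part of the service.

```python
from urllib.parse import urlencode


def build_query_url(base_url, terms, page=None, size=None):
    """Build a search URL for the records list endpoint.

    `base_url` is a placeholder for http://<host:port> (plus any API prefix).
    """
    # Terms are joined with spaces; urlencode turns each space into '+'.
    params = {"q": " ".join(terms)}
    if page is not None:
        params["page"] = page
    if size is not None:
        params["size"] = size
    return "{0}/records/?{1}".format(base_url.rstrip("/"), urlencode(params))


if __name__ == "__main__":
    print(build_query_url("http://localhost:5000", ["awesome", "document"]))
```

The resulting string can be passed to `curl` or to any HTTP client in place of the hand-written URL.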
We can use pagination to restrict the number of results. For example, the following request obtains the second page of
a query that returns one element per page:
```bash
curl -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' \
'http://<host:port>/api/records/?page=2&size=1'
```
The response looks similar to:
```json
{
"aggregations": {},
"hits": {
"hits": [
{
"created": "2018-03-19T08:16:53.218017+00:00",
"id": 5,
"links": {
"self": "http://<host:port>/api/record/5"
},
"metadata": {
"control_number": "5",
"description": "This is an awesome description for our first uploaded document",
"title": "Demo document"
},
"updated": "2018-03-19T08:16:53.218042+00:00"
}
],
"total": 2
},
"links": {
"prev": "http://<host:port>/api/records/?page=1&size=1",
"self": "http://<host:port>/api/records/?page=2&size=1"
}
}
```
Note the *links* field, which is very useful for processing the results: it lets us get the current, next and previous
pages (only the _next_ for the first page, and only the _previous_ for the last page).
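A client can walk through the result pages by reading the *links* field. The sketch below works on an already decoded response dictionary; the function names `page_links` and `is_last_page` are hypothetical, and the sample data simply mirrors the response shown above.

```python
def page_links(response):
    """Return the prev/self/next URLs of a search response (None when absent)."""
    links = response.get("links", {})
    return {name: links.get(name) for name in ("prev", "self", "next")}


def is_last_page(response):
    """A response without a 'next' link is the last page."""
    return "next" not in response.get("links", {})


if __name__ == "__main__":
    sample = {
        "hits": {"hits": [], "total": 2},
        "links": {
            "prev": "http://<host:port>/api/records/?page=1&size=1",
            "self": "http://<host:port>/api/records/?page=2&size=1",
        },
    }
    print(page_links(sample))
    print(is_last_page(sample))
```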
### Update documents
To update a document we need to perform a *PUT* operation on the _record_ endpoint, so the _ID_ or ETag of the
record is part of the URL. Nonetheless, due to workflow issues, the data of the request *must* also contain this _ID_ in
the _control_number_ field.
```bash
curl -X PUT -H 'Content-Type: application/json' -H 'Accept: application/json' \
-i 'http://<host:port>/api/record/5' --data '
{
"control_number": "5",
"description": "This is an awesome updated description",
"title": "Update Test 1"
}
'
```
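Because the _ID_ has to appear both in the URL and in the _control_number_ field, it can help to derive both from a single value. This is a minimal sketch under that assumption; `build_update_request` is a hypothetical helper, not part of the API, and it only prepares the URL and body without sending anything.

```python
import json


def build_update_request(base_url, record_id, fields):
    """Return the (url, json_body) pair for a PUT on the record endpoint."""
    body = dict(fields)
    # The ID in the body must match the one in the URL, so set it here.
    body["control_number"] = str(record_id)
    url = "{0}/api/record/{1}".format(base_url.rstrip("/"), record_id)
    return url, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    url, body = build_update_request(
        "http://localhost:5000",
        5,
        {"title": "Update Test 1",
         "description": "This is an awesome updated description"},
    )
    print(url)
    print(body)
```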
### Delete documents
To delete a document we need to perform a *DELETE* operation. For this we simply need to specify the document _ID_ by
querying the _record_ endpoint:
```bash
curl -XDELETE -H 'Content-Type: application/json' -H 'Accept: application/json' \
'http://<host:port>/api/record/5'
```
If we afterwards query the deleted item (with GET, PUT or DELETE), we obtain a 410 (Gone) response:
```json
{
"status": 410,
"message": "PID has been deleted."
}
```
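A client may want to tell a deleted record apart from other errors. The sketch below inspects a status code and raw response body; the function name `deleted_message` is an assumption for illustration, and the body format is taken from the example above.

```python
import json


def deleted_message(status_code, body_text):
    """Return the server message if the record was deleted, else None."""
    if status_code != 410:
        return None
    try:
        body = json.loads(body_text)
    except ValueError:
        # A 410 without a JSON body still means the record is gone.
        return ""
    if not isinstance(body, dict):
        return ""
    return body.get("message", "")


if __name__ == "__main__":
    raw = '{"status": 410, "message": "PID has been deleted."}'
    print(deleted_message(410, raw))
```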
## Setup
An instance can be deployed using the OpenShift template (found in _template/cern-search-api.yml_)
......@@ -116,11 +190,23 @@ Take into account:
The URI of the SQL database is set through a secret since it has to carry the user and password to access it. Therefore,
a secret must be created in OpenShift (e.g. by running `oc create -f <secret_file>`). The following can be used as a template:
```
```yaml
apiVersion: v1
kind: Secret
metadata:
name: srchdb-dev
stringData:
dburi: postgresql+psycopg2://user:password@host:port/databasename
```
The Elasticsearch (ES) credentials are set through a secret as well:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: es
stringData:
# Localhost
es_credentials: "[{'host': 'endpoint', 'url_prefix': '/es', 'port': 443, 'use_ssl': True, 'verify_certs': True, 'ca_certs':'/etc/pki/tls/certs/ca-bundle.trust.crt', 'http_auth': ('user','password')}]"
```
\ No newline at end of file
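The `es_credentials` value is written as a Python literal (a list of dicts, with `True` and a tuple for `http_auth`), so it is presumably evaluated as such when the application loads it from the environment. Before creating the secret, the string can be checked locally with a sketch like the following; `parse_es_credentials` is a hypothetical helper and the validation rules are assumptions.

```python
import ast


def parse_es_credentials(raw):
    """Parse the es_credentials string into a list of host dictionaries."""
    # literal_eval only accepts Python literals, so no code is executed.
    hosts = ast.literal_eval(raw)
    if not isinstance(hosts, list):
        raise ValueError("es_credentials must be a list of host dicts")
    for host in hosts:
        if not isinstance(host, dict) or "host" not in host:
            raise ValueError("each entry must be a dict with a 'host' key")
    return hosts


if __name__ == "__main__":
    raw = ("[{'host': 'endpoint', 'url_prefix': '/es', 'port': 443, "
           "'use_ssl': True, 'verify_certs': True, "
           "'http_auth': ('user','password')}]")
    print(parse_es_credentials(raw))
```

A malformed value (for example unbalanced quotes) fails here with an exception instead of surfacing later as a connection problem.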
......@@ -2,16 +2,32 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function
from .modules.records.permissions import (record_read_permission_factory,
record_create_permission_factory,
record_update_permission_factory,
record_delete_permission_factory)
import copy
from invenio_oauthclient.contrib import cern
from .modules.cernsearch.permissions import (record_read_permission_factory,
record_create_permission_factory,
record_update_permission_factory,
record_delete_permission_factory)
def _(x):
"""Identity function used to trigger string extraction."""
return x
# OAuth Client
# ============
CERN_REMOTE_APP = copy.deepcopy(cern.REMOTE_APP)
CERN_REMOTE_APP["params"].update(dict(request_token_params={
"resource": "test-cern-search.cern.ch", # replace with your server
"scope": "Name Email Bio Groups",
}))
OAUTHCLIENT_REMOTE_APPS = dict(
cern=CERN_REMOTE_APP,
)
# JSON Schemas configuration
# ==========================
......@@ -24,21 +40,22 @@ JSONSCHEMAS_REGISTER_ENDPOINTS_UI = False
# Indexer
# =======
INDEXER_DEFAULT_DOC_TYPE = 'doc-v0.0.1'
INDEXER_DEFAULT_INDEX = 'records-doc-v0.0.1'
# TODO use ES central service. Change INDEXER_RECORD_TO_INDEX = 'invenio_indexer.utils.default_record_to_index'
INDEXER_DEFAULT_DOC_TYPE = 'test-doc_v0.0.1'
INDEXER_DEFAULT_INDEX = 'cernsearch-test-doc_v0.0.1'
# Search configuration
# =====================
SEARCH_MAPPINGS = ['records']
# SEARCH_ELASTIC_HOSTS = None # default localhost
SEARCH_MAPPINGS = ['cernsearch']
# Records REST configuration
# =====================
# ===========================
#: Records REST API configuration
_Record_PID = 'pid(recid, record_class="cern_search_rest.modules.records.api:CernSearchRecord")' # TODO
_Record_PID = 'pid(recid, record_class="cern_search_rest.modules.cernsearch.api:CernSearchRecord")' # TODO
RECORDS_REST_ENDPOINTS = dict(
docid=dict(
......@@ -50,7 +67,7 @@ RECORDS_REST_ENDPOINTS = dict(
item_route='/record/<{0}:pid_value>'.format(_Record_PID),
list_route='/records/',
links_factory_imp='invenio_records_rest.links:default_links_factory',
record_class='cern_search_rest.modules.records.api:CernSearchRecord', # TODO
record_class='cern_search_rest.modules.cernsearch.api:CernSearchRecord', # TODO
# record_loaders={ # TODO
# 'application/json': 'mypackage.loaders:json_loader'
# },
......@@ -60,7 +77,7 @@ RECORDS_REST_ENDPOINTS = dict(
},
search_class='invenio_search.api.RecordsSearch',
# search_factory_imp=search_factory(), # Default TODO
search_index='records-doc-v0.0.1',
search_index='cernsearch-test-doc_v0.0.1',
search_serializers={
'application/json': ('invenio_records_rest.serializers'
':json_v1_search'),
......@@ -74,4 +91,4 @@ RECORDS_REST_ENDPOINTS = dict(
delete_permission_factory_imp=record_delete_permission_factory,
# error_handlers={}, # TODO
)
)
\ No newline at end of file
)
{
"title": "Custom record schema v0.0.1",
"id": "http://localhost:5000/schemas/doc-v0.0.1.json",
"id": "http://localhost:5000/schemas/test-doc_v0.0.1.json",
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
......
......@@ -4,7 +4,7 @@
"index.mapping.total_fields.limit": 3000
},
"mappings": {
"doc-v0.0.1": {
"test-doc_v0.0.1": {
"numeric_detection": true,
"_all": {
"analyzer": "english"
......
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Helper methods for CERN Search records."""
from flask import g
from flask import current_app
from invenio_search import current_search
from invenio_search.utils import schema_to_index
def get_user_provides():
    """Extract the user's provides from g."""
    return [need.value for need in g.identity.provides]


def cern_search_record_to_index(record):
    """Get index/doc_type given a record.

    It tries to extract the index and doc_type from `record['$schema']`.
    The index is prefixed with `CERN_SEARCH_INDEX_PREFIX`, or with
    `CERN_SEARCH_DEFAULT_INDEX_PREFIX` if the former is not set, so that
    the ES central service can be used.
    If extraction fails, return the default index and doc_type, using the
    default prefix.

    :param record: The record object.
    :returns: Tuple (index, doc_type).
    """
    default_prefix = current_app.config['CERN_SEARCH_DEFAULT_INDEX_PREFIX']
    # Fall back to the default prefix when the specific one is unset or empty.
    index_prefix = (current_app.config.get('CERN_SEARCH_INDEX_PREFIX')
                    or default_prefix)
    index_names = current_search.mappings.keys()
    schema = record.get('$schema', '')
    if isinstance(schema, dict):
        schema = schema.get('$ref', '')
    index, doc_type = schema_to_index(schema, index_names=index_names)
    if index and doc_type:
        return '{0}{1}'.format(index_prefix, index), doc_type
    return ('{0}{1}'.format(default_prefix,
                            current_app.config['INDEXER_DEFAULT_INDEX']),
            current_app.config['INDEXER_DEFAULT_DOC_TYPE'])
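The prefix selection described in the docstring above can be tried out without Flask or Elasticsearch. The following is a minimal sketch that takes the configuration as a plain dict and the already resolved `index`/`doc_type` as arguments; the function name `prefixed_index` and all configuration values in the example are hypothetical.

```python
def prefixed_index(index, doc_type, config):
    """Apply the index prefix rules to an (index, doc_type) pair."""
    default_prefix = config["CERN_SEARCH_DEFAULT_INDEX_PREFIX"]
    # Use the specific prefix when it is set and non-empty.
    prefix = config.get("CERN_SEARCH_INDEX_PREFIX") or default_prefix
    if index and doc_type:
        return prefix + index, doc_type
    # Nothing could be derived from the schema: use the defaults.
    return (default_prefix + config["INDEXER_DEFAULT_INDEX"],
            config["INDEXER_DEFAULT_DOC_TYPE"])


if __name__ == "__main__":
    config = {
        "CERN_SEARCH_DEFAULT_INDEX_PREFIX": "default-",  # hypothetical value
        "CERN_SEARCH_INDEX_PREFIX": "cernsearch-",       # hypothetical value
        "INDEXER_DEFAULT_INDEX": "test-doc_v0.0.1",
        "INDEXER_DEFAULT_DOC_TYPE": "test-doc_v0.0.1",
    }
    print(prefixed_index("test-doc_v0.0.1", "test-doc_v0.0.1", config))
    print(prefixed_index(None, None, config))
```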
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Helper methods for CERN Search records."""
from flask import g
def get_user_provides():
    """Extract the user's provides from g."""
    return [need.value for need in g.identity.provides]
\ No newline at end of file
......@@ -6,5 +6,6 @@ invenio-jsonschemas>=1.0.0a5,<1.1.0
invenio-records-rest>=1.0.0b1,<1.1.0
invenio-records[postgresql]>=1.0.0b2
invenio-rest[cors]>=1.0.0b2
invenio-oauthclient>=1.0.0b5
invenio-search[elasticsearch5]>=1.0.0a10,<1.1.0
redis>=2.10.0
\ No newline at end of file
......@@ -44,11 +44,13 @@ install_requires = [
'invenio-app>=1.0.0b1,<1.1.0',
'invenio-base>=1.0.0a15,<1.1.0',
'invenio-config>=1.0.0b3,<1.1.0',
'invenio-indexer[elasticsearch5]>=1.0.0,<1.1.0',
'invenio-jsonschemas>=1.0.0a5,<1.1.0',
'invenio-records-rest>=1.0.0b1,<1.1.0',
'invenio-records-rest[elasticsearch5]>=1.0.0b1,<1.1.0',
'invenio-records[postgresql]>=1.0.0b2',
'invenio-rest[cors]>=1.0.0b2',
'invenio-search>=1.0.0a10,<1.1.0',
'invenio-oauthclient>=1.0.0b5',
'invenio-search[elasticsearch5]>=1.0.0a10,<1.1.0',
'redis>=2.10.0',
]
......@@ -78,10 +80,10 @@ setup(
'cern_search_rest = cern_search_rest.config',
],
'invenio_search.mappings': [
'records = cern_search_rest.modules.records.mappings',
'cernsearch = cern_search_rest.modules.cernsearch.mappings',
],
'invenio_jsonschemas.schemas': [
'cern_search_rest_schemas = cern_search_rest.modules.records.jsonschemas'
'cern_search_rest_schemas = cern_search_rest.modules.cernsearch.jsonschemas'
],
},
extras_require=extras_require,
......
......@@ -51,21 +51,21 @@ objects:
- configMapRef:
name: env-configmap
env:
- name: INVENIO_SEARCH_ELASTIC_HOSTS
valueFrom:
secretKeyRef:
name: es
key: es_credentials
- name: INVENIO_SQLALCHEMY_DATABASE_URI
valueFrom:
secretKeyRef:
name: srchdb
key: dburi
- name: CERN_APP_CREDENTIALS_KEY
valueFrom:
secretKeyRef:
name: oauth
key: clientid
- name: CERN_APP_CREDENTIALS_SECRET
- name: INVENIO_CERN_APP_CREDENTIALS
valueFrom:
secretKeyRef:
name: oauth
key: clientkey
key: oauth_credentials
image: gitlab-registry.cern.ch/ppanero/cern_search_rest:latest
imagePullPolicy: Always
name: cern-search-api
......