diff --git a/docs/pics/monitoring-architecture-in-alien.png b/docs/pics/monitoring-architecture-in-alien.png
new file mode 100644
index 0000000000000000000000000000000000000000..2097be93a72eb6a77e5b60a5ed888080f7cd6042
Binary files /dev/null and b/docs/pics/monitoring-architecture-in-alien.png differ
diff --git a/docs/site/monalisa.md b/docs/site/monalisa.md
new file mode 100644
index 0000000000000000000000000000000000000000..905bc47619712411f0d2b008ff90aa881a1de6db
--- /dev/null
+++ b/docs/site/monalisa.md
@@ -0,0 +1,64 @@
+# ALICE Grid Monitoring with MonALISA
+
+When dealing with a worldwide distributed system like the ALICE Grid, you have to take into account a variety of platforms and software stacks and, consequently, a wide range of error conditions.
+In order to quickly understand what is happening in a system of this scale, monitoring should provide a global view of the entire system.
+
+It is important to be able to correlate the evolution of the various monitored parameters, both across grid sites and in relation to the parameters of the central services.
+Aside from that, the monitoring system must be non-intrusive and accurate, and it should provide both a historical and a near real-time image of the Grid's status and performance.
+
+Based on these requirements, the [MonALISA](https://github.com/MonALISA-CIT/) framework was chosen to monitor the entire JAliEn Grid system.
+Currently, almost all JAliEn components are monitored, as shown in the table below:
+
+| | |
+|:-|:-|
+| __Central Services__ | Task Queue, Information Service, Optimizers, API Service, etc. |
+| __Site Services__ | Job Agents, Cluster Monitor, Computing and Storage Elements |
+| | LCG Services (on VOBoxes) |
+| __Jobs__ | Job status and resource usage |
+| | Network traffic inter/intra-site |
+
+
+## Monitoring Architecture in JAliEn
+
+JAliEn monitoring closely follows the MonALISA architecture: each JAliEn service, including the Job Agent, is instrumented with ApMon (in its Perl and C++ versions).
+Each instrumented component regularly sends its monitoring data to the local MonALISA service running on the site.
+There, the data from all services, jobs and nodes is aggregated, and the site profile is generated with a resolution of 2 minutes.
+The local on-site MonALISA services keep a short, in-memory-only history of each received or aggregated parameter.
+All of these parameters can be queried with the MonALISA GUI client.
+Only the aggregated data is collected by the MonALISA Repository for long-term histories.
+
+![Monitoring architecture in JAliEn](../pics/monitoring-architecture-in-alien.png)
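+
+ApMon ships its values as UDP datagrams, so a quick way to confirm that the flow described above is actually happening is to watch for ApMon traffic leaving a node. The snippet below is only a sketch: it assumes MonALISA receives ApMon data on its default UDP port 8884, which your site configuration may override.
+
+```
+# Watch for ApMon datagrams sent towards the site MonALISA service.
+# Port 8884 is ApMon's default destination port -- an assumption, adjust to your site.
+~# tcpdump -n -i any udp port 8884
+```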
+
+
+## Deployment and Configuration
+
+For JAliEn monitoring, MonALISA is packaged and prepared for installation by the JAliEn Build and Test System and deployed in CVMFS.
+
+Configuration files for MonALISA are generated automatically from the JAliEn LDAP at startup.
+If a MonALISA entry for the site is not present in LDAP, MonALISA won't start.
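+
+Before starting the service, you can check that such an entry exists. The query below is illustrative only: the endpoint and DN layout are assumptions about the usual ALICE LDAP structure, with `<SITE>` standing in for your site name, so verify both with the grid administrators.
+
+```
+# Hypothetical lookup of the MonALISA entry for site <SITE> in the ALICE LDAP
+# (server address and DN layout are assumptions -- verify them for your setup).
+~$ ldapsearch -x -H ldap://alice-ldap.cern.ch:8389 \
+      -b "ou=MonaLisa,ou=Services,ou=<SITE>,ou=Sites,o=alice,dc=cern,dc=ch"
+```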
+
+Once configured, MonALISA behaves like any other AliEn service and can be managed with the following commands:
+
+| Action | Command |
+|:-------|:--------|
+| Start | ```~$ alien StartMonaLisa``` |
+| Stop | ```~$ alien StopMonaLisa``` |
+| Check status | ```~$ alien StatusMonaLisa``` |
+
diff --git a/docs/site/vobox.md b/docs/site/vobox.md
index 3b40f32f25f77fd64605879691fb98579c594e81..411796ff44871f74288bc0e4ba6cdb31af94a8a4 100644
--- a/docs/site/vobox.md
+++ b/docs/site/vobox.md
@@ -9,8 +9,8 @@ See the following quick links to setup steps depending on your preferred deployment
 
 | | |
 |-|-|
-| __Generic/VM__ | Step 1: [General requirements](#requirements), [Network setup](#requirements)<br>Step 2: [WLCG VO-Box Installation](#wlcg-vo-box)<br>Step 3: [HTCondor/ARC Specifics](../vobox_htc_arc/) |
-| __Container__ | Step 1: [Container requirements](../vobox_container/#requirements), [Network Setup](../vobox_container/#setup-networking)<br>Step 2: [Install HTCondor/ARC VOBox container](../vobox_container/#create-container) |
+| __Generic/VM__ | Step 1: [General requirements](#requirements), [Network setup](#requirements)<br>Step 2: [WLCG VO-Box Installation](#wlcg-vo-box)<br>Step 3: [HTCondor/ARC Specifics](../vobox_htc_arc/)<br>Step 4: [Grid Monitoring: MonALISA](../monalisa/#deployment-and-configuration) |
+| __Container__ | Step 1: [Container requirements](../vobox_container/#requirements), [Network Setup](../vobox_container/#setup-networking)<br>Step 2: [Install HTCondor/ARC VOBox container](../vobox_container/#create-container)<br>Step 3: [Grid Monitoring: MonALISA](../monalisa/#deployment-and-configuration) |
 
 ## Requirements
 
@@ -18,7 +18,7 @@ General requirements for the VO node agents/services are as follows:
 
 | | |
 |-|-|
-| __OS__ | SL6 or CentOS/EL7, 64-bit Linux. The machine usually will need to be a WLCG VOBOX |
+| __OS__ | CentOS/EL7, 64-bit Linux. The machine will usually need to be a WLCG VOBOX |
 | __Hardware__ | Minimum 4GB RAM, any standard CPU, 20GB for logs, 5GB cache |
 
 ## Network
diff --git a/docs/site/vobox_htc_arc.md b/docs/site/vobox_htc_arc.md
index 8dfa6363bf817973ad5316b9afdd01670beedb60..a3148339175d1d46cf025e1e14d1a8a8b940a318 100644
--- a/docs/site/vobox_htc_arc.md
+++ b/docs/site/vobox_htc_arc.md
@@ -6,7 +6,7 @@ Reference the appropriate section as needed.
 
 ## HTCondor
 
 The VOBox will run its own HTCondor services that are __independent__ of the HTCondor services for your CE and batch system.
-The following instructions assume you are using __SL6.8+__ or __CentOS/EL 7.5+__.
+The following instructions assume you are using __CentOS/EL 7.5+__.
 
 ### Install HTCondor
 
@@ -20,7 +20,6 @@ The following instructions assume you are using __SL6.8+__ or __CentOS/EL 7.5+__.
 
     | | |
    |-|-|
-    |__SL6__| ```~# wget http://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo``` |
     |__CentOS/EL7__| ```~# wget http://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel7.repo``` |
 
 3. Import RPM key for the repository:
diff --git a/docs/site/vobox_proxy.md b/docs/site/vobox_proxy.md
index 09141ef87602ecf3e2907f2509624a9c732e6cb6..7de0b0d91beb07223718246051a6333084a3bfe4 100644
--- a/docs/site/vobox_proxy.md
+++ b/docs/site/vobox_proxy.md
@@ -58,11 +58,6 @@ Proxies can be examined in two ways:
     LCG-UI> voms-proxy-info -all [-f ]
 ```
 
-!!! info "VOMS-extended Proxy"
-    Since gLite 3.1, the job submission services need a VOMS-extended proxy, i.e. a proxy with some VO-specific information attached; in our implementations, these are just VO membership and role.
-    For more information about VOMS please refer to the relevant gLite user guide section and the [VOMS user guide](http://www.google.it/url?sa=t&ct=res&cd=3&url=http%3A%2F%2Fegee.cesnet.cz%2Fen%2Fvoce%2Fvoms-guide.pdf&ei=QQJySL_lI4mC7gXLgvCCBA&usg=AFQjCNGcfSHrU1aE0QeP0mojS02ppdtYTA&sig2=HclwwnXSfEKv39I4Ml931Q).
-    Thus, two proxies need to have VOMS extensions; the user proxy, and the login proxy.
-
 ### The _login proxy_
 
 This is a plain user proxy that the VO-Box administrator uses to log in on the machine, via:
diff --git a/mkdocs.yml b/mkdocs.yml
index 74bf77d2ddb732c1d45313880ef9b5d408a49890..c732a0734f61fa6fbe6d8ea48587715e0595a663 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -40,6 +40,7 @@ nav:
     - VOBox (Container): site/vobox_container.md
     - VOBox (HTCondor/ARC): site/vobox_htc_arc.md
     - Manage VOBox Proxy: site/vobox_proxy.md
+    - "Monitoring: MonALISA": site/monalisa.md
     - Computing Element Guide: site/ce_guide.md
 - Reference:
     - alien.py: alienpy_commands.md