Adding CORS Headers Using Lambda@Edge and Amazon CloudFront

Cross-origin resource sharing (CORS) defines a way for client web applications loaded in one domain to interact with resources in a different domain. It lets you build rich client-side web applications while selectively allowing cross-origin access to your resources. To see how to enable CORS on most web servers, see https://enable-cors.org/index.html.

If you’re using Amazon CloudFront to serve your content, you will need to configure the distribution to respect the CORS settings by forwarding specific headers in the CloudFront behavior.

When HTTP headers are forwarded to the origin, CloudFront caches separate versions of an object based on the header values in viewer requests. It is a best practice to avoid forwarding a request to the origin when the requested object is already cached at the edge location, and forwarding the Origin header works against this: each distinct Origin value gets its own cache entry, which lowers the cache hit ratio. This is wasteful, because the response body for a request that contains the Origin header is the same whatever the value of that header.

In this post we will leverage Lambda@Edge to set the CORS headers when the request contains the Origin header. With this solution, you don’t need to enable CORS at the origin or to forward the Origin Header in the CloudFront distribution.

Lambda@Edge provides the ability to execute a Lambda function at an Amazon CloudFront Edge Location. This capability enables intelligent processing of HTTP requests at locations that are close (for the purposes of latency) to your customers. To get started, you simply upload your code (Lambda function written in Node.js) and pick one of the CloudFront behaviours associated with your distribution.

You can run a Lambda@Edge function in response to four different CloudFront events:

      Viewer Request: When CloudFront receives a request from a viewer
      Origin Request: Before CloudFront forwards a request to the origin
      Origin Response: When CloudFront receives a response from the origin
      Viewer Response: Before CloudFront returns the response to the viewer

For more information about Lambda@Edge and how it works, see: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-at-the-edge.html.

For the purpose of this blog post, we’ll focus on the Viewer Response event. I use the Node.js code below in the main index.js file. For a step-by-step guide on how to create a Lambda function and associate it with a CloudFront behaviour, see https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-edge-how-it-works-tutorial.html.

Viewer Response Code Snippet

'use strict';

exports.handler = (event, context, callback) => {
    // Get contents of response
    const response = event.Records[0].cf.response;
    const headers = response.headers;

    if ('origin' in event.Records[0].cf.request.headers) {
        // The request contains the Origin header: set CORS headers
        headers['access-control-allow-origin'] = [{key: 'Access-Control-Allow-Origin', value: "*"}];
        headers['access-control-allow-methods'] = [{key: 'Access-Control-Allow-Methods', value: "GET, HEAD"}];
        headers['access-control-max-age'] = [{key: 'Access-Control-Max-Age', value: "86400"}];
    }

    // Return modified response
    callback(null, response);
};
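
One way to sanity-check a handler like the one above before deploying it is to call the same logic locally with a hand-built event. The sketch below is only an assumed test harness: `addCorsHeaders` repeats the logic of the snippet as a plain function, and `makeEvent` is a hypothetical helper that mimics just the parts of a viewer-response event the handler reads (real events carry many more fields).

```javascript
'use strict';

// Same logic as the viewer-response handler above, written as a plain function.
function addCorsHeaders(event) {
  const { request, response } = event.Records[0].cf;
  if ('origin' in request.headers) {
    response.headers['access-control-allow-origin'] = [{ key: 'Access-Control-Allow-Origin', value: '*' }];
    response.headers['access-control-allow-methods'] = [{ key: 'Access-Control-Allow-Methods', value: 'GET, HEAD' }];
    response.headers['access-control-max-age'] = [{ key: 'Access-Control-Max-Age', value: '86400' }];
  }
  return response;
}

// Hypothetical helper: a minimal stand-in for a CloudFront viewer-response event.
function makeEvent(requestHeaders) {
  return {
    Records: [{
      cf: {
        request: { headers: requestHeaders },
        response: { status: '200', headers: {} },
      },
    }],
  };
}

// A request that carries an Origin header gets the three CORS headers.
const withOrigin = addCorsHeaders(makeEvent({
  origin: [{ key: 'Origin', value: 'https://www.example.com' }],
}));

// A request without an Origin header is returned untouched.
const withoutOrigin = addCorsHeaders(makeEvent({}));

console.log(Object.keys(withOrigin.headers));
console.log(Object.keys(withoutOrigin.headers));
```

Note that CloudFront lowercases the header names used as keys in `request.headers`, which is why the check looks for `origin` rather than `Origin`.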

I’ve associated the above Node.js code with a CloudFront distribution behaviour (Viewer Response) whose origin is an S3 bucket for which CORS is not enabled. Note the CORS headers and the hit from CloudFront on the second request:

$ curl -v http://d2fyouwzorwsid.cloudfront.net/file.txt >/dev/null
> GET /file.txt HTTP/1.1
> Host: d2fyouwzorwsid.cloudfront.net
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 5
< Connection: keep-alive
< x-amz-id-2: JYE+LYMvI6Vmu13MCoPJZtpF76Ozs3W5AiY6KvwrG8t4KEq3/CcSE1WCR85ra6f9yxSh1zqGIi4=
< x-amz-request-id: 1FE8E25F4E96C9AF
< Date: Thu, 02 May 2019 19:15:34 GMT
< Last-Modified: Thu, 02 May 2019 19:13:46 GMT
< ETag: "2205e48de5f93c784733ffcca841d2b5"
< Cache-Control: max-age=500
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< Via: 1.1 2e50d9b1ee017f302768660f02b7418e.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: fWxXPrnynlKchjfD3gS3q4QxvAsPyVH1XEZRYKhR3pdO3NPlpE04Jg==

$ curl -v -H "Origin:www.example.com" http://d2fyouwzorwsid.cloudfront.net/file.txt >/dev/null
> GET /file.txt HTTP/1.1
> Host: d2fyouwzorwsid.cloudfront.net
> User-Agent: curl/7.54.0
> Accept: */*
> Origin:www.example.com
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 5
< Connection: keep-alive
< x-amz-id-2: JYE+LYMvI6Vmu13MCoPJZtpF76Ozs3W5AiY6KvwrG8t4KEq3/CcSE1WCR85ra6f9yxSh1zqGIi4=
< x-amz-request-id: 1FE8E25F4E96C9AF
< Date: Thu, 02 May 2019 19:15:34 GMT
< Last-Modified: Thu, 02 May 2019 19:13:46 GMT
< ETag: "2205e48de5f93c784733ffcca841d2b5"
< Cache-Control: max-age=500
< Accept-Ranges: bytes
< Server: AmazonS3
< Age: 11
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, HEAD
< Access-Control-Max-Age: 86400
< X-Cache: Hit from cloudfront
< Via: 1.1 1448f69604d5be1f9c9f0c64cfa90595.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: CNvc-aOmxsrlAbogKUOYTBm-oOEV3VQjawQZ5LCj5CW1VYxK97a8-A==

Congratulations! You have successfully implemented cross-origin resource sharing without enabling CORS at the origin server or forwarding the Origin Header to the origin.

Elasticsearch and Kibana Search Speed Tuning

This blog post provides some tips for tuning Elasticsearch and Kibana search speed, in addition to those recommended on the official Elasticsearch website [1]:

  • giving enough memory to the filesystem cache
  • using faster hardware
  • searching as few fields as possible
  • pre-indexing data
  • avoiding scripts
  • considering mapping identifiers as keywords
  • etc.

ELK Stack

I recently installed the ELK Stack (Elasticsearch, Logstash and Kibana) on a single-node cluster (m4.large EC2 instance type) to analyze Amazon S3 and CloudFront logs, and tried to improve search speed (I won’t discuss indexing speed in this post). In production, at least 3 nodes are recommended for better performance; since I had only one, I had to do some more tuning in addition to the above.

1. Shard Allocation

Shards are the data containers of Elasticsearch, and the number of shards affects the performance of the cluster. Since a shard is essentially a Lucene index, it consumes file handles, memory, and CPU resources. With a large number of shards, the cluster spends too many resources just managing itself: the cluster state table grows, and each search request has to touch a copy of every shard in the target indices, which is resource-consuming. Also, the more shards the cluster has, the more likely you are to get the “courier fetch error”.

If indices are smaller than 30GB, 1 primary shard per index should be enough. An index template [2] can be used to set a default for all indices as follows:

1.1. Check the number of shards and indices in the cluster by calling the _cat API operation

GET _cat/indices/logstash*

1.2. Define an index template, which specifies the number of shards for all new indices that will be created in the cluster, i.e.:

POST _template/default
{
  "index_patterns": ["*"],
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}

1.3. Re-index by calling the _reindex API operation (this will move the data to a new index with the number of shards that was specified in the index template)

POST _reindex
{
  "source": {
    "index": "logstash-2019.02.06"
  },
  "dest": {
    "index": "logstash-2019.02.06.new"
  }
}
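
As a rough companion to the 30GB guideline in this section, the sketch below estimates a primary shard count from an index size and builds a template body like the one in step 1.2. Extending the guideline to larger indices (one extra shard per additional 30GB) is an assumption of this sketch, not something the guideline itself states, and `suggestPrimaryShards` and `buildTemplateBody` are hypothetical helper names.

```javascript
// Assumption: aim for roughly 30 GB per primary shard, extrapolating the
// "one shard is enough below 30GB" guideline to larger indices.
const TARGET_SHARD_SIZE_GB = 30;

// Returns a suggested number of primary shards (never less than 1).
function suggestPrimaryShards(indexSizeGb, targetGb = TARGET_SHARD_SIZE_GB) {
  if (!(indexSizeGb >= 0) || !(targetGb > 0)) {
    throw new RangeError('index size must be >= 0 and target size must be > 0');
  }
  return Math.max(1, Math.ceil(indexSizeGb / targetGb));
}

// Builds a request body shaped like the index template in step 1.2.
function buildTemplateBody(indexSizeGb) {
  return {
    index_patterns: ['*'],
    settings: {
      number_of_shards: String(suggestPrimaryShards(indexSizeGb)),
      number_of_replicas: '0',
    },
  };
}

console.log(JSON.stringify(buildTemplateBody(12), null, 2));
```

For a 12GB daily index this yields a single primary shard, matching the template shown earlier.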

2. JVM Memory pressure and Heap Sizing

In Elasticsearch, JVM memory pressure indicates the fill rate of the old generation pool. Two factors jointly contribute to it: the amount of data on the cluster and the workload on the cluster (relative to the cluster’s size). A full Java garbage collection can stop all operations on the node, and it also consumes a lot of cluster resources (CPU, memory, etc.) while it runs. It’s recommended to keep JVM memory pressure at around 70% [3].

  • When heap usage reaches 75%, Elasticsearch uses Concurrent Mark and Sweep (CMS), its default garbage collector (GC), to free space on the heap. This algorithm runs concurrently with other operations.
  • If previous GC operations have not reclaimed enough memory and usage is still above 75%, Elasticsearch temporarily halts or slows the processing of other threads to free memory using a different GC algorithm. This compounds problems in the cluster as requests are backlogged, but it is necessary because otherwise an out-of-memory (OOM) error would occur quickly.
  • When heap usage nears 95%, Elasticsearch kills the thread of the process that is trying to allocate memory. Because memory is a shared resource, there is no guarantee that this won’t kill a critical process.

The less heap memory is allocated to Elasticsearch, the more RAM remains available for other OS processes and for Lucene, which relies heavily on the file system cache to serve requests quickly. It’s recommended to set the heap size to around 30% to 60% of available memory and to monitor the JVM heap to make sure usage stays below 75%.

To set the minimum and maximum JVM heap sizes, change the values of -Xms and -Xmx respectively in jvm.options (environment variables can also be used). It’s recommended to give -Xms and -Xmx the same value [4].
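
To make the sizing rule concrete, here is a small sketch that turns a machine's RAM and a chosen fraction into the two jvm.options lines. `heapOptions` is a hypothetical helper; the 0.3 to 0.6 bounds simply mirror the range suggested in this section and are not enforced by Elasticsearch.

```javascript
// Returns the -Xms / -Xmx lines for a heap sized as a fraction of total RAM.
// Both flags get the same value, as recommended above.
function heapOptions(totalRamMb, fraction = 0.5) {
  if (fraction < 0.3 || fraction > 0.6) {
    throw new RangeError('fraction should stay within the 0.3-0.6 range');
  }
  const heapMb = Math.floor(totalRamMb * fraction);
  return [`-Xms${heapMb}m`, `-Xmx${heapMb}m`];
}

// Example: a server with 8 GiB of RAM (the amount an m4.large provides).
console.log(heapOptions(8192).join('\n'));
// prints:
// -Xms4096m
// -Xmx4096m
```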

3. Swapping

Given the memory used by the ELK Stack, swap space [5] can be added to the server (if there is none), but swapping should be disabled for Elasticsearch itself: swapping slows things down because disks are slower than memory [4]. This property can be set in the elasticsearch.yml file:
bootstrap.memory_lock: true

4. Kibana Dashboards

Use a single dashboard per page, containing the different visualizations. Kibana sends all the queries for the visualizations in a dashboard in a single _msearch request, and they execute in parallel. When a page has several dashboards, requests are sent individually, so some visualizations render a bit sooner but the page as a whole renders more slowly from start to finish. This makes for a bad experience on a slow connection, where some dashboards might never load. Another drawback of having several dashboards on a single page is that it requires more Elasticsearch resources, increasing the server’s memory and CPU utilization.

5. Refresh Interval (5s by default)

Elasticsearch creates a new segment every time a refresh happens. Increasing the refresh interval helps reduce the segment count and the IO cost of searches. In addition, cached results are invalidated once a refresh happens and the data has changed, so a longer refresh interval lets Elasticsearch use its caches more efficiently. To increase the refresh interval:
PUT /logstash-2019.02.06/_settings
{
  "index" : {
    "refresh_interval" : "60s"
  }
}

6. Shard Request Cache

The shard-level request cache module caches the local results on each shard, which allows frequently used search requests to return results almost instantly. To increase (or decrease) its size, change indices.requests.cache.size in the Elasticsearch configuration file:
indices.requests.cache.size: 10%
The request below can be used to check whether the shard query cache has an effect:
GET logstash*/_stats/request_cache?human
It should display something like the following
***
"_all": {
"primaries": {
"request_cache": {
"memory_size": "17.1mb",
"memory_size_in_bytes": 17975017,
"evictions": 0,
"hit_count": 8915,
"miss_count": 1955
}
},
"total": {
"request_cache": {
"memory_size": "17.1mb",
"memory_size_in_bytes": 17975017,
"evictions": 0,
"hit_count": 8915,
"miss_count": 1955
}
}
},
***
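
The counters in a response like the one above can be reduced to a single hit ratio, which is easier to track over time. The sketch below is a minimal illustration using the figures from the sample output; `requestCacheHitRatio` is a hypothetical helper, and it assumes the response keeps the `_all.total.request_cache` shape shown above.

```javascript
// Computes hits / (hits + misses) from a _stats/request_cache response.
function requestCacheHitRatio(stats) {
  const { hit_count: hits, miss_count: misses } = stats._all.total.request_cache;
  const lookups = hits + misses;
  return lookups === 0 ? 0 : hits / lookups;
}

// Trimmed copy of the sample response shown above.
const sampleStats = {
  _all: {
    total: {
      request_cache: { evictions: 0, hit_count: 8915, miss_count: 1955 },
    },
  },
};

console.log((requestCacheHitRatio(sampleStats) * 100).toFixed(1) + '%'); // prints 82.0%
```

A ratio that stays low, or an `evictions` counter that keeps growing, would suggest the cache is too small for the workload.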

7. Use Filter Context instead of Query Context

Some visualizations on the current dashboards are based on queries. It’s recommended, whenever possible, to use Filter Context, as frequently used filters are cached automatically to speed up performance [5].
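
As an illustration of the difference, the sketch below builds a search body whose conditions all sit under `bool.filter`, so they run in filter context (no relevance scoring, and eligible for caching). The field names and values are hypothetical and only stand in for whatever a visualization actually queries; `toFilterContext` is likewise an assumed helper name.

```javascript
// Wraps yes/no conditions in bool.filter instead of bool.must.
function toFilterContext(clauses) {
  return { query: { bool: { filter: clauses } } };
}

const searchBody = toFilterContext([
  { term: { 'school.keyword': 'Example School' } }, // hypothetical field and value
  { range: { '@timestamp': { gte: 'now-7d/d' } } }, // last seven days
]);

console.log(JSON.stringify(searchBody, null, 2));
```

Clauses that need relevance scoring, such as full-text `match` queries, would still belong in `must`.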

References:
[1] Tune Elasticsearch for Search Speed
[2] Index Template
[3] Memory Pressure Indicator
[4] Heap – Sizing and Swapping
[5] Query and filter context

GIS Map in Elasticsearch with Kibana using Vega, Scripted Fields & Painless

Elasticsearch is a distributed, open source, RESTful search engine built on top of Apache Lucene and released under an Apache license. Kibana is an open source data visualization plugin for Elasticsearch. Vega (and Vega-Lite) let you go beyond the built-in visualizations offered by Kibana.

In this short tutorial we will use Vega to create a GIS map that displays individual Elasticsearch documents on a Kibana map as marks of different shapes and colors, with information about each document shown on its mark.

The resulting map looks like this:

[Figure: GIS Map using Vega]

The Vega specification is as follows:

{
"$schema": "https://vega.github.io/schema/vega/v3.0.json",
config: {
kibana: {
type: map
latitude: 15
longitude: -10
zoom: 3
}
}
"data": [
{
"name": "observations",
"url": {
index: logstash-2018*
"body": {
"size": 10000
_source: {
includes: ["@timestamp", "items_per_minute", "school", "geoip.location", "county", "subcounty","zone", "lesson_start_time","subject", "end_time", "start_time", "class"]
}
script_fields : {
lesson_duration : {
script : {
lang: 'painless',
source: doc['end_time'].date.minuteOfDay - doc['start_time'].date.minuteOfDay;
}
}
lesson_date : {
script : {
lang: 'painless',
// Get the time in the correct format
source: doc['start_time'].date.yearOfEra + '-' + doc['start_time'].date.monthOfYear + '-' + doc['start_time'].date.dayOfMonth
}
}
grade : {
script : {
lang: 'painless',
"source": "if(doc.containsKey('formId.keyword')){if(doc['formId.keyword'].value == 'Gradethreeobservationtool'){return 3;}if(doc['formId.keyword'].value == 'maths-grade3'){return 3;}if(doc['formId.keyword'].value == 'class-12-lesson-observation-with-pupil-books'){if(doc.containsKey('class.keyword')){return doc['class.keyword'].value;}}if(doc['formId.keyword'].value == 'maths-teachers-observation-tool'){if(doc.containsKey('class.keyword')){return doc['class.keyword'].value;}}}"
}
}
}
"sort": {"start_time":"desc"}
"query": {
"bool": {
"must": [
// This string will be replaced with the auto-generated "MUST" clause
"%dashboard_context-must_clause%",
// apply timefilter (upper right corner) to the @timestamp variable
{
"range": {
"@timestamp": {
// "%timefilter%" will be replaced with the current
// values of the time filter (from the upper right corner)
"%timefilter%": true
// Only work with %timefilter%
// Shift the current timefilter by 10 units back
"shift": 10,
// supports week, day (default), hour, minute, second.
"unit": "minute"
}
}
}
{"exists": {"field": "geoip.location"}}
{"exists": {"field": "school"}}
{"exists": {"field": "start_time"}}
],
"must_not": [
// This string will be replaced with the auto-generated "MUST-NOT" clause
"%dashboard_context-must_not_clause%"
{"match": { "school": "orphaned" }}
]
}
}
}}
"format": { "type": "json", "property": "hits.hits"}
"transform": [
{
"lookup": "geoip_location",
"type": "geopoint",
"projection": "projection",
"fields": [
_source.geoip\.location.lon
_source.geoip\.location.lat
]
}
]
}
{
"name": "selected",
"source": "observations",
"transform": [
{
"type": "filter",
"expr": "datum === selected"
}
]
}
]
"signals": [
{
"name": "selected",
"value": null,
"on": [
{"events": "symbol:mouseover", "update": "datum"},
{"events": "symbol:mouseout", "update": "null"}
]
}
],
"marks": [
{
name: "observation"
type: "symbol"
from: {data: "observations"}
encode: {
"enter": {
// different shapes for grades
"shape": [
{"test": "datum.fields.grade == 1", "value": "square"}
{"test": "datum.fields.grade == 2", "value": "triangle-up"}
{"value": "circle"}
]
// different colors for subjects
"fill": [
{"test": "datum._source.subject === 'English'", "value": "#159"}
{"test": "datum._source.subject === 'Maths'", "value": "#195"}
{"value": "#815"}
]
"size": {"value": "200"}
},
update: {
xc: {signal: "datum.x"}
yc: {signal: "datum.y"}
tooltip: {
signal: "{School: datum._source.school, County: datum._source.county, Subcounty: datum._source.subcounty, Zone: datum._source.zone, Subject: datum._source.subject, Grade:''+ datum.fields.grade, 'Items Per Minute': datum._source.items_per_minute, Date: ''+datum.fields.lesson_date, 'Start Time': hours(datum._source.start_time) + 3 + ':' + minutes(datum._source.start_time), 'End Time': hours(datum._source.end_time) + 3 + ':' + minutes(datum._source.end_time) + ' (' + datum.fields.lesson_duration + ' minutes)'}"
}
"fillOpacity": {"value": 0.7}
}
"hover": {
"fillOpacity": {"value": 0.3}
}
}
}
]
}

How to create a CloudFront distribution with AWS SDK for Go

Amazon Web Services provides different SDKs, toolkits and command line tools to develop and manage applications running on AWS. The AWS SDK for Go is one of the latest tools provided. New versions are pushed almost every 5 days.

In this blog post, we will write a simple Go program to create a CloudFront distribution with the default settings and the following:

  • An S3 bucket as origin for the distribution
  • A Lambda@Edge function associated to the default behavior
  • A WAF Rule

For more information, see the documentation on:

  • CloudFront
  • Installing and configuring AWS SDK for Go
  • CloudFront APIs with AWS SDK for Go
  • Lambda@Edge
  • WAF (Web Application Firewall)

    package main

    import (
    "fmt"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/cloudfront"
    "github.com/aws/aws-sdk-go/aws/awserr"
    )

    func main() {

    // Credentials come from the SDK's default chain (environment variables,
    // shared credentials file, or instance role).
    sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
    svc := cloudfront.New(sess)

    input := &cloudfront.CreateDistributionWithTagsInput{
    Tags: &cloudfront.Tags{
    Items: []*cloudfront.Tag{
    },
    },
    DistributionConfig: &cloudfront.DistributionConfig{
    CallerReference: aws.String("Sat Sept. 30 2017"),
    Comment: aws.String("My WordPress Blog"),
    Enabled: aws.Bool(true),
    WebACLId: aws.String("eSamplec-5a3e-4857-9b92-0a5Sample3a4"),
    Origins: &cloudfront.Origins{
    Quantity: aws.Int64(1),
    Items: []*cloudfront.Origin{
    {
    Id: aws.String("Jil_S3Origin"),
    DomainName: aws.String("mydomain.com.s3.amazonaws.com"),
    S3OriginConfig: &cloudfront.S3OriginConfig{
    OriginAccessIdentity: aws.String(""),
    },
    },
    },
    },
    DefaultCacheBehavior: &cloudfront.DefaultCacheBehavior{
    TargetOriginId: aws.String("Jil_S3Origin"),
    MinTTL: aws.Int64(10),
    ViewerProtocolPolicy: aws.String("allow-all"),
    LambdaFunctionAssociations: &cloudfront.LambdaFunctionAssociations{
    Quantity: aws.Int64(1),
    Items: []*cloudfront.LambdaFunctionAssociation{
    {
    EventType: aws.String("viewer-request"), // "viewer-request" | "viewer-response" | "origin-request" | "origin-response"
    LambdaFunctionARN: aws.String("arn:aws:lambda:us-east-1:123456789012:function:myFunctionName:2"), // the version of the function must be added
    },
    },
    },
    TrustedSigners: &cloudfront.TrustedSigners{
    Enabled: aws.Bool(false),
    Quantity: aws.Int64(0),
    },
    ForwardedValues: &cloudfront.ForwardedValues{
    Cookies: &cloudfront.CookiePreference{
    Forward: aws.String("none"),
    },
    QueryString: aws.Bool(false),
    },
    },
    },
    }

    result, err := svc.CreateDistributionWithTags(input)

    if err != nil {
    if aerr, ok := err.(awserr.Error); ok {
    switch aerr.Code() {
    case cloudfront.ErrCodeCNAMEAlreadyExists:
    fmt.Println(cloudfront.ErrCodeCNAMEAlreadyExists, aerr.Error())
    case cloudfront.ErrCodeDistributionAlreadyExists:
    fmt.Println(cloudfront.ErrCodeDistributionAlreadyExists, aerr.Error())
    case cloudfront.ErrCodeInvalidOrigin:
    fmt.Println(cloudfront.ErrCodeInvalidOrigin, aerr.Error())
    case cloudfront.ErrCodeInvalidOriginAccessIdentity:
    fmt.Println(cloudfront.ErrCodeInvalidOriginAccessIdentity, aerr.Error())
    case cloudfront.ErrCodeAccessDenied:
    fmt.Println(cloudfront.ErrCodeAccessDenied, aerr.Error())
    case cloudfront.ErrCodeTooManyTrustedSigners:
    fmt.Println(cloudfront.ErrCodeTooManyTrustedSigners, aerr.Error())
    case cloudfront.ErrCodeTrustedSignerDoesNotExist:
    fmt.Println(cloudfront.ErrCodeTrustedSignerDoesNotExist, aerr.Error())
    case cloudfront.ErrCodeInvalidViewerCertificate:
    fmt.Println(cloudfront.ErrCodeInvalidViewerCertificate, aerr.Error())
    case cloudfront.ErrCodeInvalidLocationCode:
    fmt.Println(cloudfront.ErrCodeInvalidLocationCode, aerr.Error())
    case cloudfront.ErrCodeInvalidGeoRestrictionParameter:
    fmt.Println(cloudfront.ErrCodeInvalidGeoRestrictionParameter, aerr.Error())
    case cloudfront.ErrCodeInvalidProtocolSettings:
    fmt.Println(cloudfront.ErrCodeInvalidProtocolSettings, aerr.Error())
    case cloudfront.ErrCodeInvalidTTLOrder:
    fmt.Println(cloudfront.ErrCodeInvalidTTLOrder, aerr.Error())
    case cloudfront.ErrCodeInvalidWebACLId:
    fmt.Println(cloudfront.ErrCodeInvalidWebACLId, aerr.Error())
    case cloudfront.ErrCodeTooManyOriginCustomHeaders:
    fmt.Println(cloudfront.ErrCodeTooManyOriginCustomHeaders, aerr.Error())
    case cloudfront.ErrCodeTooManyQueryStringParameters:
    fmt.Println(cloudfront.ErrCodeTooManyQueryStringParameters, aerr.Error())
    case cloudfront.ErrCodeInvalidQueryStringParameters:
    fmt.Println(cloudfront.ErrCodeInvalidQueryStringParameters, aerr.Error())
    case cloudfront.ErrCodeTooManyDistributionsWithLambdaAssociations:
    fmt.Println(cloudfront.ErrCodeTooManyDistributionsWithLambdaAssociations, aerr.Error())
    case cloudfront.ErrCodeTooManyLambdaFunctionAssociations:
    fmt.Println(cloudfront.ErrCodeTooManyLambdaFunctionAssociations, aerr.Error())
    case cloudfront.ErrCodeInvalidLambdaFunctionAssociation:
    fmt.Println(cloudfront.ErrCodeInvalidLambdaFunctionAssociation, aerr.Error())
    case cloudfront.ErrCodeInvalidOriginReadTimeout:
    fmt.Println(cloudfront.ErrCodeInvalidOriginReadTimeout, aerr.Error())
    case cloudfront.ErrCodeInvalidOriginKeepaliveTimeout:
    fmt.Println(cloudfront.ErrCodeInvalidOriginKeepaliveTimeout, aerr.Error())
    default:
    fmt.Println(aerr.Error())
    }
    } else { // Print the error, cast err to awserr.Error to get the Code and Message from an error.
    fmt.Println(err.Error())
    }
    return
    }
    fmt.Println(result)
    }

14 Years Already Since the Lion of Panjshir Left Us

The Lion of Panjshir

Commander Massoud

September 9, 2001 to September 9, 2015: it has already been 14 years since the Lion of Panjshir left us. The man who fought the Russians, the Taliban and Al-Qaeda died in a suicide attack by two men posing as journalists, who blew up their camera during an interview.

In 1978, at the age of 25, Ahmed Chah Massoud (who would later become Commander Massoud) created and took the lead of the “Supervisory Council”. He was an outstanding tactician and strategist, and the only Resistance leader ever to have imposed a truce on the Red Army in exchange for its withdrawal.

He died two days before the September 11 attacks, even though he had tried several times to alert the international community to the danger posed by Al-Qaeda.

RIP Massoud!

mHealth Report in Congo: Feasibility Study and Recommendations

From September to December 2013, I conducted a feasibility study on the use of mobile telephony to improve vaccination coverage in Congo (Brazzaville).

Objectives of the study

The objective of the mission was to study the technical, organizational and technological feasibility of the project “Use of mobile telephony to improve the coverage of high-impact interventions in Congo: vaccination, deworming, vitamin A supplementation”, and to revise the draft project document on the basis of the results and the actual context in Congo.

The study was to provide enough information to UNICEF and the main partners (government, mobile phone companies, etc.) to allow them to validate the implementation of the project. It was also to help guide the strategic choices and implementation arrangements for such a project. The following information was expected to emerge from it:

  • How households in Congo use mobile phones,
  • A mapping of projects using mobile telephony and an assessment of good practices in mobile telephony for development in Congo,
  • Development of the pilot project document for the departments of Brazzaville and Pointe Noire (implementation plan, detailed budget, project team with the role of each stakeholder, monitoring and evaluation, project risks, etc.),
  • Recommendations on the implementation of the pilot project, negotiations with mobile phone companies, ownership by the government and extension of the project to the whole of Congo.

Methodology used

Feasibility study: Methodology

Download the report

Report on the feasibility study and recommendations (PDF, 40 pages, 1.9 MB)

Mobile Telephony for Development: Risks and Mitigation Measures

Total African Mobile Connections and Penetration Rate (million, percentage penetration).

Several international organizations and NGOs (UNESCO, UNICEF, Plan International, Carter Center, etc.) want to take advantage of the mobile phone penetration rate in developing countries to improve people’s living conditions. More and more projects in health, education, good governance, etc. that include mobile phones are therefore emerging, and a good percentage of them use SMS (Short Message Service) through applications such as RapidSMS and FrontLineSMS.

As with any initiative, particularly those based on information technology, several risks can disrupt its smooth running and significantly affect the expected results. The list below gives a number of risks and mitigation measures to take into account when designing the project (to be adapted to each project, of course ;-)), divided into two types:

  1. Functional, technical and process-related risks are associated with the operation, procedures, resources and communication of the project
  2. Project risks are associated with technical and implementation aspects

Functional, technical and process-related risks

  • Risk: Errors in the SMS sent by the target audience. Probability: Medium. Impact: High; data cannot be used by the system. Mitigation: Limit the number of procedures, use a simple nomenclature, train users well.
  • Risk: The user interfaces are hard to understand, or using them requires a long procedure. Probability: Medium. Impact: Moderate; system hard to use, end users discouraged. Mitigation: Take the user into account during design, produce ergonomics rules.
  • Risk: Malfunctions and instability of the environment. Probability: Low. Impact: High; data integrity, system often out of service. Mitigation: Strengthen testing, track bugs, technical choices.

Project risks

  • Risk: Not having the buy-in of the government and partners. Impact: Ownership and long-term viability of the project. Mitigation: Involve the government and partners from the design of the project.
  • Risk: Not hosting the system within the government. Mitigation: Include a cost for hosting and technical staff; build capacity to provide support.
  • Risk: Not having an ICT specialist and a project coordinator within the government.
  • Risk: The people in charge of field monitoring have no phone, or have unreliable phones. Impact: High impact on database updates and data integrity. Mitigation: Include an additional cost in the budget, assess the phones during training sessions.
  • Risk: The language used for the messages is not appropriate. Impact: High; users will struggle to read them. Mitigation: Find out the preferences of end users; use several languages if needed.
  • Risk: The beneficiaries do not check their messages. Impact: A good percentage of messages go unread. Mitigation: Strengthen communication around the project.
  • Risk: Not having a local IT company (or a local developer) to develop and maintain the application. Impact: Difficulties in implementing and supporting the system. Mitigation: Set up a long-term technical team (IT company, or recruit a developer).
  • Risk: End users are not involved from the start of the project and are not well trained. Impact: Poor use and ownership of the system. Mitigation: Create a working group to validate the specifications, create a user committee.

Using Mobile Telephony to Improve Child Health in Congo

The UNICEF office in Brazzaville is currently setting up an innovative project on the use of mobile telephony to improve the coverage of high-impact interventions (vaccination, deworming, vitamin A supplementation).

Using mobile phones for development

In Congo, 54.5% of children aged 12-24 months have not been fully vaccinated; in addition, chronic malnutrition affects 24.4% of children under 5 and is a major public health problem. The project aims to exploit the high penetration rate of mobile telephony (more than 92% of urban households own a phone) to inform (alert) parents of their children’s vaccination dates by SMS and possibly by automated voice messages (OBD, IVR). The project will also set up an electronic vaccination calendar system.

A server will record some information when a woman comes to give birth (name of the child, date and place of delivery, names and phone numbers of the parents, health center, etc.) and will then automatically send reminder SMS (and possibly voice messages) to parents, health centers and health workers one week and one day before an event, and possibly 3 days and one week after the event if the parents did not show up. Parents will also be able to send free SMS to the system to give feedback on the services they receive, and SMS surveys will be organized throughout the project.
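
The reminder schedule described above (one week and one day before an event, then 3 days and one week after it if the parents did not attend) can be sketched as a few date computations. This is only an illustration of the scheduling rule, not the project's actual implementation; the function names and the example date are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Day offsets relative to the event, taken from the description above.
const REMINDER_OFFSETS = [-7, -1];
const FOLLOW_UP_OFFSETS = [3, 7];

function shiftDays(date, days) {
  return new Date(date.getTime() + days * DAY_MS);
}

// Dates on which reminders go out before the event.
function reminderDates(eventDate) {
  return REMINDER_OFFSETS.map((days) => shiftDays(eventDate, days));
}

// Dates of follow-up messages, sent only if the parents did not attend.
function followUpDates(eventDate, parentsAttended) {
  return parentsAttended ? [] : FOLLOW_UP_OFFSETS.map((days) => shiftDays(eventDate, days));
}

const toDay = (date) => date.toISOString().slice(0, 10);

// Hypothetical vaccination appointment on 2014-03-15 (UTC).
const vaccination = new Date(Date.UTC(2014, 2, 15));
console.log(reminderDates(vaccination).map(toDay));        // [ '2014-03-08', '2014-03-14' ]
console.log(followUpDates(vaccination, false).map(toDay)); // [ '2014-03-18', '2014-03-22' ]
```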

All information and data recorded during the project will be viewable in real time on a website, which will help the government and partners better target their policies. The technical architecture of the server will be built around the following technologies: the RapidSMS library, a GNU/Linux OS (Debian or Ubuntu Server), Python and the required packages, the Django web framework, an SMS gateway (Kannel?), MySQL as the database and a web server.

PowerPoint presentation of the project feasibility study and recommendations

6 things you should know about Amazon CloudSearch

Amazon CloudSearch is a fully-managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a custom search solution for your website or application.

Amazon CloudSearch supports a rich set of features including language-specific text processing for 34 languages, free text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete and user configurable scaling and availability options [1].

Here are six key things you should know about Amazon CloudSearch.

1. How to build a search solution with Amazon CloudSearch?

There are three main steps:

  1. Create and configure a search domain: A search domain includes your searchable data and the search instances that handle your search requests. If you have multiple collections of data that you want to make searchable, you can create multiple search domains.
  2. Upload the data you want to search to your domain: Amazon CloudSearch indexes your data and deploys the search index to one or more search instances.
  3. Search your domain: You send a search request to your domain’s search endpoint as an HTTP/HTTPS GET request.
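
As an illustration of step 3, here is a minimal sketch of a search request sent with curl. It assumes the 2013-01-01 version of the CloudSearch search API and uses only the q and size parameters; the function name cloudsearch_search and the endpoint shown in the usage note are placeholders, not values from this post.

```shell
# Sketch: send a GET search request to a CloudSearch domain.
# Assumptions: API version 2013-01-01; the function name and its
# argument order are illustrative only.
cloudsearch_search() {
  local endpoint="$1" query="$2" size="${3:-10}"
  # --get appends the url-encoded parameters to the URL as a query string.
  curl --silent --get "https://${endpoint}/2013-01-01/search" \
       --data-urlencode "q=${query}" \
       --data-urlencode "size=${size}"
}
```

It could be called as cloudsearch_search "search-mydomain-xxxx.us-east-1.cloudsearch.amazonaws.com" "star wars" 5, where the hostname is the search endpoint shown for your own domain.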

2. How much data can you store in CloudSearch?

The number of partitions you need depends on your data and configuration. When you upload data, Amazon CloudSearch deploys one or more search instances. As your data volume grows, more or larger search instances are deployed to contain your indexed data. The maximum number of search instances that can be deployed for a domain is 50, and a search index can be split across a maximum of 10 partitions. For more information about CloudSearch limits, see Ref. [2].

3. How are you charged?

You are charged for search instances, document batch uploads, IndexDocuments requests, and data transfer. The AWS Simple Monthly Calculator [3] can help you estimate your monthly CloudSearch bill.

4. Can you query for documents older than X and then send a batch delete request?

At the moment, Amazon CloudSearch does not provide this feature, but you can use one of the AWS SDKs to search for the documents that match your criteria and then delete them in a batch.
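
The delete half of that workaround can be sketched in shell. The function below turns a list of document IDs (one per line, for example collected from a previous search) into a delete batch in the JSON format used by the 2013-01-01 document API. The function name make_delete_batch is hypothetical, and the sketch assumes the IDs contain no characters that need JSON escaping.

```shell
# Sketch: build a CloudSearch delete batch from document IDs on stdin.
# Assumptions: one ID per line, no characters requiring JSON escaping;
# the function name is illustrative only.
make_delete_batch() {
  local id first=1
  printf '['
  while IFS= read -r id; do
    [ -n "$id" ] || continue            # skip empty lines
    if [ "$first" -eq 0 ]; then printf ','; fi
    printf '\n  {"type": "delete", "id": "%s"}' "$id"
    first=0
  done
  printf '\n]\n'
}
```

For example, make_delete_batch < ids.txt > batch.json writes the batch to a file, which can then be submitted with aws cloudsearchdomain upload-documents --endpoint-url (your document endpoint) --content-type application/json --documents batch.json.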

5. Do you need to keep a copy of the index anywhere?

Not really. You can rely on CloudSearch alone as the source of the analytics data, as long as you follow AWS security best practices [4]. Amazon CloudSearch stores your data internally in high-availability stores, so you will not have to re-index your data should a CloudSearch instance have an issue (this is handled transparently). It offers key benefits including automatic node monitoring and recovery, built-in data durability, easy setup and configuration, and hands-off auto scaling.

6. What is the difference between the CloudSearch instance sizes?

A search instance is a single search engine that indexes documents and responds to search requests. It has a finite amount of RAM and CPU resources for indexing data and processing requests. There are four available instance types within CloudSearch: search.m1.small (2 Million documents), search.m1.large (8 Million documents), search.m2.xlarge (16 Million documents) and search.m2.2xlarge (32 Million documents).

References:
[1] – Amazon CloudSearch
[2] – Understanding Amazon CloudSearch Limits
[3] – AWS Simple Monthly Calculator
[4] – AWS Security Best practices [pdf]

Uploading a Large File to Amazon S3

A single PUT operation can upload an object of at most 5 GB to an Amazon S3 bucket. To upload larger objects (> 5 GB), use the multipart upload API, which supports objects from 5 MB up to 5 TB.

The Multipart Upload API is designed to improve the upload experience for larger objects: the object is uploaded in parts, which can be sent independently, in any order, and in parallel. With the AWS CLI, this is done through the API-level (s3api) command set.

In this tutorial, we assume:

  • You have installed and configured the AWS Command Line Interface on a Linux computer or server,
  • You have an AWS account and an S3 bucket (MyBucketName),
  • The size of the file to upload is 20 GB (MyObject.zip),
  • Your internet connection can upload 100 MB at a time without problems.

Theoretically, how it works

The process involves four steps:

  1. Separate the object into multiple parts. There are several ways to do this in Linux ('dd', 'split', etc.). We will use 'dd' in this tutorial,
  2. Initiate the multipart upload and receive an upload id in return (aws s3api create-multipart-upload),
  3. Upload each part (a contiguous portion of an object’s data) accompanied by the upload id and a part number (aws s3api upload-part),
  4. Finalize the upload by providing the upload id and the part number / ETag pairs for each part of the object (aws s3api complete-multipart-upload).

And practically?

1. Separate the object into multiple parts

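We will create 205 parts (204 parts of 100 MB plus a final part of 80 MB). If you only need a test file, you can first generate a 20 GB file of random data:

$ dd if=/dev/urandom of=MyObject.zip bs=1024k count=20480

Then split the file into 100 MB parts: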

$ dd if=MyObject.zip of=MyObject1.zip bs=1024k skip=0 count=100
$ dd if=MyObject.zip of=MyObject2.zip bs=1024k skip=100 count=100
$ dd if=MyObject.zip of=MyObject3.zip bs=1024k skip=200 count=100
...
$ dd if=MyObject.zip of=MyObject10.zip bs=1024k skip=900 count=100
$ dd if=MyObject.zip of=MyObject11.zip bs=1024k skip=1000 count=100
$ dd if=MyObject.zip of=MyObject12.zip bs=1024k skip=1100 count=100
...
$ dd if=MyObject.zip of=MyObject203.zip bs=1024k skip=20200 count=100
$ dd if=MyObject.zip of=MyObject204.zip bs=1024k skip=20300 count=100
$ dd if=MyObject.zip of=MyObject205.zip bs=1024k skip=20400 count=100

A one line shell script can be written to automate this process:

$ for i in {1..205}; do dd if=MyObject.zip of=MyObject"$i".zip bs=1024k skip=$(( (i - 1) * 100 )) count=100; done
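
The part count need not be hard-coded: 20 GB is 20480 blocks of 1024k, and 20480 / 100 rounded up gives 205 parts, the last one holding the remaining 80 blocks. The sketch below generalizes the loop above by deriving the count from the file size. The function name split_into_parts and its arguments are illustrative, and it assumes the file name has an extension (as MyObject.zip does).

```shell
# Sketch: split a file into fixed-size parts with dd, computing the
# number of parts from the file size.
# Assumptions: the function name and arguments are illustrative; parts
# are named <name><N>.<ext>, matching MyObject1.zip, MyObject2.zip, ...
split_into_parts() {
  local file="$1" blocks_per_part="$2" block_bytes="${3:-1048576}"
  local size part_bytes parts i base ext
  size=$(( $(wc -c < "$file") ))
  part_bytes=$(( blocks_per_part * block_bytes ))
  # Ceiling division: a final, shorter part holds the remainder.
  parts=$(( (size + part_bytes - 1) / part_bytes ))
  base="${file%.*}"
  ext="${file##*.}"
  i=1
  while [ "$i" -le "$parts" ]; do
    dd if="$file" of="${base}${i}.${ext}" bs="$block_bytes" \
       skip=$(( (i - 1) * blocks_per_part )) count="$blocks_per_part" 2>/dev/null
    i=$(( i + 1 ))
  done
  echo "$parts"    # number of parts written
}
```

Calling split_into_parts MyObject.zip 100 reproduces the 100 MB parts used in this tutorial and prints the number of parts created.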

2. Initiate the multipart upload and receive an upload id in return

$ aws s3api create-multipart-upload --bucket MyBucketName --key MyObject.zip

You will receive output like the following:
{
"UploadId": "UVditMTG8U--MyLongUploadId--ksmFT7N6bNTWD",
"Bucket": "MyBucketName",
"Key": "MyObject.zip"
}
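
If you plan to script the remaining steps, it can be convenient to capture the upload id in a shell variable instead of copying it by hand. The sketch below relies on the AWS CLI's global --query and --output options to print only the UploadId field; the function name start_multipart_upload is hypothetical.

```shell
# Sketch: start a multipart upload and print only the upload id.
# Assumption: the function name is illustrative only.
start_multipart_upload() {
  local bucket="$1" key="$2"
  # --query selects the UploadId field; --output text prints it bare.
  aws s3api create-multipart-upload --bucket "$bucket" --key "$key" \
      --query UploadId --output text
}
```

For example: upload_id=$(start_multipart_upload MyBucketName MyObject.zip).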

3. Upload each part

Each of the following commands prints the ETag of the part it uploads. Note these values, as they are needed to finalize the upload:
{
"ETag": "\"fggcd799--ETagValue1--dhe76dd8dc\""
}


$ aws s3api upload-part --bucket MyBucketName --key MyObject.zip --upload-id MyLongUploadId --part-number 1 --body MyObject1.zip
$ aws s3api upload-part --bucket MyBucketName --key MyObject.zip --upload-id MyLongUploadId --part-number 2 --body MyObject2.zip
...
$ aws s3api upload-part --bucket MyBucketName --key MyObject.zip --upload-id MyLongUploadId --part-number 100 --body MyObject100.zip
...
$ aws s3api upload-part --bucket MyBucketName --key MyObject.zip --upload-id MyLongUploadId --part-number 204 --body MyObject204.zip
$ aws s3api upload-part --bucket MyBucketName --key MyObject.zip --upload-id MyLongUploadId --part-number 205 --body MyObject205.zip

Note: Once more you can write a small shell script to automate this process.
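
One possible shape for such a script is sketched below. It uploads the parts one after another and records each ETag in a JSON file with the Parts structure that complete-multipart-upload expects. The function name upload_parts and its argument order are hypothetical, and the part files are assumed to follow the MyObject<N>.zip naming used in step 1.

```shell
# Sketch: upload every part and record its ETag in a JSON manifest.
# Assumptions: parts are named <key without .zip><N>.zip; the function
# name and argument order are illustrative only.
upload_parts() {
  local bucket="$1" key="$2" upload_id="$3" part_count="$4" manifest="$5"
  local prefix="${key%.zip}" i=1 etag
  printf '{\n"Parts": [\n' > "$manifest"
  while [ "$i" -le "$part_count" ]; do
    # --query/--output reduce the response to the ETag string alone.
    etag=$(aws s3api upload-part --bucket "$bucket" --key "$key" \
             --upload-id "$upload_id" --part-number "$i" \
             --body "${prefix}${i}.zip" \
             --query ETag --output text) || return 1
    etag=$(printf '%s' "$etag" | tr -d '"')   # drop the surrounding quotes
    if [ "$i" -gt 1 ]; then printf ',\n' >> "$manifest"; fi
    printf '{ "ETag": "\\"%s\\"", "PartNumber": %d }' "$etag" "$i" >> "$manifest"
    i=$(( i + 1 ))
  done
  printf '\n]\n}\n' >> "$manifest"
}
```

For this tutorial it would be called as upload_parts MyBucketName MyObject.zip MyLongUploadId 205 MyMultiPartUpload.json. The function stops at the first failed part, so a failed run can be inspected before retrying.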

4. Finalize the upload

Create a JSON file MyMultiPartUpload.json containing the following:

{
"Parts": [
{
"ETag": "\"ETagValue1\"",
"PartNumber": 1
},
{
"ETag": "\"ETagValue2\"",
"PartNumber": 2
},
...
{
"ETag": "\"ETagValue100\"",
"PartNumber": 100
},
...
{
"ETag": "\"ETagValue204\"",
"PartNumber": 204
},
{
"ETag": "\"ETagValue205\"",
"PartNumber": 205
}
]
}

$ aws s3api complete-multipart-upload --bucket MyBucketName --key \
MyObject.zip --upload-id MyLongUploadId --multipart-upload file://MyMultiPartUpload.json

That is all. You can verify that the large file has been uploaded with:
$ aws s3 ls s3://MyBucketName/MyObject.zip
2014-09-18 20:29:19 20495340 MyObject.zip
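
For a scripted check, one option is to compare the local file size with the size S3 reports for the object. The sketch below uses aws s3api head-object and reads its ContentLength field; the function name verify_uploaded_size is hypothetical, and a matching size is only a basic sanity check, not proof that the content is identical.

```shell
# Sketch: succeed only if the local file and the S3 object have the
# same size in bytes.
# Assumption: the function name is illustrative only.
verify_uploaded_size() {
  local file="$1" bucket="$2" key="$3"
  local local_size remote_size
  local_size=$(( $(wc -c < "$file") ))
  remote_size=$(aws s3api head-object --bucket "$bucket" --key "$key" \
                  --query ContentLength --output text) || return 1
  [ "$local_size" -eq "$remote_size" ]
}
```

For example: verify_uploaded_size MyObject.zip MyBucketName MyObject.zip && echo "sizes match".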

References and resources: