Host your own private image gallery using Piwigo and Nginx

Do you want to show some photos of the kids to your family in a secure fashion, but don’t want to rely on the cloud folks to store your data? Then running Piwigo on your home server or a VPS is a great way to do exactly that.

Piwigo is free, open-source software with a rich feature set and lots of available plugins and themes. If you’re using Lightroom to manage your photo library, there is the Piwigo publisher plug-in by alloyphoto (one of the best $15 I’ve ever spent, great support included). Read More

The way to Elasticsearch 2.0, or how to reindex your dot fields with logstash and ruby filters

The Elasticsearch 2.0 release introduced a major annoyance by removing support for dots in field names. We use ES for our Apache logs with a retention policy of 365 days, and of course _all_ of the indices contained fields with a dot in the name.

What’s even worse, at some point I had the idea of filtering the request parameters out of the URI and running a kv filter on them. As we never used the resulting mess of request_params.* fields, those could simply be dropped.

The first step was to update our Logstash configuration so that no dots are used in field names.

Then we needed an automated way of re-indexing all of our indices: replacing all dots (.) with underscores (_) in the field names, dropping irrelevant fields, and moving all the data into a new index. I came up with a method using Logstash and a Ruby filter, wrapped in a bash script that iterates over all indices, sed’s the index name into the template below, and runs Logstash with it. Logstash shuts itself down once the index has been read in completely.
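The full template is behind the Read More link, but the idea boils down to something like this (a minimal sketch, assuming Logstash 2.x with the elasticsearch input and output plugins; hosts, index names and the INDEXNAME placeholder are illustrative):

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "INDEXNAME"
  }
}
filter {
  ruby {
    code => "
      event.to_hash.keys.each do |k|
        if k.start_with?('request_params.')
          # never used, just drop them
          event.remove(k)
        elsif k.include?('.')
          # rename foo.bar to foo_bar
          event[k.gsub('.', '_')] = event.remove(k)
        end
      end
    "
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "INDEXNAME-migrated"
  }
}

And the wrapper, again just as a sketch:

# one Logstash run per index; Logstash exits once the elasticsearch input is exhausted
for idx in $(curl -s 'localhost:9200/_cat/indices?h=index' | grep '^logstash-'); do
  sed "s/INDEXNAME/$idx/g" reindex.conf.template > reindex.conf
  logstash -f reindex.conf
done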

Read More

Indexing and searching Weblogic logs using Logstash, Elasticsearch and Kibana

This is a re-edit of my previous post “Indexing and searching Weblogic logs using Logstash and Graylog2”. In the meantime our setup has settled on Kibana instead of the Graylog2 frontend. This Howto is meant to be a complete installation guide for the Elasticsearch ELK stack and for using it to index tons of Weblogic server and application logs, from DEV through UAT to the production environment. Read More

Getting metrics from Graphite into Nagios and Centreon

Getting metrics from logs and various other sources into Graphite is quite simple. The most interesting metrics represent critical performance data, and the proactive-monitoring approach of having a person sit there and watch the dashboard isn’t suited to our needs. We use Nagios with Centreon as our monitoring platform, and we want to alert on some of the metrics collected in Graphite. Also, since version 2.4 Centreon supports custom dashboard views and, although this might sound like duplication, we wanted to get the metrics graphically integrated into the Centreon interface as well, as RRD graphs that is.

Looking around I found the check_graphite plugin by obfuscurity and greatly enhanced it to support multiple metrics in one call, performance data with customizable metric short names, and retry calls in case there are no datapoints in the given duration. It’s called check_graphite_multi, available from the perfdata branch of my nagios-scripts repository on GitHub, and it is especially useful if you’d like to get multiple metrics of the same type into one RRD graph in Centreon, PNP4Nagios or the like. Our use case is a graph with JVM heap generation usage and garbage collector statistics. We alert on a full old generation and on high GC durations.

Here are some short usage notes:

--metrics|-m accepts a string of metrics, separated by a pipe (|)

--metrics "scale(weblogic.server01.jvm.gc.oldgen_used_before_gc)|scale(weblogic.server01.jvm.gc.oldgen_used_after_gc)"

--shortname|-s accepts a comma-separated list of aliases used in the status output and performance data

--shortname "jvm_eden_before_gc,jvm_eden_after_gc"

If no --shortname is specified for a given metric, it defaults to the full metric name.

--warn|-w also accepts a comma-separated list

--warn "100,150"

At least one value is required; if only one value is given for multiple metrics, that value applies to all of them.

--critical|-c works the same as --warn
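Putting it all together, a full check command might look like this (a sketch only: the base-URL flag and the threshold values are made up here, so check the plugin’s help output for the exact options):

./check_graphite_multi -u "http://graphite.example.com/render" \
  -m "weblogic.server01.jvm.gc.oldgen_used_before_gc|weblogic.server01.jvm.gc.oldgen_used_after_gc" \
  -s "oldgen_before_gc,oldgen_after_gc" \
  -w "400,300" -c "600,500"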

Also note:

When specifying multiple metrics, make sure to keep the order for all parameters, like

-m "metric1|metric2" -s "alias1,alias2" -w "warn1,warn2" -c "crit1,crit2"

If at least one of the metrics returns a CRITICAL state, the plugin exits with a CRITICAL return code. Ditto for WARNING.

By default, if a metric has no datapoints in the given --duration timeframe, the plugin retries with 10 times the given duration. This is mostly cosmetic, to prevent holes in the RRD graphs, and I might make it configurable in the future. Unfortunately Graphite’s render API has no option to just return the last datapoint, so this is a hack to work around that.
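For illustration, this is roughly what that workaround boils down to at the render API level (a curl sketch; the Graphite host and metric are placeholders, and the “any datapoint?” test is deliberately crude):

DURATION=5
TARGET="weblogic.server01.jvm.gc.oldgen_used_after_gc"
BASE="http://graphite.example.com/render?target=${TARGET}&format=json"
# try the requested window first, then fall back to 10x the duration
if ! curl -s "${BASE}&from=-${DURATION}min" | grep -q '\[[0-9]'; then
  curl -s "${BASE}&from=-$((DURATION * 10))min"
fi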

Indexing and searching Weblogic logs using Logstash and Graylog2

Update 2013/10: we decided to replace Graylog2 with Kibana 3 completely. The article below is just for reference; the Logstash config has been outdated since Logstash 1.2, and the setup as described below is suboptimal anyway. I’ll post a new article shortly.

Update 2014/02: Finally, the new guide is here: Indexing and searching Weblogic logs using Logstash, Elasticsearch and Kibana.


Recently we decided to get rid of our Splunk “Free” log indexing solution, as the 500MB limit is too small for our 20+ Weblogic environments (we couldn’t even index all production apps with that) and $boss said he “won’t pay over 20000€ for a 2GB enterprise license, that’s just ridiculous”. So I went out on the interwebs to look for alternatives and stumbled upon Graylog2 and Logstash. This stuff seemed to have potential, so I started playing around with it.

Logstash is a log pipeline that features various input methods, filters and output plugins. The basic process is to throw logs at it, parse the messages for the correct date, split them into fields if desired, and forward the result to some indexer, where it can be searched with some frontend. Logstash scales horizontally, and for larger volumes it’s recommended to split the tasks of log shipping and log parsing across dedicated Logstash instances. To avoid losing logs when something goes down, and to keep maintenance downtimes low, it’s also recommended to put a message queue between the shipper(s) and the parser(s).

Redis fits exactly into that picture and acts as a key-value message queue in the pipeline. Logstash has a hard-coded queue size of 20 events per configured input; if the queue fills up, the input gets blocked. Having a dedicated message queue in between instead is a good thing.

Graylog2 consists of a server and a web interface. The server stores the logs in Elasticsearch, and the frontend lets you search the indices.

So, our whole pipeline looks like this:

logfiles -> logstash shipper -> redis -> logstash indexer cluster -> (gelf) -> graylog2-server -> elasticsearch cluster -> graylog2-web-interface

Logstash is able to output to Elasticsearch directly, and there is the great Kibana frontend for it, which is in many ways superior to graylog2-web-interface, but for the reasons I explain at the end of this post we chose Graylog2 for the Weblogic logs.

Installation

The first step was to get Graylog2, Elasticsearch and MongoDB up and running. We use RHEL 6, so this howto worked almost out of the box. I changed the following:

  • latest stable elasticsearch-0.19.10
  • latest stable mongodb 2.2.0
  • default RHEL 6 Ruby 1.8.7 (so I left out all the RVM stuff in that howto and edited the provided scripts to remove the RVM commands)

Prepare access to the logfiles for logstash

The next step was to get Logstash to index the logfiles correctly.

We decided to use SSHFS to mount the logfile folders of all Weblogic instances onto a single box, and to run the Logstash shipper there, using file inputs and a redis output. The reason for using SSHFS instead of installing Logstash directly on the Weblogic machines and using, for example, log4j appenders pointed at Logstash log4j inputs was mainly that our Weblogics are managed by a bank’s data centre, so getting new software installed requires a lot of work. The SSH access was already in place.

We have weblogic server logs (usually the weblogic.log), and each application generates a log4j-style logfile.

/data/logfiles/prod/server01/app1.log
/data/logfiles/prod/server01/app2.log
/data/logfiles/prod/server01/weblogic.log
/data/logfiles/prod/server02/app1.log
...
/data/logfiles/qsu/server01/app1.log
... and so on
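For the layout above, such an SSHFS mount might look roughly like this (a sketch; user, host and remote path are made up for illustration):

sshfs -o ro,reconnect,ServerAliveInterval=15 \
  wlsuser@prod-server01:/opt/weblogic/logs /data/logfiles/prod/server01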

This is the configuration file for the file-to-redis shipper. The only filter in place is the multiline filter, so that multiline messages are already stored in redis as single events.

input {
  # server logs
  file {
    type => "weblogic"    # must match the multiline filter type and the "logstash-weblogic" redis key read by the parser
    path => [ "/data/logfiles/*/*/weblogic.log" ]
  }
  # application logs
  file {
    type => "application"
    path => [ "/data/logfiles/*/*/planethome.log",
              "/data/logfiles/*/*/marktplatz*.log",
              "/data/logfiles/*/*/immoplanet*.log",
              "/data/logfiles/*/*/planetphone.log" ]
  }
}
filter {
  # weblogic server log events always start with ####
  multiline {
    type => "weblogic"
    pattern => "^####"
    negate => true
    what => "previous"
  }
  # application logs use log4j syntax and start with the year. So below will work until 31.12.2099
  multiline {
    type => "application"
    pattern => "^20"
    negate => true
    what => "previous"
  }
}
output {
  redis {
    host => "phewu01"
    data_type => "list"
    key => "logstash-%{@type}"
  }
}

And this is the config for the Logstash parsers. This is where the main work happens: logs get parsed and fields get extracted. It is CPU intensive, so depending on the message volume you can simply add more parser instances with the same config.

input {
  redis {
    type => "weblogic"
    host => "phewu01"
    data_type => "list"
    key => "logstash-weblogic"
    message_format => "json_event"
  }
  redis {
    type => "application"
    host => "phewu01"
    data_type => "list"
    key => "logstash-application"
    message_format => "json_event"
  }
}

filter {
  ###################
  # weblogic server logs
  ###################
  grok {
     # extract server environment (prod, uat, dev etc..) from logfile path
     type => "weblogic"
     patterns_dir => "./patterns"
     match => ["@source_path", "%{PH_ENV:environment}"]
  }
  grok {
    type => "weblogic"
    pattern => ["####<%{DATA:wls_timestamp}> <%{WORD:severity}> <%{DATA:wls_topic}> <%{HOST:hostname}> <(%{WORD:server})?> %{GREEDYDATA:logmessage}"]
    add_field => ["application", "server"]
  }
  date {
    type => "weblogic"
    # joda-time doesn't know about localized CEST/CET (MESZ in German), 
    # so use 2 patterns to match the date
    wls_timestamp => ["dd.MM.yyyy HH:mm 'Uhr' 'MESZ'", "dd.MM.yyyy HH:mm 'Uhr' 'MEZ'"]
  }
  mutate {
    type => "weblogic"
    # set the "Host" in graylog to the environment the logs come from (prod, uat, etc..)
    replace => ["@source_host", "%{environment}"]
  }

  ######################
  # application logs
  ######################
  # match and pattern inside one single grok{} doesn't work
  # also using a hash in match didn't work as expected if the field is the same,
  # so split this into single grok{} directives
  grok {
    type => "application"
    patterns_dir => "./patterns"
    match => ["@source_path", "%{PH_ENV:environment}"]
  }
  grok {
    # extract app name from logfile name
    type => "application"
    patterns_dir => "./patterns"
    match => ["@source_path", "%{PH_APPS:application}"]
  }
  grok {
    # extract node name from logfile path
    type => "application"
    patterns_dir => "./patterns"
    match => ["@source_path", "%{PH_SERVERS:server}"]
  }
  grok {
    type => "application"
    pattern => "%{DATESTAMP:timestamp} %{DATA:some_id} %{WORD:severity} %{GREEDYDATA:logmessage}"
  }
  date {
    type => "application"
    timestamp => ["yyyy-MM-dd HH:mm:ss,SSS"]
  }
  mutate {
    type => "application"
    replace => ["@source_host", "%{environment}"]
  }
}

output {
  gelf {
    host => "localhost"
    facility => "%{@type}"
  }
}

In ./patterns there is a file containing three lines defining the PH_ENV, PH_APPS and PH_SERVERS grok patterns used above.
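The patterns themselves are site-specific, but going by the paths and filenames above the file could look roughly like this (illustrative values, not the original):

PH_ENV (prod|uat|qsu|dev)
PH_APPS (planethome|marktplatz\w*|immoplanet\w*|planetphone)
PH_SERVERS server\d+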

I had several issues initially:

  • rsync to a jumpbox and mounting onto the logserver via SSHFS would cause changes to go missing on some, but not all, files
  • chaining rsyncs would cause some messages to be indexed with a partial @message, as some files are huge and slow to transfer.
    • This was solved by mounting all logfile folders on the different environments directly with SSHFS, where possible.
    • The remaining files are still rsync’ed, but with the “--append --inplace” rsync parameters (see the example after this list)
  • indexing files would always start at position 0 of the file, over and over again
    • This only happened with rsync; using “--append --inplace” fixes it
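Such an rsync call might look like this (host and paths are placeholders):

rsync -av --append --inplace loguser@jumpbox:/remote/weblogic/logs/ /data/logfiles/prod/server01/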

Meanwhile I also took a look at Kibana, another great frontend for Elasticsearch. For the Weblogic logs I’ll keep Graylog2, as it allows saving predefined streams and provides an easy-to-use quickfilter, which makes log crawling easier for our developers (they only want to search for a string within a timeframe, and they never made use of the full power of Splunk either). Also, Kibana doesn’t yet provide a way to view long stack traces in an acceptable fashion (cut the message at n characters and provide a “more” link, something like that). But I added Apache logs in Logstash, and those are routed to a Logstash elasticsearch output with Kibana as the web UI. I’d really like to see some sort of merge between Kibana and Graylog2, plus saved searches added to the mix; that would make a really competitive Splunk alternative.

SOHO Mailserver with Postfix + Postgresql + Dovecot + SpamAssassin + Roundcube

This HowTo describes my home mail server setup. Basically it is a sum-it-all-up article based on various resources from the net.

Used Software:

  • Arch Linux OS
  • Postfix MTA
  • PostgreSQL database backend
  • Dovecot IMAP Server
  • Roundcube Webmail + Apache Webserver
  • Spamassassin junk filter
  • Server-side filtering with Sieve
  • fetchmail (for pulling all the scattered accounts into this one place)

Preconditions in my setup:

  • Server behind Firewall/NAT
  • Dynamic IP (No-IP Plus managed DynDNS service with MX Record etc)
  • StartSSL certificate for both Web- and Mail-Server domain
  • ISP doesn’t allow running an outgoing mail server directly and requires relaying through their mail gateway (see the relayhost sketch below)
  • Apache + PHP + Postgresql already running and working
Read More
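For the relaying precondition, the relevant bit of Postfix configuration looks roughly like this (a sketch; relay host, port and authentication details depend on your ISP):

# /etc/postfix/main.cf (excerpt)
relayhost = [mail.isp.example.net]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt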

How to create a YouTube video from mp3/ogg audio using a picture

If you want to create a YouTube video from an audio file, here is how to do it.
All you need is the audio file, a single picture, and ffmpeg.

First find out the length of the audio file in seconds; you’ll need it. Here is an example with a 420-second file:

ffmpeg -loop_input -i picture.jpg -vcodec mpeg4 -r 25.00 -qscale 2 -s 480x360 \
-i audiofile.mp3 -acodec libmp3lame -ab 128k -t 420 video.avi

This will create a hi-res MPEG-4 video with 128k audio. The trick here is to use that one picture and loop it for -t seconds.
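If you don’t know the length, a reasonably recent ffprobe (shipped with ffmpeg) can print it in seconds; older builds may need different options:

ffprobe -v error -show_entries format=duration -of csv=p=0 audiofile.mp3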