Saturday, 12 November 2016
While chasing some performance issues on a Windows-based SQL Server 2008 R2, I took a look at CPU monitoring too. All green. For months. That's good - I thought. Nagios as well as commercial tools like SolarWinds report average CPU usage, which means the average across all cores. In my case:
- 6 virtual Cores available
- 4 used by SQL Server at 90-100%
- 2 used by Windows at 2-5%
That makes an average of around 60-70%. Fail. Now I'm looking for monitoring solutions that work per core. Any suggestions?
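For a quick manual check in the meantime, Windows' built-in typeperf prints per-core load to the console (a sketch - the counter path below is the English one; localized Windows installations translate counter names):
typeperf "\Processor(*)\% Processor Time" -si 5
The (*) wildcard lists every single core plus the _Total instance, so one core stuck at 100% is visible immediately - exactly what the averaged dashboards hide.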
Wednesday, 7 September 2016
Java 7 to 8 Upgrade. Are you production ready?
A couple of days ago I upgraded one of our production environments from Java 7 to 8. Everything had been tested before, so I took the challenge:
- Installation of Java 8
- Setting JAVA_HOME to new location
- Some work around the Red Hat JBoss EAP 6 middleware
- Restart of JBoss and Elasticsearch
Nice. JBoss and Elasticsearch started up. We're in production with Java 8!
After a couple of days, the CPU on the production machines started to get exhausted by the Java process. Close monitoring of the JVM (using Jolokia) revealed a full JVM CodeCache. With a full CodeCache the JVM can no longer keep compiled code around, so it kept interpreting and re-compiling code just to serve requests.
The CodeCache - if not configured at JVM startup - comes with a default of 48 megabytes, which is too little for production environments. So a
-XX:ReservedCodeCacheSize=256M
solved the problem. The new parameter went into JBoss's standalone.conf.bat (see the sketch below). The CodeCache now levels out at around 60 megabytes.
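In standalone.conf.bat the flag simply gets appended to the JAVA_OPTS the server starts with - roughly like this (a sketch; your existing options will differ):
rem Reserve a larger JIT code cache so the compiler never runs dry
set "JAVA_OPTS=%JAVA_OPTS% -XX:ReservedCodeCacheSize=256M"
On Linux the equivalent line goes into standalone.conf with shell syntax: JAVA_OPTS="$JAVA_OPTS -XX:ReservedCodeCacheSize=256M".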
Sunday, 24 January 2016
ELK, ELG or ELKG?
Elasticsearch: Check. Logstash: Check. But what about Kibana vs. Grafana? My first impression: Quite similar.
After taking a closer look, Grafana looks much more handy for displaying time series of whatever data you've got. It's quite easy to add new data sources, and I fell in love with those markers, which can be used to display single events on top of your time series. That's in fact great for getting an idea of, e.g., error-rate increases after an application or module deployment.
I'm pretty sure I'll use both - Kibana and Grafana - alongside each other to get the best of both tools.
Monday, 17 August 2015
I've got CURL and I will use it!
A meeting, someday, somewhere.
Me: "Please, could I get command line access to HAProxy?"
Ops: "Wooo-hooo! Well. Let's say...I think... No!"
Me: "Why? It would speed up my work."
Ops: "Well. Quite a complex system submitting the wrong command will let us all die!"
Me: "Oh. I didn't know."
So I told the story to my friend CURL. And here's our solution:
If you're using basic authentication for your HAProxy admin page, find something like
Authorization:Basic UsuAnb123neb2Zc39=
in your web browser's request headers (F12 in Chrome will do the job). Assuming your HAProxy web admin interface is running on 10.0.0.10:4444, your curl call would look like:
curl --header "Authorization:Basic UsuAnb123neb2Zc39=" --data "s=SERVERNAME&action=[disable|enable]&b=INTERFACE" http://10.0.0.10:4444
Replace INTERFACE with the backend name shown on the admin page for each load-balanced interface, and SERVERNAME with the name of the machine to be enabled or disabled.
It's great fun to bulk enable/disable nodes/interfaces without the annoying clicking through HAProxy's web interface, and it gets you a step further towards deployment or maintenance automation.
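For example, draining a whole backend before maintenance becomes a short loop (a sketch - the server names, the backend name web-backend and the auth header are placeholders for your own values):
# Disable app01..app03 in backend "web-backend"
for S in app01 app02 app03; do
  curl --header "Authorization:Basic UsuAnb123neb2Zc39=" \
       --data "s=$S&action=disable&b=web-backend" \
       http://10.0.0.10:4444
done
Re-run with action=enable once maintenance is done.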
Me: "Please, could I get command line access to HAProxy?"
Ops: "Wooo-hooo! Well. Let's say...I think... No!"
Me: "Why? It would speed up my work."
Ops: "Well. Quite a complex system submitting the wrong command will let us all die!"
Me: "Oh. I didn't know."
So I told the story to my friend CURL. And here's our solution:
If you're using basic authentication for your HAProxy admin page, find something like
Authorization:Basic UsuAnb123neb2Zc39=
in your web browsers request headers (F12 in Chrome will do the job). Assume your HAProxy web admin interface is running on 10.0.0.10:4444, your CURL would look like:
curl --header "Authorization:Basic UsuAnb123neb2Zc39=" --data "s=SERVERNAME&action=[disable|enable]&b=INTERFACE" http://10.0.0.10:4444
Replace INTERFACE by name shown on admin page for each interface which is load balanced. Replace SERVERNAME with name of the machine to be enabled/disabled.
Great fun to bulk enable/disable nodes/interfaces without anoying use of HAProxy web interface. Gets you a step further to deployment or maintenance automation.
Sunday, 14 June 2015
ELK GeoIP
Ever set up a cloud machine using Amazon EC2 or any other cloud provider? If not, it's big fun. Once your machine is up and running, wait a few minutes and attackers start their work by brute-forcing the root SSH login.
So I thought it would be a good idea to know where they come from. I fired up my Ubuntu 14.04 machine and set up fail2ban, which logs to /var/log/fail2ban.log. Read more on fail2ban setup.
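If you don't have it yet, enabling the SSH jail takes a few lines in /etc/fail2ban/jail.local (a minimal sketch; jail and option names match the 0.8.x version shipped with Ubuntu 14.04 - newer versions call the jail [sshd]):
# /etc/fail2ban/jail.local
[ssh]
enabled  = true
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 5
bantime  = 3600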
A simple
cat /var/log/fail2ban.log* | grep Ban | awk '{print $1,$2,$7;}' | sed 's/ /T/' | sed 's/,/./' >> fail2ban.log
extracts all banned IP addresses to fail2ban.log, which looks like
2015-06-05T00:37:20.887 186.121.210.50
2015-06-05T01:12:02.366 182.100.67.114
2015-06-05T02:20:53.002 218.65.30.107
2015-06-05T02:23:13.149 186.147.233.125
2015-06-05T04:08:53.423 119.97.184.14
2015-06-05T05:59:14.905 43.255.188.146
2015-06-05T07:02:10.099 66.210.34.180
2015-06-05T07:18:58.651 58.218.211.166
Admittedly, this list gets quite long over time. Next, download and install the ELK stack. Elasticsearch and Logstash run out of the box. Edit Kibana's config file kibana.yml and point the Elasticsearch URL to your local ES instance:
elasticsearch_url: "http://localhost:9200"
Next, create a fail2ban directory at the same level as your ELK directories and put your fail2ban.log in it. Last step: create a Logstash config with a grok filter, name it auth.conf and save it to that same top-level directory:
input {
  file {
    type => "fail2ban"
    path => [ "/home/papa/Projekte/es/fail2ban/fail2ban.log*" ]
    start_position => "beginning"
    discover_interval => 1
  }
}
filter {
  if [type] == "fail2ban" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:bandate} %{IPORHOST:ip}" ]
    }
    geoip {
      source => "ip"
    }
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
}
That's it - now start up the ELK stack:
./elasticsearch-1.5.2/bin/elasticsearch >> es.log &
./logstash-1.5.0/bin/logstash -f ./auth.conf >> ls.log &
./kibana-4.0.2/bin/kibana >> kib.log &
Simply put these 3 lines into a run.sh file for reuse.
Point your favourite web browser to localhost:5601, which should come up with an unconfigured Kibana. Check "Use event times to create index names" and set the Time-field name to bandate.
Switch to Visualize and create a new Tile Map from a new search. The bucket type is Geo Coordinates, the aggregation GeoHash, the field geoip.location. Don't forget to click Apply. Maybe you'll need to adjust the time settings to get your data displayed in Kibana - by default, the last 15 minutes are selected.
You're done! You should get something like this displayed for the banned IPs trying to SSH into your server.
But maybe you're not interested in banned IPs, but in accepted logins. Take a look at your /var/log/auth.log and adjust auth.conf accordingly - roughly like the grok sketch below.
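The sshd messages in auth.log carry a syslog timestamp and a different text, so the grok match changes accordingly - a sketch using standard grok patterns (field names like logindate are my own choices, and the sample log line is illustrative):
filter {
  grok {
    # matches e.g.: Jun 14 10:12:31 myhost sshd[2211]: Accepted password for bob from 10.1.2.3 port 51122 ssh2
    match => [ "message", "%{SYSLOGTIMESTAMP:logindate} %{SYSLOGHOST:hostname} sshd\[%{POSINT:pid}\]: Accepted %{WORD:method} for %{USER:user} from %{IPORHOST:ip}" ]
  }
  geoip {
    source => "ip"
  }
}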
Tuesday, 25 November 2014
DevOps hits enterprise
Honestly – small teams and startups have built-in DevOps capabilities. Where a couple of people generate ideas, drop the first lines of code, get some cloud infrastructure up and running, and deploy, run and permanently change their own software, each of them has to know both universes: Dev and Ops.
For enterprises, it's somewhat more complex. Distributed development teams with hundreds of developers have to be integrated by a centralized IT department, supplying development environments as well as on-demand testing infrastructure, code repositories, build and test pipelines, automatic infrastructure provisioning, deployment pipelines and several pre-production and production stages. Sounds huge? It is! But – how does DevOps start? Three steps to get into it.
Talk to each other
Set up a weekly jour fixe (that's enough to start with). Attendees are developers, IT staff and management people. Discuss business plans, how the business will grow, which features to implement and which impact this has on providing and running tools and infrastructure. Letting ops be part of your dev planning meetings (assuming some sort of agile development framework is already in place for your dev teams) will be the next step. But be aware: there's usually 1 ops person for n dev teams or dev team members. So involving ops in detailed dev planning will end up in an ops meeting marathon, leaving the ops no time to do their jobs. Some sort of abstraction is necessary.
Use Checklists
Before automation is part of your daily business, use checklists to get your software or parts of it to production. I’m serious. Checklists ensure:
- Integrity of deployed software (use for example SHA256 checksums for your packages - see the one-liner after this list)
- Antivirus scan – yes, it's quite easy to inject some sort of malware somewhere along your build pipeline
- Detailed deployment steps – besides artefacts, which changes to apply, e.g. to configuration files or the database schema (and of course you're using a database versioning tool like Flyway :-)
- Approval – who finally approves your deployment? This could be a role like product management in conjunction with quality assurance (QA). No approval, no deployment!
- Inform all stakeholders on a new release – keep an email template for announcements. Announce a new release before deployment and again after a successfully finished deployment.
- Ensure all required documents are ready – people would like to know what changed between version 2.0.0 and the new 2.1.0. So release notes are a good idea.
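Generating and verifying the checksum is a one-liner on each side (a sketch; the file names are placeholders):
sha256sum myapp-2.1.0.war > myapp-2.1.0.war.sha256    # on the build server
sha256sum -c myapp-2.1.0.war.sha256                   # on the target, next to the .war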
A simple example - deploying artefacts to several JBoss instances. Deploying a module, hotfix or any other package across several stages with 15 servers in total will take you, say, 30 minutes per cycle, which includes (per server):
- SSH or RDP login to server
- Distribute package to server
- Copy files to deployment directory
- Check logs for errors
Automate this, and the cycle shrinks to:
- Copy the package to a single distribution server (yes, the assumption is that you have some machine you can deal with which sits in the same network as your stages do)
- Run remote deployments using the JBoss management interface
But that’s not out of the box. There’s a bunch of work to do:
- Configure (management interface) and secure (management user) JBoss servers for remote access
- Adjust firewall settings for remote access
- Maybe you have to set up your own management network with its own IP address range
- Write deployment scripts on the distribution server and run them frequently (as cron jobs, for example - see the sketch after this list)
- Do heavy testing
- Maybe you will run out of JVM PermGen space on frequent deployments running Java 7
- The outcome of this is an upgrade to a newer JVM version, which has a large impact on your dev and QA teams
- Create an output pipeline such as a website, email notification or monitoring integration to get feedback on your deployments – no problem as long as everything is fine, but you should get notified in case of deployment errors
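The deployment script on the distribution server can stay surprisingly small. A sketch for EAP 6 (host names, credentials and the artefact path are placeholders; jboss-cli.sh ships with the server):
#!/bin/bash
# Push one artefact to a list of JBoss instances via the management interface
ARTEFACT=/opt/dist/myapp.war
for HOST in app01 app02 app03; do
  echo "Deploying $ARTEFACT to $HOST ..."
  "$JBOSS_HOME"/bin/jboss-cli.sh --connect --controller=$HOST:9999 \
      --user=deploy --password=secret \
      --command="deploy $ARTEFACT --force" || echo "FAILED on $HOST"
done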
What’s next?
Besides deployment of artefacts, the next step could be implementing configuration management tools such as Puppet or Chef to deal with the configuration files on your stages. Like your centralized deployment server, there will be a centralized configuration management master to keep your stages' configuration up to date. But before setting up configuration management tools, you should think about streamlining your configuration across your stages. Make use of configuration files for stage-specific settings such as database connectors or IP addresses.
Culture first!
As I wrote in one of my recent articles, DevOps – was bisher geschah (German only), introducing DevOps into grown structures is about 80% culture and 20% tools. So start with the culture:
- Blameless postmortems (1): Get rid of your dev-vs-ops divided thinking. Start analysing incidents without blaming each other.
- Faults happen: Don't try to avoid every fault and incident, but have a plan for when they happen
- Assumption: People are doing their best. People are not evil by default, trying to break a system
- Kill features nobody uses. Sounds simple, but explain to a manager that features which took investment and manpower to develop should be killed. Have fun! Setbacks are part of the game. No management support, no DevOps!
(1) Thanks to @roidrage for giving an overview of this at the Continuous Lifecycle Conference 2014 in Mannheim, Germany