Post summary: This post explains Java 8 Stream API with very basic code examples.

In Java 8 features – Lambda expressions, Interface changes, Stream API, DateTime API post I have briefly described most interesting Java 8 features. In the current post, I will give special attention to Stream API. This post is with more advanced code examples to elaborate on basic examples described in Java 8 features – Stream API basic examples post. Code examples here can be found in GitHub java-samples/java8 repository.

Memory consumption and better design

Stream API has operations that are short-circuiting, such as limit(). Once their goal is achieved they stop processing the stream. Most of the operators are not such. Here I have prepared an example for possible pitfall when using not short-circuiting operators. For testing purposes, I have created PeekObject which outputs a message to the console once its constructor is called.

public class PeekObject {
	private String message;

	public PeekObject(String message) {
		this.message = message;
		System.out.println("Constructor called for: " + message);
	}

	public String getMessage() {
		return message;
	}
}

Assume a situation where there is a stream of many instances of PeekObject, but only several elements of the stream are needed, thus they have to be limited. Only 2 constructors are called in this case.

limit the stream

public static List<PeekObject> limit_shortCircuiting(List<String> stringList,
							int limit) {
	return stringList.stream()
		.map(PeekObject::new)
		.limit(limit)
		.collect(Collectors.toList());
}

unit test

@Test
public void test_limit_shortCircuiting() {
	System.out.println("limit_shortCircuiting");

	List<String> stringList = Arrays.asList("a", "b", "a", "c", "d", "a");

	List<PeekObject> result = AdvancedStreamExamples
		.limit_shortCircuiting(stringList, 2);

	assertThat(result.size(), is(2));
}

console output

limit_shortCircuiting
Constructor called for: a
Constructor called for: b

Now stream has to be sorted before the limit is applied.

code

public static List<PeekObject> sorted_notShortCircuiting(
					List<String> stringList, int limit) {
	return stringList.stream()
		.map(PeekObject::new)
		.sorted((left, right) -> 
			left.getMessage().compareTo(right.getMessage()))
		.limit(limit)
		.collect(Collectors.toList());
}

unit test

@Test
public void test_sorted_notShortCircuiting() {
	System.out.println("sorted_notShortCircuiting");

	List<String> stringList = Arrays.asList("a", "b", "a", "c", "d", "a");

	List<PeekObject> result = AdvancedStreamExamples
		.sorted_notShortCircuiting(stringList, 2);

	assertThat(result.size(), is(2));
}

console output

sorted_notShortCircuiting
Constructor called for: a
Constructor called for: b
Constructor called for: a
Constructor called for: c
Constructor called for: d
Constructor called for: a

Notice that constructors for all objects in the stream are called. This will require Java to allocate enough memory for all the objects. There are 6 objects in this example, but what if there are 6 million. Also, current objects are very lightweight, but what if they are much bigger. The conclusion is that you have to know very well Stream API operations and apply them carefully when designing your stream pipeline.

Convert comma separated List to a Map with handling duplicates

There is a List of comma separated values which need to be converted to a Map. List value “11,21” should become Map entry with key 11 and value 21. Duplicated keys also should be considered: Arrays.asList(“11,21”, “12,21”, “13,23”, “13,24”).

code

public static Map<Long, Long> splitToMap(List<String> stringsList) {
	return stringsList.stream()
		.filter(StringUtils::isNotEmpty)
		.map(line -> line.split(","))
		.filter(array -> array.length == 2 
			&& NumberUtils.isNumber(array[0])
			&& NumberUtils.isNumber(array[1]))
		.collect(Collectors.toMap(array -> Long.valueOf(array[0]), 
			array -> Long.valueOf(array[1]), (first, second) -> first)));
}

unit test

@Test
public void test_splitToMap() {
	List<String> stringList = Arrays
			.asList("11,21", "12,21", "13,23", "13,24");

	Map<Long, Long> result = AdvancedStreamExamples.splitToMap(stringList);

	assertThat(result.size(), is(3));
	assertThat(result.get(11L), is(21L));
	assertThat(result.get(12L), is(21L));
	assertThat(result.get(13L), is(23L));
}

The important bit in this conversion is (first, second) -> first), if it is not present there will be error java.lang.IllegalStateException: Duplicate key 23 (slightly misleading error, as the duplicated key is 13, the value is 23). This is a merge function which resolves collisions between values associated with the same key. It evaluates two values found for the same key – first and second where current lambda returns the first. If overwrite is needed, hence keep the last entered value then lambda would be: (first, second) -> second).

Examples of custom object

Examples to follow use custom object Employee, where Position is an enumeration: public enum Position { DEV, DEV_OPS, QA }.

import java.util.List;

public class Employee {
	private String firstName;
	private String lastName;
	private Position position;
	private List<String> skills;
	private int salary;

	public Employee() {
	}

	public Employee(String firstName, String lastName,
				Position position, int salary) {
		this.firstName = firstName;
		this.lastName = lastName;
		this.position = position;
		this.salary = salary;
	}

	public void setSkills(String... skills) {
		this.skills = Arrays.stream(skills).collect(Collectors.toList());
	}

	public String getName() {
		return this.firstName + " " + this.lastName;
	}

	... Getters and Setters
}

A company has been created, it consists of 6 developers, 2 QAs and 2 DevOps..

private List<Employee> createCompany() {
	Employee dev1 = new Employee("John", "Doe", Position.DEV, 110);
	dev1.setSkills("C#", "ASP.NET", "React", "AngularJS");
	Employee dev2 = new Employee("Peter", "Doe", Position.DEV, 120);
	dev2.setSkills("Java", "MongoDB", "Dropwizard", "Chef");
	Employee dev3 = new Employee("John", "Smith", Position.DEV, 115);
	dev3.setSkills("Java", "JSP", "GlassFish", "MySql");
	Employee dev4 = new Employee("Brad", "Johston", Position.DEV, 100);
	dev4.setSkills("C#", "MSSQL", "Entity Framework");
	Employee dev5 = new Employee("Philip", "Branson", Position.DEV, 140);
	dev5.setSkills("JavaScript", "React", "AngularJS", "NodeJS");
	Employee dev6 = new Employee("Nathaniel", "Barth", Position.DEV, 99);
	dev6.setSkills("Java", "Dropwizard");
	Employee qa1 = new Employee("Ronald", "Wynn", Position.QA, 100);
	qa1.setSkills("Selenium", "C#", "Java");
	Employee qa2 = new Employee("Erich", "Kohn", Position.QA, 105);
	qa2.setSkills("Selenium", "JavaScript", "Protractor");
	Employee devOps1 = new Employee("Harold", "Jess", Position.DEV_OPS, 116);
	devOps1.setSkills("CentOS", "bash", "c", "puppet", "chef", "Ansible");
	Employee devOps2 = new Employee("Karl", "Madsen", Position.DEV_OPS, 123);
	devOps2.setSkills("Ubuntu", "bash", "Python", "chef");

	return Arrays.asList(dev1, dev2, dev3, dev4, dev5, dev6,
				qa1, qa2, devOps1, devOps2);
}

Company skill set

This method accepts none, one or many positions. If no positions are provided then information for all positions is printed. Positions array is transferred to List<String> because all objects used in lambda should be effectively final. Transferring array to stream is done with Arrays.stream() method. Employees are filtered based on the desired position. Each skills list is concatenated and flattened to a stream with flatMap(). After this operation, there is a stream of strings with all skills. Duplicates are removed with distinct(). Finally, stream is collected to a list.

code

public static List<String> gatherEmployeeSkills(
		List<Employee> employees, Position... positions) {
	positions = positions == null || positions.length == 0 
		? Position.values() : positions;
	List<Position> searchPositions = Arrays.stream(positions)
			.collect(Collectors.toList());
	return employees == null ? Collections.emptyList()
		: employees.stream()
			.filter(employee 
				-> searchPositions.contains(employee.getPosition()))
			.flatMap(employee -> employee.getSkills().stream())
			.distinct()
			.collect(Collectors.toList());
}

unit test

@Test
public void test_gatherEmployeeSkills() {
	List<Employee> company = createCompany();

	List<String> skills = AdvancedStreamExamples
			.gatherEmployeeSkills(company);

	assertThat(skills.size(), is(25));
}

Skillset per position

This method first received a list of all skills per position and converts it to a stream. The stream can be collected to a String with Collectors.joining() method. It accepts delimiter, prefix, and suffix.

code

public static String printEmployeeSkills(
		List<Employee> employees, Position position) {
	List<String> skills = gatherEmployeeSkills(employees, position);
	return skills.stream()
		.collect(Collectors.joining("; ",
			"Our " + position + "s have: ", " skills"));
}

unit test

@Test
public void test_printEmployeeSkills() {
	List<Employee> company = createCompany();

	String skills = AdvancedStreamExamples
			.printEmployeeSkills(company, Position.QA);

	assertThat(skills, is("Our employees have: "
		+ "Selenium; C#; Java; JavaScript; Protractor skills"));
}

Salary statistics

This method returns Map with Position as key and IntSummaryStatistics as value. Collectors.groupingBy() groups employees by position key and then using Collectors.summarizingInt() to get statistics of employee’s salary.

code

public static Map<Position, IntSummaryStatistics> salaryStatistics(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
			Collectors.summarizingInt(Employee::getSalary)));
}

unit test

@Test
public void test_salaryStatistics() {
	List<Employee> company = createCompany();

	Map<Position, IntSummaryStatistics> salaries = AdvancedStreamExamples
			.salaryStatistics(company);

	assertThat(salaries.get(Position.DEV).getAverage(), is(114D));
	assertThat(salaries.get(Position.QA).getAverage(), is(102.5D));
	assertThat(salaries.get(Position.DEV_OPS).getAverage(), is(119.5D));
}

Position with the lowest average salary

Map with position and salary summary is retrieved and then with entrySet().stream() map is converted to stream of Entry<Position, IntSummaryStatistics> objects. Entries are sorted by average value in ascending order by custom comparator Double.compare(). findFirst() returns Optional<Entry>. The entry itself is obtained with get() method. The key which is basically the position is obtained with getKey() method.

code

public static Position positionWithLowestAverageSalary(
		List<Employee> employees) {
	return salaryStatistics(employees)
		.entrySet().stream()
		.sorted((entry1, entry2) 
			-> Double.compare(entry1.getValue().getAverage(),
				entry2.getValue().getAverage()))
		.findFirst()
		.get()
		.getKey();
}

unit test

@Test
public void test_positionWithLowestAverageSalary() {
	List<Employee> company = createCompany();

	Position position = AdvancedStreamExamples
			.positionWithLowestAverageSalary(company);

	assertThat(position, is(Position.QA));
}

Employees per each position

Grouping is done per position and employees are aggregated to list with Collectors.toList() method.

code

public static Map<Position, List<Employee>> employeesPerPosition(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
				Collectors.toList()));
}

unit test

@Test
public void test_employeesPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, List<Employee>> employees = AdvancedStreamExamples
			.employeesPerPosition(company);

	assertThat(employees.get(Position.QA).size(), is(2));
	assertThat(employees.get(Position.QA).get(0).getName(),
		is("Ronald Wynn"));
	assertThat(employees.get(Position.QA).get(1).getName(),
		is("Erich Kohn"));
}

Employee names per each position

Similar to the method above, but one more mapping is needed here. Employee name should be extracted and converted to List<String>. This is done with Collectors.mapping(Employee::getName, Collectors.toList()) method.

code

public static Map<Position, List<String>> employeeNamesPerPosition(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
			Collectors.mapping(Employee::getName,
						Collectors.toList())));
}

unit test

@Test
public void test_employeeNamesPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, List<String>> employees = AdvancedStreamExamples
			.employeeNamesPerPosition(company);

	assertThat(employees.get(Position.QA).size(), is(2));
	assertThat(employees.get(Position.QA).get(0), is("Ronald Wynn"));
	assertThat(employees.get(Position.QA).get(1), is("Erich Kohn"));
}

Employee count per position

Getting the count is done by Collectors.counting() method. It returns Long by default. If Integer is needed then this can be changed to Collectors.reducing(0, e -> 1, Integer::sum).

code

public static Map<Position, Long> employeesCountPerPosition(
			List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
						Collectors.counting()));
}

unit test

@Test
public void test_employeesCountPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, Long> employees = AdvancedStreamExamples
				.employeesCountPerPosition(company);

	assertThat(employees.get(Position.DEV), is(6L));
	assertThat(employees.get(Position.QA), is(2L));
	assertThat(employees.get(Position.DEV_OPS), is(2L));
}

Employees with duplicated first name

Employees are grouped into a map with key first name and List<Employee> as value. This map is converted to stream and filtered for List<Employee> greater than 1 element. The list is flattened with flatMap() and collected to List<Employee>.

code

public static List<Employee> employeesWithDuplicateFirstName(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getFirstName,
						Collectors.toList()))
		.entrySet().stream()
		.filter(entry -> entry.getValue().size() > 1)
		.flatMap(entry -> entry.getValue().stream())
		.collect(Collectors.toList());
}

unit test

@Test
public void test_employeesWithDuplicateFirstName() {
	List<Employee> company = createCompany();

	List<Employee> employees = AdvancedStreamExamples
			.employeesWithDuplicateFirstName(company);

	assertThat(employees.size(), is(2));
	assertThat(employees.get(0).getName(), is("John Doe"));
	assertThat(employees.get(1).getName(), is("John Smith"));
}

Conclusion

In this post, I have just scratched the Java 8 Stream API. It offers a vast amount of functionalities which can be very useful for data processing. Beware when generating stream pipeline because it might end up consuming too many resources.

Automation Rhapsody

Java 8 features – Stream API advanced examples

Post summary: This post explains Java 8 Stream API with very basic code examples.

Memory consumption and better design

limit the stream

unit test

console output

code

unit test

console output

Convert comma separated List to a Map with handling duplicates

code

unit test

Examples of custom object

Company skill set

code

unit test

Skillset per position

code

unit test

Salary statistics

code

unit test

Position with the lowest average salary

code

unit test

Employees per each position

code

unit test

Employee names per each position

code

unit test

Employee count per position

code

unit test

Employees with duplicated first name

code

unit test

Conclusion

Related Posts