Java 8 features – Stream API advanced examples

Last Updated on by

Post summary: This post explains Java 8 Stream API with very basic code examples.

In Java 8 features – Lambda expressions, Interface changes, Stream API, DateTime API post I have briefly described most interesting Java 8 features. In current post I will give special attention to Stream API. This post is with more advanced code examples to elaborate on basic examples described in Java 8 features – Stream API basic examples post. Code examples here can be found in GitHub java-samples/java8 repository.

Memory consumption and better design

Stream API has operations that are short-circuiting, such as limit(). Once their goal is achieved they stop processing the stream. Most of the operators are not such. Here I have prepared example for possible pitfall when using not short-circuiting operators. For testing purposes I have created PeekObject which outputs a message to console once its constructor is called.

public class PeekObject {
	private String message;

	public PeekObject(String message) {
		this.message = message;
		System.out.println("Constructor called for: " + message);
	}

	public String getMessage() {
		return message;
	}
}

Assume a situation where there is a stream of many instances of PeekObject, but only several elements of the stream are needed, thus they have to be limited. Only 2 constructors are called in this case.

limit the stream

public static List<PeekObject> limit_shortCircuiting(List<String> stringList,
							int limit) {
	return stringList.stream()
		.map(PeekObject::new)
		.limit(limit)
		.collect(Collectors.toList());
}

unit test

@Test
public void test_limit_shortCircuiting() {
	System.out.println("limit_shortCircuiting");

	List<String> stringList = Arrays.asList("a", "b", "a", "c", "d", "a");

	List<PeekObject> result = AdvancedStreamExamples
		.limit_shortCircuiting(stringList, 2);

	assertThat(result.size(), is(2));
}

console output

limit_shortCircuiting
Constructor called for: a
Constructor called for: b

Now stream has to be sorted before limit is applied.

code

public static List<PeekObject> sorted_notShortCircuiting(
					List<String> stringList, int limit) {
	return stringList.stream()
		.map(PeekObject::new)
		.sorted((left, right) -> 
			left.getMessage().compareTo(right.getMessage()))
		.limit(limit)
		.collect(Collectors.toList());
}

unit test

@Test
public void test_sorted_notShortCircuiting() {
	System.out.println("sorted_notShortCircuiting");

	List<String> stringList = Arrays.asList("a", "b", "a", "c", "d", "a");

	List<PeekObject> result = AdvancedStreamExamples
		.sorted_notShortCircuiting(stringList, 2);

	assertThat(result.size(), is(2));
}

console output

sorted_notShortCircuiting
Constructor called for: a
Constructor called for: b
Constructor called for: a
Constructor called for: c
Constructor called for: d
Constructor called for: a

Notice that constructors for all objects in the stream are called. This will require Java to allocate enough memory for all the objects. There are 6 object in this example, but what if there are 6 million. Also current objects are very lightweight, but what if they are much bigger. Conclusion is that you have to know very well Stream API operations and apply them carefully when designing your stream pipeline.

Convert comma separated List to a Map with handling duplicates

There is a List of comma separated values which needs to be converted to a Map. List value “11,21” should become Map entry with key 11 and value 21. Duplicated keys also should be considered: Arrays.asList(“11,21”, “12,21”, “13,23”, “13,24”).

code

public static Map<Long, Long> splitToMap(List<String> stringsList) {
	return stringsList.stream()
		.filter(StringUtils::isNotEmpty)
		.map(line -> line.split(","))
		.filter(array -> array.length == 2 
			&& NumberUtils.isNumber(array[0])
			&& NumberUtils.isNumber(array[1]))
		.collect(Collectors.toMap(array -> Long.valueOf(array[0]), 
			array -> Long.valueOf(array[1]), (first, second) -> first)));
}

unit test

@Test
public void test_splitToMap() {
	List<String> stringList = Arrays
			.asList("11,21", "12,21", "13,23", "13,24");

	Map<Long, Long> result = AdvancedStreamExamples.splitToMap(stringList);

	assertThat(result.size(), is(3));
	assertThat(result.get(11L), is(21L));
	assertThat(result.get(12L), is(21L));
	assertThat(result.get(13L), is(23L));
}

Important bit in this conversion is (first, second) -> first), if it is not present there will be error java.lang.IllegalStateException: Duplicate key 23 (slightly misleading error, as duplicated key is 13, value is 23). This is a merge function which resolves collisions between values associated for the same key. It evaluates two values found for same key – first and second where current lambda returns the first. If overwrite is needed, hence keep the last entered value then lambda would be: (first, second) -> second).

Examples with custom object

Examples to follow use custom object Employee, where Position is an enumeration: public enum Position { DEV, DEV_OPS, QA }.

import java.util.List;

public class Employee {
	private String firstName;
	private String lastName;
	private Position position;
	private List<String> skills;
	private int salary;

	public Employee() {
	}

	public Employee(String firstName, String lastName,
				Position position, int salary) {
		this.firstName = firstName;
		this.lastName = lastName;
		this.position = position;
		this.salary = salary;
	}

	public void setSkills(String... skills) {
		this.skills = Arrays.stream(skills).collect(Collectors.toList());
	}

	public String getName() {
		return this.firstName + " " + this.lastName;
	}

	... Getters and Setters
}

A company has been created, it consists of 6 developers, 2 QAs and 2 DevOps..

private List<Employee> createCompany() {
	Employee dev1 = new Employee("John", "Doe", Position.DEV, 110);
	dev1.setSkills("C#", "ASP.NET", "React", "AngularJS");
	Employee dev2 = new Employee("Peter", "Doe", Position.DEV, 120);
	dev2.setSkills("Java", "MongoDB", "Dropwizard", "Chef");
	Employee dev3 = new Employee("John", "Smith", Position.DEV, 115);
	dev3.setSkills("Java", "JSP", "GlassFish", "MySql");
	Employee dev4 = new Employee("Brad", "Johston", Position.DEV, 100);
	dev4.setSkills("C#", "MSSQL", "Entity Framework");
	Employee dev5 = new Employee("Philip", "Branson", Position.DEV, 140);
	dev5.setSkills("JavaScript", "React", "AngularJS", "NodeJS");
	Employee dev6 = new Employee("Nathaniel", "Barth", Position.DEV, 99);
	dev6.setSkills("Java", "Dropwizard");
	Employee qa1 = new Employee("Ronald", "Wynn", Position.QA, 100);
	qa1.setSkills("Selenium", "C#", "Java");
	Employee qa2 = new Employee("Erich", "Kohn", Position.QA, 105);
	qa2.setSkills("Selenium", "JavaScript", "Protractor");
	Employee devOps1 = new Employee("Harold", "Jess", Position.DEV_OPS, 116);
	devOps1.setSkills("CentOS", "bash", "c", "puppet", "chef", "Ansible");
	Employee devOps2 = new Employee("Karl", "Madsen", Position.DEV_OPS, 123);
	devOps2.setSkills("Ubuntu", "bash", "Python", "chef");

	return Arrays.asList(dev1, dev2, dev3, dev4, dev5, dev6,
				qa1, qa2, devOps1, devOps2);
}

Company skill set

This method accepts none, one or many positions. If no positions are provided then information for all positions is printed. Positions array is transferred to List<String> because all objects used in lambda should be effectively final. Transferring array to stream is done with Arrays.stream() method. Employees are filtered based on desired position. Each skills list is concatenated and flattened to a stream with flatMap(). After this operation there is a stream of strings with all skills. Duplicates are removed with distinct(). Finally stream is collected to a list.

code

public static List<String> gatherEmployeeSkills(
		List<Employee> employees, Position... positions) {
	positions = positions == null || positions.length == 0 
		? Position.values() : positions;
	List<Position> searchPositions = Arrays.stream(positions)
			.collect(Collectors.toList());
	return employees == null ? Collections.emptyList()
		: employees.stream()
			.filter(employee 
				-> searchPositions.contains(employee.getPosition()))
			.flatMap(employee -> employee.getSkills().stream())
			.distinct()
			.collect(Collectors.toList());
}

unit test

@Test
public void test_gatherEmployeeSkills() {
	List<Employee> company = createCompany();

	List<String> skills = AdvancedStreamExamples
			.gatherEmployeeSkills(company);

	assertThat(skills.size(), is(25));
}

Skill set per position

This method first received list of all skills per position and converts it to a stream. Stream can be collected to a String with Collectors.joining() method. It accepts delimiter, prefix and suffix.

code

public static String printEmployeeSkills(
		List<Employee> employees, Position position) {
	List<String> skills = gatherEmployeeSkills(employees, position);
	return skills.stream()
		.collect(Collectors.joining("; ",
			"Our " + position + "s have: ", " skills"));
}

unit test

@Test
public void test_printEmployeeSkills() {
	List<Employee> company = createCompany();

	String skills = AdvancedStreamExamples
			.printEmployeeSkills(company, Position.QA);

	assertThat(skills, is("Our employees have: "
		+ "Selenium; C#; Java; JavaScript; Protractor skills"));
}

Salary statistics

This method returns Map with Position as key and IntSummaryStatistics as value. Collectors.groupingBy() groups employees by position key and then using Collectors.summarizingInt() to get statistics of employee’s salary.

code

public static Map<Position, IntSummaryStatistics> salaryStatistics(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
			Collectors.summarizingInt(Employee::getSalary)));
}

unit test

@Test
public void test_salaryStatistics() {
	List<Employee> company = createCompany();

	Map<Position, IntSummaryStatistics> salaries = AdvancedStreamExamples
			.salaryStatistics(company);

	assertThat(salaries.get(Position.DEV).getAverage(), is(114D));
	assertThat(salaries.get(Position.QA).getAverage(), is(102.5D));
	assertThat(salaries.get(Position.DEV_OPS).getAverage(), is(119.5D));
}

Position with lowest average salary

Map with position and salary summary is retrieved and then with entrySet().stream() map is converted to stream of Entry<Position, IntSummaryStatistics> objects. Entries are sorted by average value in ascending order by custom comparator Double.compare(). findFirst() returns Optional<Entry>. Entry itself is obtained with get() method. Key which is basically the position is obtained with getKey() method.

code

public static Position positionWithLowestAverageSalary(
		List<Employee> employees) {
	return salaryStatistics(employees)
		.entrySet().stream()
		.sorted((entry1, entry2) 
			-> Double.compare(entry1.getValue().getAverage(),
				entry2.getValue().getAverage()))
		.findFirst()
		.get()
		.getKey();
}

unit test

@Test
public void test_positionWithLowestAverageSalary() {
	List<Employee> company = createCompany();

	Position position = AdvancedStreamExamples
			.positionWithLowestAverageSalary(company);

	assertThat(position, is(Position.QA));
}

Employees per each position

Grouping is done per position and employees are aggregated to list with Collectors.toList() method.

code

public static Map<Position, List<Employee>> employeesPerPosition(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
				Collectors.toList()));
}

unit test

@Test
public void test_employeesPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, List<Employee>> employees = AdvancedStreamExamples
			.employeesPerPosition(company);

	assertThat(employees.get(Position.QA).size(), is(2));
	assertThat(employees.get(Position.QA).get(0).getName(),
		is("Ronald Wynn"));
	assertThat(employees.get(Position.QA).get(1).getName(),
		is("Erich Kohn"));
}

Employee names per each position

Similar to method above, but one more mapping is needed here. Employee name should be extracted and converted to List<String>. This is done with Collectors.mapping(Employee::getName, Collectors.toList()) method.

code

public static Map<Position, List<String>> employeeNamesPerPosition(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
			Collectors.mapping(Employee::getName,
						Collectors.toList())));
}

unit test

@Test
public void test_employeeNamesPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, List<String>> employees = AdvancedStreamExamples
			.employeeNamesPerPosition(company);

	assertThat(employees.get(Position.QA).size(), is(2));
	assertThat(employees.get(Position.QA).get(0), is("Ronald Wynn"));
	assertThat(employees.get(Position.QA).get(1), is("Erich Kohn"));
}

Employee count per position

Getting the count is done by Collectors.counting() method. It returns Long by default. If Integer is needed then this can be changed to Collectors.reducing(0, e -> 1, Integer::sum).

code

public static Map<Position, Long> employeesCountPerPosition(
			List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getPosition,
						Collectors.counting()));
}

unit test

@Test
public void test_employeesCountPerPosition() {
	List<Employee> company = createCompany();

	Map<Position, Long> employees = AdvancedStreamExamples
				.employeesCountPerPosition(company);

	assertThat(employees.get(Position.DEV), is(6L));
	assertThat(employees.get(Position.QA), is(2L));
	assertThat(employees.get(Position.DEV_OPS), is(2L));
}

Employees with duplicated first name

Employees are grouped into map with key first name and List<Employee> as value. This map is converted to stream and filtered for List<Employee> greater than 1 element. List is flattened with flatMap() and collected to List<Employee>.

code

public static List<Employee> employeesWithDuplicateFirstName(
		List<Employee> employees) {
	return employees.stream()
		.collect(Collectors.groupingBy(Employee::getFirstName,
						Collectors.toList()))
		.entrySet().stream()
		.filter(entry -> entry.getValue().size() > 1)
		.flatMap(entry -> entry.getValue().stream())
		.collect(Collectors.toList());
}

unit test

@Test
public void test_employeesWithDuplicateFirstName() {
	List<Employee> company = createCompany();

	List<Employee> employees = AdvancedStreamExamples
			.employeesWithDuplicateFirstName(company);

	assertThat(employees.size(), is(2));
	assertThat(employees.get(0).getName(), is("John Doe"));
	assertThat(employees.get(1).getName(), is("John Smith"));
}

Conclusion

In this post I have just scratched the Java 8 Stream API. It offers vast amount of functionalities which can be very useful for data processing. Beware when generating stream pipeline because it might end up consuming too much resources.

If you find this post useful, please share it to reach more people. Sharing is caring!
Share on FacebookShare on LinkedInTweet about this on TwitterShare on Google+Email this to someone
Category: Java, Tutorials | Tags: