You can use the DatasetDescriptor.Builder#schema(Class<?> type) method to infer a dataset schema from the instance fields of a Java class.

For example, the following class defines a Java object that provides access to the ID, title, release date, and IMDb URL of an entry in a movie database.

 package org.kitesdk.examples.data;
 /** Movie class */
 class Movie {
   private int id;
   private String title;
   private String releaseDate;
   private String imdbUrl;

   public Movie(int id, String title, String releaseDate, String imdbUrl) {
     this.id = id;
     this.title = title;
     this.releaseDate = releaseDate;
     this.imdbUrl = imdbUrl;
   }
        
   public Movie() {
     // Empty constructor for serialization purposes
   }

   public int getId() {
     return id;
   }

   public void setId(int id) {
     this.id = id;
   }

   public String getTitle() {
     return title;
   }

   public void setTitle(String title) {
     this.title = title;
   }
  
   public String getReleaseDate() {
     return releaseDate;
   }

   public void setReleaseDate(String releaseDate) {
     this.releaseDate = releaseDate;
   }

   public String getImdbUrl() {
     return imdbUrl;
   }

   public void setImdbUrl(String imdbUrl) {
     this.imdbUrl = imdbUrl;
   }

   public void describeMovie() {
     System.out.println(title + ", ID: " + id + ", was released on " + 
       releaseDate + ". For more info, see " + imdbUrl + ".");
   }
 }

Use the schema(Class<?>) builder method to create a descriptor that uses the Avro schema inferred from the Movie class.

import org.kitesdk.data.DatasetDescriptor;

DatasetDescriptor movieDesc = new DatasetDescriptor.Builder()
    .schema(Movie.class)
    .build();

The Builder uses the field names and data types of the class to construct an Avro schema definition. For the Movie class, the schema looks like this:

{
  "type":"record",
  "name":"Movie",
  "namespace":"org.kitesdk.examples.data",
  "fields":[
    {"name":"id","type":"int"},
    {"name":"title","type":"string"},
    {"name":"releaseDate","type":"string"},
    {"name":"imdbUrl","type":"string"}
  ]
}
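
To make the mapping from fields to schema entries more concrete, the following is a minimal sketch that uses plain Java reflection to pair each instance field name with an Avro primitive type name. It is not the Builder's actual implementation. The SchemaSketch class, its inferFields and avroTypeName helpers, and the cut-down nested Movie class are hypothetical names introduced only for illustration, and the sketch assumes that static and transient fields should be skipped and that only a handful of primitive types need to be handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaSketch {

  // Hypothetical stand-in for the Movie class above, reduced to its fields.
  static class Movie {
    private int id;
    private String title;
    private String releaseDate;
    private String imdbUrl;
  }

  // Map a Java field type to an Avro primitive type name.
  // Only a few types are covered; anything else is rejected (sketch assumption).
  static String avroTypeName(Class<?> javaType) {
    if (javaType == int.class) return "int";
    if (javaType == long.class) return "long";
    if (javaType == float.class) return "float";
    if (javaType == double.class) return "double";
    if (javaType == boolean.class) return "boolean";
    if (javaType == String.class) return "string";
    throw new IllegalArgumentException("Unsupported type in this sketch: " + javaType);
  }

  // Collect the name and Avro type name of every instance field declared on the class.
  // Skipping static and transient fields is an assumption made for this sketch.
  static Map<String, String> inferFields(Class<?> type) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (Field field : type.getDeclaredFields()) {
      int modifiers = field.getModifiers();
      if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers) || field.isSynthetic()) {
        continue;
      }
      fields.put(field.getName(), avroTypeName(field.getType()));
    }
    return fields;
  }

  public static void main(String[] args) {
    // Print each inferred field as a name/type pair.
    for (Map.Entry<String, String> entry : inferFields(Movie.class).entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Running main prints one line per field of the nested Movie class, pairing names such as id and title with the same type names that appear in the schema above. Note that the getters, setters, and describeMovie method of the original class play no part in the result: only the fields contribute to the schema.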