I am trying to load an XML file into Scala and then check if the XML tag is empty by running a SELECT query.
Here is the code to load XML file using Spark XML
val df = sqlContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "Session")
.load("test1.xml")
And the actual XML file contents are below
<Aggregator>
<Session>
<Data>
<Log>
<Event EventTime="2015-01-09T02:13:19">
<Start />
</Event>
<Event EventTime="2015-01-08T20:27:21">
<Power State="PowerOn"/>
</Event>
<Event EventTime="2015-01-08T23:54:52">
<Play>
<Video VideoId="4294967295" VideoType="MP4"/>
</Play>
</Event>
<Event EventTime="2015-12-27T03:59:20">
<Stop/>
</Event>
<Event EventTime="2015-05-01T09:37:56">
<Pause>
<File pauseTime="2015-04-15T19:35:00" key="902"/>
</Pause>
</Event>
</Log>
</Data>
</Session>
<Session>
<Data>
<Log>
<Event EventTime="2015-01-08T23:54:52">
<Play>
<Video VideoId="4294967295" VideoType="MP4"/>
</Play>
</Event>
<Event EventTime="2015-12-27T03:59:20">
<Stop/>
</Event>
<Event EventTime="2015-05-01T09:37:56">
<Pause>
<File pauseTime="2015-04-15T19:35:00" key="902"/>
</Pause>
</Event>
</Log>
</Data>
</Session>
<Session>
<Data>
<Log/>
</Data>
</Session>
</Aggregator>
After loading this XML into dataframe I am trying to compare get all events which are Start by using below SQL query
df.registerTempTable("eventsData")
sqlContext.sql(""" select * from eventsData where Event.Start is null""").show()
However this is not running successfully and not returning accurate output.
Can someone help me to get those records where Event has Start as its child tag??
Aucun commentaire:
Enregistrer un commentaire