I am trying to load an XML file into Scala and then check if the XML tag is empty by running a SELECT query.
Here is the code to load XML file using Spark XML
val df = sqlContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "Session")
.load("test1.xml")
And the actual XML file contents are below
<Aggregator>
<Session>
<Data>
<Log>
<Event EventTime="2015-01-09T02:13:19">
<Start />
</Event>
<Event EventTime="2015-01-08T20:27:21">
<Power State="PowerOn"/>
</Event>
<Event EventTime="2015-01-08T23:54:52">
<Play>
<Video VideoId="4294967295" VideoType="MP4"/>
</Play>
</Event>
<Event EventTime="2015-12-27T03:59:20">
<Stop/>
</Event>
<Event EventTime="2015-05-01T09:37:56">
<Pause>
<File pauseTime="2015-04-15T19:35:00" key="902"/>
</Pause>
</Event>
</Log>
</Data>
</Session>
<Session>
<Data>
<Log>
<Event EventTime="2015-01-08T23:54:52">
<Play>
<Video VideoId="4294967295" VideoType="MP4"/>
</Play>
</Event>
<Event EventTime="2015-12-27T03:59:20">
<Stop/>
</Event>
<Event EventTime="2015-05-01T09:37:56">
<Pause>
<File pauseTime="2015-04-15T19:35:00" key="902"/>
</Pause>
</Event>
</Log>
</Data>
</Session>
<Session>
<Data>
<Log/>
</Data>
</Session>
</Aggregator>
After loading this XML into dataframe I am trying to compare get all events which are Start by using below SQL query
df.registerTempTable("eventsData")
sqlContext.sql(""" select * from eventsData where Event.Start is null""").show()
However this is not running successfully and not giving the results I want.
Can someone help me to get those records where Event has Start as its child tag??
Aucun commentaire:
Enregistrer un commentaire